Re: [git-users] How hard would it be to implement sparse fetching/pulling?

2017-11-30 Thread Konstantin Khomoutov
On Wed, Nov 29, 2017 at 06:42:54PM -0800, vit via Git for human beings wrote:

> I'm looking for ways to improve fetch/pull/clone time for large git 
> (mono)repositories with unrelated source trees (that span across multiple 
> services).
> I've found sparse checkout approach appealing and helpful for most of 
> client-side operations (e.g. status, reset, commit, add etc)
> The problem is that there is no feature like sparse fetch/pull in git, this 
> means that ALL objects in unrelated trees are always fetched.
> It takes a lot of time for large repositories and results in some practical 
> scalability limits for git.
> This forced some large companies like Facebook and Google to move to 
> Mercurial as they were unable to improve client-side experience with git 
> and Microsoft has developed GVFS which seems to be a step back to CVCS 
> world.
[...]

(To anyone interested, there's a cross-post to the main Git list which
Vitaly failed to mention: [1]. I think it could spark some interesting
discussion.)

As to the essence of the question, I think you blame GVFS for no real
reason. Microsoft is being Microsoft here: their implementation of GVFS
is written in .NET and *requires* Windows 10 (this one is beyond me).
Still, it's based on an open protocol [2] which basically assumes the
presence of a RESTful HTTP endpoint on the "Git server side" and is
apparently designed to work well with the repository format the current
stock Git uses, which makes it implementable on both sides by anyone
interested.

The second hint I have is that the idea of fetching data lazily has
been circulating among the Git developers for some time already, and
something is really being done in this area, so you could check and see
what's there [3, 4] and maybe try it out and help those who work on
this stuff.

1. https://public-inbox.org/git/CANxXvsMbpBOSRKaAi8iVUikfxtQp=kofz60n0phxs+r+q1k...@mail.gmail.com/
2. https://github.com/Microsoft/GVFS/blob/master/Protocol.md
3. https://public-inbox.org/git/?q=lazy+fetch
4. https://public-inbox.org/git/?q=partial+clone

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[git-users] How hard would it be to implement sparse fetching/pulling?

2017-11-29 Thread vit via Git for human beings
Hi guys,

I'm looking for ways to improve fetch/pull/clone time for large git
(mono)repositories with unrelated source trees (that span multiple
services).
I've found the sparse checkout approach appealing and helpful for most
client-side operations (e.g. status, reset, commit, add, etc.).
The problem is that there is no feature like sparse fetch/pull in git, which
means that ALL objects in unrelated trees are always fetched.
This takes a lot of time for large repositories and results in some practical
scalability limits for git.
This forced some large companies like Facebook and Google to move to
Mercurial, as they were unable to improve the client-side experience with git,
and Microsoft has developed GVFS, which seems to be a step back to the CVCS
world.

I want to get feedback (from more experienced git users than I am) on
what it would take to implement sparse fetching/pulling (downloading only
objects related to the sparse-checkout list).
Are there any issues with missing hashes?
Are there any fundamental reasons why it can't be done?
Can we get away with only client-side changes, or would it require special
features on the server side?

If we had such a feature, then all we would need on top is a separate tool
that builds the right "sparse" scope for the workspace based on the paths a
developer wants to work on.
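
The "sparse scope" tool mentioned above could start out as something like
the following minimal sketch. It assumes the scope is expressed as a list
of root-anchored directory patterns of the kind that sparse checkout reads
from .git/info/sparse-checkout (with core.sparseCheckout enabled). The
function name build_sparse_patterns and the example paths are hypothetical
and used only for illustration. Note that this only limits what is checked
out into the working tree, not what is fetched.

```python
from pathlib import PurePosixPath


def build_sparse_patterns(paths):
    """Turn directory paths into sparse-checkout patterns.

    Each result is anchored at the repository root with a leading "/"
    and ends with "/" so that it matches a directory and everything
    below it. Paths already covered by a shorter path are dropped.
    """
    # Normalise: trim whitespace and surrounding slashes, skip empty
    # entries, and let the set remove duplicates.
    cleaned = set()
    for raw in paths:
        trimmed = raw.strip().strip("/")
        if trimmed:
            cleaned.add(PurePosixPath(trimmed))

    kept = []
    # Visiting shallow paths first guarantees that a parent directory
    # is already in `kept` when one of its children is examined.
    for path in sorted(cleaned, key=lambda p: (len(p.parts), p.parts)):
        if not any(parent in path.parents for parent in kept):
            kept.append(path)

    return ["/" + path.as_posix() + "/" for path in sorted(kept)]


if __name__ == "__main__":
    # Hypothetical paths a developer wants to work on.
    wanted = ["services/billing", "libs/common/", "services/billing/api"]
    print("\n".join(build_sparse_patterns(wanted)))
```

The printed lines could be written to .git/info/sparse-checkout as they
are; nested paths are collapsed because the parent directory pattern
already covers them.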

In a world where more and more companies are moving towards large
monorepos, this improvement should provide a good way of scaling git to meet
this demand.

PS. Please don't advise splitting things up, as there are some good reasons
why many companies decide to keep their code in a monorepo, which you can
easily find online. So let's keep that part out of scope.
