Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Mon, Oct 21, 2013 at 2:51 PM, Matthijs Kooijman wrote:
> Hi Duy,
>
> I saw your patch series got accepted in git master a while back, great!
> Since I hope to be using the fixed behaviour soon, what was the plan for
> including it? Am I correct in thinking that git master will become 1.8.5
> in a while? Would this series perhaps be considered for backporting to
> 1.8.4.x?

I was waiting for Junio to answer this, as I rarely run released
versions and do not care much about releases. I think normally master
will be cut for the next release (1.8.5?), while maint branches get
backported bug fixes. I consider this an improvement rather than a bug
fix, so my guess is it will not be backported to 1.8.4.x.

>
> Gr.
>
> Matthijs
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Duy,

I saw your patch series got accepted in git master a while back, great!
Since I hope to be using the fixed behaviour soon, what was the plan for
including it? Am I correct in thinking that git master will become 1.8.5
in a while? Would this series perhaps be considered for backporting to
1.8.4.x?

Gr.

Matthijs
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Duy,

> I thought a bit but my thoughts often get stuck if I don't write them
> down in form of code :-) so this is what I got so far. 4/6 is a good
> thing in my opinion, but I might overlook something. 6/6 is about this
> thread.

The series looks good to me, though I don't know enough about the code
to do a detailed analysis. In any case, I agree that 4/6 is a good
change: it removes a bunch of similar code for the shallow special case
(which is now no longer a completely separate special case).

The total series also seems to actually fix the problem I reported. I'll
resend the testcase from my original patch as well, which now passes
with your series applied.

Thanks for diving into this!

Gr.

Matthijs
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Mon, Aug 12, 2013 at 3:02 PM, Matthijs Kooijman wrote:
> Hi Duy,
>
>> OK. Matthijs, do you want to make a patch for it?
> I'm willing, but:
>  - I don't understand the code and all of your comments well enough yet
>    to start coding right away (though I haven't actually invested enough
>    time in this yet, either).
>  - I'll be on vacation for the next two weeks.
>
> When I get back, I'll re-read this thread properly and reply where I
> don't follow it. Feel free to continue discussing the plan until then,
> of course :-)

I thought a bit but my thoughts often get stuck if I don't write them
down in form of code :-) so this is what I got so far. 4/6 is a good
thing in my opinion, but I might overlook something. 6/6 is about this
thread.

I'm likely offline this weekend, so all is good :-D
--
Duy
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Duy,

> OK. Matthijs, do you want to make a patch for it?

I'm willing, but:
 - I don't understand the code and all of your comments well enough yet
   to start coding right away (though I haven't actually invested enough
   time in this yet, either).
 - I'll be on vacation for the next two weeks.

When I get back, I'll re-read this thread properly and reply where I
don't follow it. Feel free to continue discussing the plan until then,
of course :-)

Gr.

Matthijs
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Fri, Aug 9, 2013 at 12:10 AM, Junio C Hamano wrote:
> Duy Nguyen writes:
>
>> I fail to see the point here. There are two different things: what we
>> want to send, and what we can make deltas against. Shallow boundary
>> affects the former. What the recipient has affects latter. What is the
>> twist about?
>
> do_rev_list() --> mark_edges_uninteresting() --> show_edge() callchain
> that eventually does this:
>
>     static void show_edge(struct commit *commit)
>     {
>             fprintf(pack_pipe, "-%s\n", sha1_to_hex(commit->object.sha1));
>     }
>
> was what I had in mind.

Now I see. Thanks. mark_edges_uninteresting() actually calls
mark_edge_parents_uninteresting(), which calls show_edge(). The middle
function is important because, after calculating the new depth,
upload-pack calls register_shallow() for both old and new shallow roots,
and those commits will have their 'parents' pointer set to NULL, which
renders mark_edge_parents_uninteresting() a no-op. So show_edge() is
never called on shallow points' parents.

>> As for considering objects before shallow boundary uninteresting, I
>> have a plan for it: kill upload-pack.c:do_rev_list(). The function is
>> created to make a cut at shallow boundary,...
>
> Hmph, that function is not primarily about shallow boundary but does
> all packing in general.
>
> The edge hinting in there is for thin transfer where the sender
> sends deltas against base objects that are known to be present in
> the receiving repository, without sending the base objects.

OK, but edge hinting is the same in pack-objects.c:get_object_list(), so
the plan might still work, right? I still need to study extra_edge_obj
in upload-pack.c, though. That's knowledge that pack-objects won't have.
--
Duy
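The no-op described in the message above can be illustrated with a small toy model. This is only a sketch under stated assumptions: struct edge_commit, report_edge_parents() and cut_parents() are hypothetical names, not git's, and counting stands in for the real show_edge() call. The point is just that a loop over parents has nothing to visit once the parent list has been cut.

```c
#include <assert.h>
#include <stddef.h>

/* Toy commit; NOT git's struct commit. */
struct edge_commit {
    int uninteresting;               /* set for commits the client has */
    struct edge_commit *parents[2];
    size_t nr_parents;               /* becomes 0 once the commit is grafted */
};

/*
 * Stand-in for the "mark edge parents" step: count the uninteresting
 * parents that would be reported as edges for this commit.
 */
static size_t report_edge_parents(const struct edge_commit *c)
{
    size_t i, shown = 0;

    assert(c != NULL);
    for (i = 0; i < c->nr_parents; i++)
        if (c->parents[i]->uninteresting)
            shown++;
    return shown;
}

/* Hypothetical model of registering a shallow root: drop its parents. */
static void cut_parents(struct edge_commit *c)
{
    assert(c != NULL);
    c->nr_parents = 0;
}
```

With a "want" commit whose only parent is a "have" commit, report_edge_parents() returns 1 before cut_parents() and 0 after it, which is the toy equivalent of show_edge() never being reached.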
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Duy Nguyen writes:

> I fail to see the point here. There are two different things: what we
> want to send, and what we can make deltas against. Shallow boundary
> affects the former. What the recipient has affects latter. What is the
> twist about?

do_rev_list() --> mark_edges_uninteresting() --> show_edge() callchain
that eventually does this:

    static void show_edge(struct commit *commit)
    {
            fprintf(pack_pipe, "-%s\n", sha1_to_hex(commit->object.sha1));
    }

was what I had in mind. For a non-shallow transfer, feeding "-" is
done for commits that we do not send (we do not do so for all of
them) and those that we know the recipient does have.

Two different things used to be the same, but with your suggestion
they are not. Which is a good thing, but we need to be careful to make
sure existing codepaths do not conflate them, and untangle ones that do
if there are any, that's all.

> As for considering objects before shallow boundary uninteresting, I
> have a plan for it: kill upload-pack.c:do_rev_list(). The function is
> created to make a cut at shallow boundary,...

Hmph, that function is not primarily about shallow boundary but does
all packing in general.

The edge hinting in there is for thin transfer where the sender
sends deltas against base objects that are known to be present in
the receiving repository, without sending the base objects.
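As a rough illustration of the "-" hint in the message above, here is a standalone sketch of how one line fed to the packing side could be formatted. format_rev_line() is a hypothetical helper, not a git function. The "-<hex>\n" form for edges follows the show_edge() snippet quoted above; the unprefixed "<hex>\n" form for objects that are actually sent is an assumption on my part.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one line into buf: "-<hex>\n" for an edge (a commit we do not
 * send), "<hex>\n" otherwise (assumed form for commits we do send).
 * Returns what snprintf() returns: the length the full line needs.
 */
static int format_rev_line(char *buf, size_t size, const char *hex, int is_edge)
{
    assert(buf != NULL && hex != NULL);
    return snprintf(buf, size, "%s%s\n", is_edge ? "-" : "", hex);
}
```

The only difference between the two kinds of line is the leading "-", so whatever decides is_edge is where "not sent" and "recipient has it" would need to be kept apart.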
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Thu, Aug 8, 2013 at 1:51 PM, Junio C Hamano wrote:
> Duy Nguyen writes:
>
>> I think this applies to general case as well, not just shallow.
>> Imagine I have a disconnected commit that points to the latest tree
>> (i.e. it contains most of latest changes). Because it's disconnected,
>> it'll be ignored by the server side. But if the server side does
>> mark_tree_interesting on this commit, a bunch of blobs might be
>> excluded from sending.
>
> I think you meant mark_tree_UNinteresting.

Yes, thanks for correcting.

>> ... So perhaps we could go over have_obj list
>> again, if it's not processed and is
>>
>>  - a tree-ish, mark_tree_uninteresting
>>  - a blob, just mark uninteresting
>>
>> and this does regardless of shallow state or edges.
>
> As a general idea, I agree it may be worth trying out to see if your
> concern that the "have" list may be so big that this approach may be
> more costly than it is worth.
>
> If the recipient is known to have something, we do not have to send
> it.

OK. Matthijs, do you want to make a patch for it?

> The things that we decide not to send are not necessarily what the
> recipient has, which introduces a twist you need to watch out for if
> we want to go that route.
>
> If the recipient is known to have something, a thin transfer can
> send a delta against it. You do not want to send the commits before
> the shallow boundary (i.e. the parents of the commits listed in
> .git/shallow) because the recipient does not want them, and that
> means you may have to use a different mark to record that fact. The
> recipient does not have them, we do not want to send them, and they
> cannot be used as a delta base for what we do send. Which is quite
> different from the ordinary "uninteresting" objects, those we decide
> not to send because the recipient has them.

I fail to see the point here. There are two different things: what we
want to send, and what we can make deltas against. Shallow boundary
affects the former. What the recipient has affects the latter. What is
the twist about?

As for considering objects before shallow boundary uninteresting, I
have a plan for it: kill upload-pack.c:do_rev_list(). The function is
created to make a cut at shallow boundary, but we already have a tool
for that: grafting. In my ongoing shallow series I will create a
temporary shallow file that contains new roots and pass the file to
pack-objects with --shallow-file. pack-objects will never see anything
outside what the recipient may want to receive (i.e. commits before
shallow boundary), and pack-objects' rev-list should do what
upload-pack.c:do_rev_list() currently does.
--
Duy
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Duy Nguyen writes:

> I think this applies to general case as well, not just shallow.
> Imagine I have a disconnected commit that points to the latest tree
> (i.e. it contains most of latest changes). Because it's disconnected,
> it'll be ignored by the server side. But if the server side does
> mark_tree_interesting on this commit, a bunch of blobs might be
> excluded from sending.

I think you meant mark_tree_UNinteresting.

> ... So perhaps we could go over have_obj list
> again, if it's not processed and is
>
>  - a tree-ish, mark_tree_uninteresting
>  - a blob, just mark uninteresting
>
> and this does regardless of shallow state or edges.

As a general idea, I agree it may be worth trying out to see if your
concern that the "have" list may be so big that this approach may be
more costly than it is worth.

If the recipient is known to have something, we do not have to send
it. The things that we decide not to send are not necessarily what the
recipient has, which introduces a twist you need to watch out for if
we want to go that route.

If the recipient is known to have something, a thin transfer can
send a delta against it. You do not want to send the commits before
the shallow boundary (i.e. the parents of the commits listed in
.git/shallow) because the recipient does not want them, and that
means you may have to use a different mark to record that fact. The
recipient does not have them, we do not want to send them, and they
cannot be used as a delta base for what we do send. Which is quite
different from the ordinary "uninteresting" objects, those we decide
not to send because the recipient has them.
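The "different mark" idea in the message above might be pictured with two separate bits instead of one. This is a toy sketch, not git's object flags: TOY_RECIPIENT_HAS, TOY_BEYOND_SHALLOW and both helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical marks; git's real flag names and values differ. */
#define TOY_RECIPIENT_HAS   (1u << 0)  /* ordinary "uninteresting" */
#define TOY_BEYOND_SHALLOW  (1u << 1)  /* before the shallow boundary */
#define TOY_ALL_FLAGS       (TOY_RECIPIENT_HAS | TOY_BEYOND_SHALLOW)

/* Either mark is enough to keep an object out of the pack. */
static int should_send(unsigned flags)
{
    assert((flags & ~TOY_ALL_FLAGS) == 0);
    return (flags & TOY_ALL_FLAGS) == 0;
}

/* Only something the recipient actually has can serve as a delta base. */
static int usable_as_delta_base(unsigned flags)
{
    assert((flags & ~TOY_ALL_FLAGS) == 0);
    return (flags & TOY_RECIPIENT_HAS) != 0;
}
```

With a single "uninteresting" bit the two questions collapse into one; with two bits, an object beyond the shallow boundary is not sent and is also not offered as a delta base.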
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Duy Nguyen writes:

> Haven't found time to read the rest yet, but this I can answer.
> .git/shallow records graft points. If a commit is in .git/shallow and
> it exists in the repository, the commit is considered to have no
> parents regardless of what's recorded in repository. So .git/shallow
> refers to the new roots, not the missing bits.

Thanks.
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Fri, Jul 12, 2013 at 5:01 AM, Matthijs Kooijman wrote:
> Hi folks,
>
> while playing with shallow fetches, I've found that in some
> circumstances running git fetch with --depth can return too many objects
> (in particular, _all_ the objects for the requested revisions are
> returned, even when some of those objects are already known to the
> client).
>
> This happens when a client issues a fetch with a depth bigger or equal
> to the number of commits the server is ahead of the client. In this
> case, the revisions to be sent over will be completely detached from any
> revisions the client already has (history-wise), causing the server to
> effectively ignore all objects the client has (as advertised using its
> have lines) and just send over _all_ objects (needed for the revisions
> it is sending over).
>
> I've traced this down to the way do_rev_list in upload-pack.c works. If
> I've pored over the code enough to understand it, this is what happens:
>  - The new shallow roots are made into graft points without parents.
>  - The "want" commits are added to the pending list (revs->pending)
>  - The "have" commits are marked uninteresting and added to the pending list
>  - prepare_revision_walk is called, which adds everything from the
>    pending list into the commit list (revs->commits)
>  - limit_list is called, which traverses the history of each interesting
>    commit in the commit list (i.e., all want revisions), up to excluding
>    the first uninteresting commit (i.e. a have revision). The result of
>    this is the new commit list.
>
>    This means the commit list now contains all commits that the client
>    wants, up to (excluding) any commits he already has or up to
>    (including) any (new) shallow roots.
>  - mark_edges_uninteresting is called, which marks the tree of every
>    parent of each edge in the commit list as uninteresting (in practice,
>    this marks the tree of each uninteresting parent, since those are by
>    definition the only kinds of revisions that can be beyond the edge).
>  - All trees and blobs that are referenced by trees in the commit list
>    but are not marked as uninteresting, are passed to git-pack-objects
>    to put into the pack.
>
> Normally, the list of commits to send over is connected to the
> client's existing commits (which are marked as uninteresting). This
> means that only the trees of those uninteresting ("have") commits that
> are actually (direct) predecessors of the commits to send over are
> marked as uninteresting. This is probably useful, since it prevents
> having to go over all trees the client has (for other branches, for
> example) and instead limits to the trees that are the most likely to
> contain duplicate (or similar, for delta-ing) objects.
>
> However, in the "detached shallow fetch" case, this assumption is no
> longer valid. There will be no uninteresting commits as parents for
> the commit list, since all edge commits will be shallow roots (hence
> have no parents). Ideally, one would find out which of the "detached"
> "have" revisions are the closest to the new shallow roots, but with the
> current code these shallow roots have their parents cut off long before
> this code even runs, so this is probably not feasible.

I think this applies to general case as well, not just shallow.
Imagine I have a disconnected commit that points to the latest tree
(i.e. it contains most of latest changes). Because it's disconnected,
it'll be ignored by the server side. But if the server side does
mark_tree_interesting on this commit, a bunch of blobs might be
excluded from sending.

I used to (ab)use git and store a bunch of tags pointing to trees.
These trees share a lot. Still, fetching a new tag means pulling all
objects of the new tree even though it only needs a few new blobs and
trees.

So perhaps we could go over have_obj list
again, if it's not processed and is

 - a tree-ish, mark_tree_uninteresting
 - a blob, just mark uninteresting

and do this regardless of shallow state or edges. The only downside is
that mark_tree_uninteresting is recursive, so it unpacks lots of trees
if have_obj is long or the worktree is really big. Commit bitmap should
help reduce the cost if have_obj is a committish, at least.
--
Duy
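The extra pass over the "have" list suggested above could look roughly like the following toy model. None of this is git's code: struct toy_object, toy_mark_tree_uninteresting() and toy_mark_haves() are hypothetical, the object graph is hand-built, and the "not processed yet" check is modelled only as an early return on trees that are already marked.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

/* Toy object; NOT git's struct object. */
struct toy_object {
    enum toy_type type;
    int uninteresting;
    struct toy_object *tree;        /* root tree, for commits only */
    struct toy_object *entries[4];  /* children, for trees only */
    size_t nr_entries;
};

/* Recursively mark a tree and everything below it. */
static void toy_mark_tree_uninteresting(struct toy_object *tree)
{
    size_t i;

    if (!tree || tree->uninteresting)
        return;                     /* nothing to do, or already handled */
    tree->uninteresting = 1;
    for (i = 0; i < tree->nr_entries; i++) {
        struct toy_object *e = tree->entries[i];
        if (e->type == TOY_TREE)
            toy_mark_tree_uninteresting(e);
        else
            e->uninteresting = 1;
    }
}

/* Walk the "have" list once and mark whatever each entry reaches. */
static void toy_mark_haves(struct toy_object **have, size_t nr)
{
    size_t i;

    for (i = 0; i < nr; i++) {
        struct toy_object *o = have[i];
        assert(o != NULL);
        switch (o->type) {
        case TOY_COMMIT: toy_mark_tree_uninteresting(o->tree); break;
        case TOY_TREE:   toy_mark_tree_uninteresting(o); break;
        case TOY_BLOB:   o->uninteresting = 1; break;
        }
    }
}
```

The recursion in toy_mark_tree_uninteresting() is where the cost concern from the message shows up: every "have" entry can pull in a whole tree, so a long list or a large tree means many tree visits.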
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
On Thu, Aug 8, 2013 at 8:01 AM, Junio C Hamano wrote:
> Matthijs Kooijman writes:
>
>>> > In your discussion (including the comment), you talk about "shallow
>>> > root" (I think that is the same as what we call "shallow boundary"),
>>> I think so, yes. I mean to refer to the commits referenced in
>>> .git/shallow, that have their parents "hidden".
>> Could you confirm that I got the terms right here (or is the shallow
>> boundary the first hidden commit?)
>
> As long as you are consistent it is fine. I _think_ boundary refers
> to what is recorded in the .git/shallow file, so they are commits
> that are missing from our repository, and their immediate children
> are available.

Haven't found time to read the rest yet, but this I can answer.
.git/shallow records graft points. If a commit is in .git/shallow and
it exists in the repository, the commit is considered to have no
parents regardless of what's recorded in repository. So .git/shallow
refers to the new roots, not the missing bits.
--
Duy
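To make the graft semantics above concrete, here is a minimal sketch in C. It is a toy model rather than git's implementation: struct toy_commit, is_shallow() and effective_parents() are hypothetical names, and the shallow list is just an array of strings standing in for the contents of .git/shallow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy commit: an id plus the parents recorded in the object itself. */
struct toy_commit {
    const char *id;
    const struct toy_commit *parents[2];
    size_t nr_parents;
};

/* Return 1 if id appears in the shallow list, 0 otherwise. */
static int is_shallow(const char *const *shallow, size_t nr, const char *id)
{
    size_t i;

    for (i = 0; i < nr; i++)
        if (!strcmp(shallow[i], id))
            return 1;
    return 0;
}

/*
 * Number of parents a history walk would see: zero for a commit listed
 * as shallow, no matter what the commit object itself records.
 */
static size_t effective_parents(const struct toy_commit *c,
                                const char *const *shallow, size_t nr)
{
    assert(c != NULL);
    return is_shallow(shallow, nr, c->id) ? 0 : c->nr_parents;
}
```

In this model a commit "b" with one recorded parent reports one parent when the shallow list is empty and zero once "b" is listed, which matches the description of .git/shallow naming the new roots rather than the missing commits.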
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Matthijs Kooijman writes:

>> > In your discussion (including the comment), you talk about "shallow
>> > root" (I think that is the same as what we call "shallow boundary"),
>> I think so, yes. I mean to refer to the commits referenced in
>> .git/shallow, that have their parents "hidden".
> Could you confirm that I got the terms right here (or is the shallow
> boundary the first hidden commit?)

As long as you are consistent it is fine. I _think_ boundary refers
to what is recorded in the .git/shallow file, so they are commits
that are missing from our repository, and their immediate children
are available.

> My proposal was to only apply the fix for all have revisions when the
> previous history traversal came across some shallow boundary commits. If
> this happens, then that shallow boundary commit will be a "new" one and
> it will have prevented the history traversal from finding the full list
> of relevant "have" commits. In this case, we should just use all "have"
> commits instead.
>
> Now, looking at the code, I see a few options for detecting this case:
>
>  1 Modify mark_edges_uninteresting to return a boolean (or have an
>    output argument) if any of the commits in the list of commits to find
>    (not the edges) is a shallow boundary.
>  2 Modify mark_edges_uninteresting to have a "show_shallow" argument
>    that gets called for every shallow boundary. The show_shallow
>    function passed would then simply keep a boolean if it is passed at
>    least once.
>  3 Add another loop over the commits _after_ the call to
>    mark_edges_uninteresting, that simply looks for any shallow boundary
>    commit.
>
> The last option seems sensible to me, since it prevents modifying the
> somewhat generic mark_edges_uninteresting function for this specific
> usecase. On the other hand, it does mean that the list of commits is
> looped twice, not sure what that means for performance.
>
> Before I go and implement one of these, which option seems best to you?

My gut feeling without looking at any patch is that the simplest
(i.e. 3.) would be the best among these three.

But I suspect, with any of these approaches, you would need to be
very careful futzing with the edge ones. It may have interesting
interactions with --thin transfer.
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Junio,

I haven't got a reply to my mail yet. Could you have a look, so I can
update and resubmit my patch?

On Fri, Jul 12, 2013 at 09:11:57AM +0200, Matthijs Kooijman wrote:
> > [administrivia: you seem to have mail-followup-to that points at you
> > and the list; is that really needed???]
> > In your discussion (including the comment), you talk about "shallow
> > root" (I think that is the same as what we call "shallow boundary"),
> I think so, yes. I mean to refer to the commits referenced in
> .git/shallow, that have their parents "hidden".
Could you confirm that I got the terms right here (or is the shallow
boundary the first hidden commit?)

> > but in this added block, there is nothing that checks CLIENT_SHALLOW
> > or SHALLOW flags to special case that.
> >
> > Is it a good idea to unconditionally do this for all "have"
> > revisions?
> That's what I meant in my mail with "applying the fix unconditionally" -
> there is probably some check needed (I discussed a few options in the
> mail as well).
>
> Note that this entire do_rev_list function is only called when there are
> shallow revisions involved, so there is also a basic "only when shallow"
> check in place.

My proposal was to only apply the fix for all have revisions when the
previous history traversal came across some shallow boundary commits. If
this happens, then that shallow boundary commit will be a "new" one and
it will have prevented the history traversal from finding the full list
of relevant "have" commits. In this case, we should just use all "have"
commits instead.

Now, looking at the code, I see a few options for detecting this case:

 1 Modify mark_edges_uninteresting to return a boolean (or have an
   output argument) if any of the commits in the list of commits to find
   (not the edges) is a shallow boundary.
 2 Modify mark_edges_uninteresting to have a "show_shallow" argument
   that gets called for every shallow boundary. The show_shallow
   function passed would then simply keep a boolean if it is passed at
   least once.
 3 Add another loop over the commits _after_ the call to
   mark_edges_uninteresting, that simply looks for any shallow boundary
   commit.

The last option seems sensible to me, since it prevents modifying the
somewhat generic mark_edges_uninteresting function for this specific
usecase. On the other hand, it does mean that the list of commits is
looped twice, not sure what that means for performance.

Before I go and implement one of these, which option seems best to you?

Gr.

Matthijs
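Option 3 from the list above could be sketched as a separate pass like the one below. This is a toy model under stated assumptions: struct listed_commit and list_has_shallow_root() are hypothetical, the commit list is a plain array instead of git's linked list, and is_shallow_root stands in for whatever check identifies a shallow boundary commit in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy entry of the commit list produced by the history traversal. */
struct listed_commit {
    const char *id;
    int is_shallow_root;   /* hypothetical stand-in for a shallow check */
};

/*
 * Option 3: a second pass over the commit list, run after the edges
 * have been marked, that reports whether any listed commit is a
 * shallow root.
 */
static int list_has_shallow_root(const struct listed_commit *commits, size_t nr)
{
    size_t i;

    assert(commits != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        if (commits[i].is_shallow_root)
            return 1;
    return 0;
}
```

The caller would then apply the "mark all have trees" fix only when this returns 1. The pass stops at the first shallow root it finds, so in the detached case it does not necessarily walk the whole list a second time.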
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Junio,

> [administrivia: you seem to have mail-followup-to that points at you
> and the list; is that really needed???]
I'm not subscribed to the list, so yes :-)

> > This happens when a client issues a fetch with a depth bigger or equal
> > to the number of commits the server is ahead of the client.
>
> Do you mean "smaller" (not "bigger")?
Yes, I meant smaller (reworded this first sentence a few times and then
messed up :-)

> > diff --git a/upload-pack.c b/upload-pack.c
> > index 59f43d1..5885f33 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data)
> >          if (prepare_revision_walk(&revs))
> >                  die("revision walk setup failed");
> >          mark_edges_uninteresting(revs.commits, &revs, show_edge);
> > +        /* In case we create a new shallow root, make sure that all
> > +         * we don't send over objects that the client already has just
> > +         * because their "have" revisions are no longer reachable from
> > +         * the shallow root. */
> > +        for (i = 0; i < have_obj.nr; i++) {
> > +                struct commit *commit = (struct commit *)have_obj.objects[i].item;
> > +                mark_tree_uninteresting(commit->tree);
> > +        }
>
> Hmph.
>
> In your discussion (including the comment), you talk about "shallow
> root" (I think that is the same as what we call "shallow boundary"),
I think so, yes. I mean to refer to the commits referenced in
.git/shallow, that have their parents "hidden".

> but in this added block, there is nothing that checks CLIENT_SHALLOW
> or SHALLOW flags to special case that.
>
> Is it a good idea to unconditionally do this for all "have"
> revisions?
That's what I meant in my mail with "applying the fix unconditionally" -
there is probably some check needed (I discussed a few options in the
mail as well).

Note that this entire do_rev_list function is only called when there are
shallow revisions involved, so there is also a basic "only when shallow"
check in place.

> Also there is another loop that iterates over "have" revisions just
> above the precontext. I wonder if this added code belongs in that
> loop.
I think we could add it there, yes. On the other hand, if we only want
to execute this code when there are shallow boundaries in the list of
revisions to send (as I suggested in my previous mail), then we can't
move this code up.

Gr.

Matthijs
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Matthijs Kooijman writes:

[administrivia: you seem to have mail-followup-to that points at you
and the list; is that really needed???]

> This happens when a client issues a fetch with a depth bigger or equal
> to the number of commits the server is ahead of the client.

Do you mean "smaller" (not "bigger")?

> diff --git a/upload-pack.c b/upload-pack.c
> index 59f43d1..5885f33 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data)
>          if (prepare_revision_walk(&revs))
>                  die("revision walk setup failed");
>          mark_edges_uninteresting(revs.commits, &revs, show_edge);
> +        /* In case we create a new shallow root, make sure that all
> +         * we don't send over objects that the client already has just
> +         * because their "have" revisions are no longer reachable from
> +         * the shallow root. */
> +        for (i = 0; i < have_obj.nr; i++) {
> +                struct commit *commit = (struct commit *)have_obj.objects[i].item;
> +                mark_tree_uninteresting(commit->tree);
> +        }

Hmph.

In your discussion (including the comment), you talk about "shallow
root" (I think that is the same as what we call "shallow boundary"),
but in this added block, there is nothing that checks CLIENT_SHALLOW
or SHALLOW flags to special case that.

Is it a good idea to unconditionally do this for all "have"
revisions?

Also there is another loop that iterates over "have" revisions just
above the precontext. I wonder if this added code belongs in that
loop.
[RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi folks,

while playing with shallow fetches, I've found that in some
circumstances running git fetch with --depth can return too many objects
(in particular, _all_ the objects for the requested revisions are
returned, even when some of those objects are already known to the
client).

This happens when a client issues a fetch with a depth bigger or equal
to the number of commits the server is ahead of the client. In this
case, the revisions to be sent over will be completely detached from any
revisions the client already has (history-wise), causing the server to
effectively ignore all objects the client has (as advertised using its
have lines) and just send over _all_ objects (needed for the revisions
it is sending over).

I've traced this down to the way do_rev_list in upload-pack.c works. If
I've pored over the code enough to understand it, this is what happens:
 - The new shallow roots are made into graft points without parents.
 - The "want" commits are added to the pending list (revs->pending)
 - The "have" commits are marked uninteresting and added to the pending list
 - prepare_revision_walk is called, which adds everything from the
   pending list into the commit list (revs->commits)
 - limit_list is called, which traverses the history of each interesting
   commit in the commit list (i.e., all want revisions), up to excluding
   the first uninteresting commit (i.e. a have revision). The result of
   this is the new commit list.

   This means the commit list now contains all commits that the client
   wants, up to (excluding) any commits he already has or up to
   (including) any (new) shallow roots.
 - mark_edges_uninteresting is called, which marks the tree of every
   parent of each edge in the commit list as uninteresting (in practice,
   this marks the tree of each uninteresting parent, since those are by
   definition the only kinds of revisions that can be beyond the edge).
 - All trees and blobs that are referenced by trees in the commit list
   but are not marked as uninteresting, are passed to git-pack-objects
   to put into the pack.

Normally, the list of commits to send over is connected to the
client's existing commits (which are marked as uninteresting). This
means that only the trees of those uninteresting ("have") commits that
are actually (direct) predecessors of the commits to send over are
marked as uninteresting. This is probably useful, since it prevents
having to go over all trees the client has (for other branches, for
example) and instead limits to the trees that are the most likely to
contain duplicate (or similar, for delta-ing) objects.

However, in the "detached shallow fetch" case, this assumption is no
longer valid. There will be no uninteresting commits as parents for
the commit list, since all edge commits will be shallow roots (hence
have no parents). Ideally, one would find out which of the "detached"
"have" revisions are the closest to the new shallow roots, but with the
current code these shallow roots have their parents cut off long before
this code even runs, so this is probably not feasible.

Instead, what we can do in this case is simply mark the trees of all
"have" commits as uninteresting. This prevents all objects that are
contained in the "have" commits themselves from being sent to the
client, which can be a big win for bigger repositories. Marking them all
is probably more work than strictly needed, but is easy to implement.

I have created a mockup patch which does this, and also adds a test case
demonstrating the problem. Right now, the above fix is applied always,
even in cases where it isn't needed.

Looking at the code, I think it would be good to let
mark_edges_uninteresting look for shallow roots in the commit list (or
perhaps just add another loop over the commit list inside do_rev_list)
and only apply the fix if any shallow roots are in the commit list
(meaning at least a part of the history to send over is detached from
the client's current history). I haven't implemented this yet, wanting
to get some feedback first.

Also, I'm not quite sure how this fits in with the concept of "thin
packs". There might be some opportunities missing here as well, though
git-pack-objects is called without --thin when shallow roots are
involved. I think this is related to the "-" prefixed commit sha's that
are sent to git-pack-objects, but I couldn't find any documentation on
what the - prefix is supposed to mean.

(On a somewhat related note, show_commit in upload-pack.c checks the
BOUNDARY flag, but AFAICS the revs->boundary flag is never set, so
BOUNDARY cannot ever be set in this case either?)

How does this patch look?

Gr.

Matthijs

---
 t/t5500-fetch-pack.sh | 11 +++++++++++
 upload-pack.c         |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index fd2598e..a022d65 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -393,6 +393,17 @@ test_expect_success 'fetch in shal