Re: Weird revision walk behaviour
On 31/05/2018 08:43, Jeff King wrote: If there are zero parents (neither relevant nor irrelevant), is it still TREESAME? I would say in theory yes. Not sure - I think roots are such a special case that TREESAME effectively doesn't matter. We always test for roots first. So what I was proposing would be to rewrite the parents to the empty set. That feels a bit radical - I believe we need to retain (some) parent information for modes that show it (eg the dangling unfilled circles in gitk). And making it a root I think could cause other problems with making it look like we have a disjoint history. I believe the next simplification step may be trying to follow down to the common root. What next here? It looks like we have a proposed solution. Do you want to try to work up a set of tests based on what you wrote earlier? I was hoping Gábor would carry on, as he's made a start... I was just planning to back-seat drive. I'd also love to hear from Junio as the expert in this area, but I think he's been a bit busy with maintainer stuff recently. So maybe I should just be patient. :) Likewise - I have been quite deep into this, but it was a quite short window of investigation a long time ago, and I've not looked at it since. Would like input from someone with more active knowledge. Kevin
Re: Weird revision walk behaviour
On Wed, May 30, 2018 at 11:20:40AM +0300, Kevin Bracey wrote: > On 30/05/2018 00:04, Jeff King wrote: > > > > Do we even need to do the parent rewriting here? By definition those > > parents aren't interesting, and we're TREESAME to whatever is in > > treesame_parents. So conceptually it seems like we just need a flag "I > > found a treesame parent", but we only convert that into a TREESAME flag > > if there are no relevant parents. > > I think it's necessary to make the rules consistent. To mark the commit as > TREESAME here when it's not TREESAME to all its parents would be > inconsistent with the definition of the TREESAME flag used everywhere else: > > * Original definition: "A commit is TREESAME if it is treesame to any > parent" > * d0af66 definition: "A commit is TREESAME if it is treesame to all parents" > * Current 4d8266 definition: "A commit is TREESAME if it is treesame to all > relevant parents; if no relevant parents then if it is treesame to all > (irrelevant) parents." > > The current problem is that the node is not marked TREESAME, but that's > consistent with the definition. I think we do have to rewrite the commit so > it is TREESAME as per the definition. Not flag it as TREESAME in violation > of it. If there are zero parents (neither relevant nor irrelevant), is it still TREESAME? I would say in theory yes. So what I was proposing would be to rewrite the parents to the empty set. But anyway, I agree with you that the first-treesame-parent strategy is not any more complex than the boolean, and is probably less likely to cause unintended headaches later on. What next here? It looks like we have a proposed solution. Do you want to try to work up a set of tests based on what you wrote earlier? I'd also love to hear from Junio as the expert in this area, but I think he's been a bit busy with maintainer stuff recently. So maybe I should just be patient. :) -Peff
Re: Weird revision walk behaviour
On 30/05/2018 00:04, Jeff King wrote: Do we even need to do the parent rewriting here? By definition those parents aren't interesting, and we're TREESAME to whatever is in treesame_parents. So conceptually it seems like we just need a flag "I found a treesame parent", but we only convert that into a TREESAME flag if there are no relevant parents. I think it's necessary to make the rules consistent. To mark the commit as TREESAME here when it's not TREESAME to all its parents would be inconsistent with the definition of the TREESAME flag used everywhere else: * Original definition: "A commit is TREESAME if it is treesame to any parent" * d0af66 definition: "A commit is TREESAME if it is treesame to all parents" * Current 4d8266 definition: "A commit is TREESAME if it is treesame to all relevant parents; if no relevant parents then if it is treesame to all (irrelevant) parents." The current problem is that the node is not marked TREESAME, but that's consistent with the definition. I think we do have to rewrite the commit so it is TREESAME as per the definition. Not flag it as TREESAME in violation of it. It's possible you *could* get away with just flagging, because we never recompute the TREESAME flag in simple history mode. But it would be a cheat, and it may have other side effects. It means this node would remain a special rare case for others to trip up on later. And I don't think it simplifies the scan. Remembering "pointer-to-first-treesame-parent" (not a list) for the rewrite is no more complex than remembering "bool-there-was-a-treesame-parent". (A bool is what earlier code did - it worked for the original TREESAME definition. My patch series dropped that bool without replacement - missing this all-irrelevant case). In the simple history mode, the assumption is we're "simplifying away merges up-front" here; we won't (and can't) rewrite parents later in a way that needs to recompute TREESAME. In the initial scan when all parents are relevant and we matched one, the commit became TREESAME as per the new definition immediately because of the rewrite. This applies the equivalent rewrite when no relevant parents, consistent with the general concept, and without changing the TREESAME definition. Kevin
Re: Weird revision walk behaviour
On Tue, May 29, 2018 at 12:06:51AM +0200, SZEDER Gábor wrote: > diff --git a/revision.c b/revision.c > index 4e0e193e57..0ddd2c1e8a 100644 > --- a/revision.c > +++ b/revision.c > @@ -605,7 +605,7 @@ static inline int limiting_can_increase_treesame(const > struct rev_info *revs) > > static void try_to_simplify_commit(struct rev_info *revs, struct commit > *commit) > { > - struct commit_list **pp, *parent; > + struct commit_list **pp, *parent, *treesame_parents = NULL; > struct treesame_state *ts = NULL; > int relevant_change = 0, irrelevant_change = 0; > int relevant_parents, nth_parent; > @@ -672,6 +672,7 @@ static void try_to_simplify_commit(struct rev_info *revs, > struct commit *commit) > switch (rev_compare_tree(revs, p, commit)) { > case REV_TREE_SAME: > if (!revs->simplify_history || !relevant_commit(p)) { > + struct commit_list *tp; > /* Even if a merge with an uninteresting >* side branch brought the entire change >* we are interested in, we do not want > @@ -680,6 +681,13 @@ static void try_to_simplify_commit(struct rev_info > *revs, struct commit *commit) >*/ > if (ts) > ts->treesame[nth_parent] = 1; > + /* But we note it for potential later > + * simplification > + */ > + tp = treesame_parents; > + treesame_parents = > xmalloc(sizeof(*treesame_parents)); > + treesame_parents->item = p; > + treesame_parents->next = tp; > continue; > } We hit this "if" if !relevant_commit(p), which I think is what we want. But we'd also hit it if !revs->simplify_history. Would we want to avoid doing the simplification in that case? I guess later we do: > @@ -716,6 +724,14 @@ static void try_to_simplify_commit(struct rev_info > *revs, struct commit *commit) > die("bad tree compare for commit %s", > oid_to_hex(&commit->object.oid)); > } > > + if (relevant_parents == 0 && revs->simplify_history && > + treesame_parents) { > + commit->parents = treesame_parents; > + commit->object.flags |= TREESAME; > + return; > + } else > + free_commit_list(treesame_parents); > + ...which blocks the !simplify_history case from triggering. But then we could avoid the allocation above in that case, I think (though I agree with Kevin's later email that we may not need it at all). Do we even need to do the parent rewriting here? By definition those parents aren't interesting, and we're TREESAME to whatever is in treesame_parents. So conceptually it seems like we just need a flag "I found a treesame parent", but we only convert that into a TREESAME flag if there are no relevant parents. I wouldn't be surprised, though, if some code path really cares whether we've simplified to a single uninteresting parent here, versus simplifying to a root commit (I admit that the simplification code is one of the areas of Git I'm least familiar with). -Peff
Re: Weird revision walk behaviour
On 29/05/2018 01:06, SZEDER Gábor wrote: So, without investing nearly enough time to understand what is going on, I massaged the above diffs into this: Cool. + treesame_parents = xmalloc(sizeof(*treesame_parents)); There's no need to actually record a list here. This is just for the simple history. We are only interested into becoming a non-merge to 1 treesame parent, so I think we just need to record a pointer to the first one we see, just as this would exit immediately for the first relevant one. For the full-history case, we already have the full "which parents are treesame" recording mechanism just above, but it only kicks in for merge commits and only when settings require it. Adding a malloc here would be significant machinery overhead. FWIW, the test suite passes with the above patch applied. I doubt there's an existing case like this anywhere in the revision test suite :) . And this patch is focused enough that it *should* only be changing the behaviour of this very specific case. As such, it does feel a little like a kludge, but I think it's fine because it's aligning the simple-history analysis with the "analyse relevant parents if any, else analyse irrelevant" rule of the full-history. And here is the small PoC test case to illustrate the issue, which fails without but succeeds with the above patch. Eventually it should be part of 't6012-rev-list-simplify.sh', of course, but I haven't looked into that yet. It may be there's enough criss-crossy history to test here to merit breaking out into a second test series. +# B---M2 master +# / \ / +# A X +# \ / \ +# C---M1 b2 +# +# and modify 'file' in commits 'A' and 'B', so one of 'M1's parents +# ('B') is TREESAME wrt. 'file'. I guess we'll be wanting test cases for A..B2, B..B2 and C..B2, and some where the the base is "some other child of B or C". "B..B2" is no longer a pure set subtraction for simplification as B is UNINTERESTING (ie not in the set) but RELEVANT (because you named it as a bottom commit), so B..B2 actually still leaves M1 with 2 relevant parents. You'd want test cases covering B relevant+C irrelevant and B irrelevant+C relevant, which means subtracting them without naming them - so name a child of one. And then we need to think about whether we want it displayed in each of the other modes for each of those queries... Kevin
Re: Weird revision walk behaviour
> On 24/05/2018 23:26, Kevin Bracey wrote: > > > >>> On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote: > >>> > $ git log --oneline master..ba95710a3b -- ci/ > ea44c0a594 Merge branch 'bw/protocol-v2' into > jt/partial-clone-proto-v2 > > > In this case, we're hitting a merge commit which is not on master, but > > it has two parents which both are. Indeed, I didn't notice this important detail amidst the complexity of the git history. Thanks, with this info I could come up with a small test case to demonstrate the issue, see below. > Which, IIRC, means the merge commit > > is INTERESTING with two UNINTERESTING parents; and we are TREESAME to > > only one of them. > > > > The commit changing the logic of TREESAME you identified believes that > > those TREESAME changes for merges which were intended to improve > > fuller history modes shouldn't affect the simple history "because > > partially TREESAME merges are turned into normal commits". Clearly > > that didn't happen here. > > > Haven't currently got a development environment set up here, but I've > been looking at the code.Here's a proposal, untested, as a potential > starting point if anyone wants to consider a proper patch. > > The simplify_history first-scan logic never actually turned merges into > simple commits unless they were TREESAME to a relevant/interesting > parent. Anything where the TREESAME parent was UNINTERESTING was > retained as a merge, but had its TREESAME flag set, and that permitted > later simplification. > > With the redefinition of the TREESAME flag, this merge commit is no > longer TREESAME, and as the decoration logic to refine TREESAME isn't > active for simplify_history, it doesn't get cleaned up (even if it would > be in full history?) > > I think the answer may be to add an extra post-process step on the > initial loop to handle this special case. Something like: > > case REV_TREE_SAME: > if (!revs->simplify_history || !relevant_commit(p)) { > /* Even if a merge with an uninteresting > * side branch brought the entire change > * we are interested in, we do not want > * to lose the other branches of this > * merge, so we just keep going. > */ > if (ts) > ts->treesame[nth_parent] = 1; > + /* But we note it for potential later simplification */ > + if (!treesame_parent) > + treesame_parent = p; > continue; > } > > ... > > After loop: > > + if (relevant_parents == 0 && revs->simplify_history && > treesame_parent) { > + treesame_parent->next = NULL;// Repeats code from loop - > share somehow? > + commit->parents = treesame_parent; > + commit->object.flags |= TREESAME; > + return; > + } > > /* > * TREESAME is straightforward for single-parent commits. For merge So, without investing nearly enough time to understand what is going on, I massaged the above diffs into this: --- >8 --- diff --git a/revision.c b/revision.c index 4e0e193e57..0ddd2c1e8a 100644 --- a/revision.c +++ b/revision.c @@ -605,7 +605,7 @@ static inline int limiting_can_increase_treesame(const struct rev_info *revs) static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) { - struct commit_list **pp, *parent; + struct commit_list **pp, *parent, *treesame_parents = NULL; struct treesame_state *ts = NULL; int relevant_change = 0, irrelevant_change = 0; int relevant_parents, nth_parent; @@ -672,6 +672,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) switch (rev_compare_tree(revs, p, commit)) { case REV_TREE_SAME: if (!revs->simplify_history || !relevant_commit(p)) { + struct commit_list *tp; /* Even if a merge with an uninteresting * side branch brought the entire change * we are interested in, we do not want @@ -680,6 +681,13 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) */ if (ts) ts->treesame[nth_parent] = 1; + /* But we note it for potential later +* simplification +*/ + tp = treesame_parents; + treesame_parents = xmalloc(sizeof(*treesame_parents)); + treesame_parents->item = p; + treesame_parents->next = tp; continue;
Re: Weird revision walk behaviour
On 24/05/2018 23:26, Kevin Bracey wrote: On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote: $ git log --oneline master..ba95710a3b -- ci/ ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 In this case, we're hitting a merge commit which is not on master, but it has two parents which both are. Which, IIRC, means the merge commit is INTERESTING with two UNINTERESTING parents; and we are TREESAME to only one of them. The commit changing the logic of TREESAME you identified believes that those TREESAME changes for merges which were intended to improve fuller history modes shouldn't affect the simple history "because partially TREESAME merges are turned into normal commits". Clearly that didn't happen here. Haven't currently got a development environment set up here, but I've been looking at the code.Here's a proposal, untested, as a potential starting point if anyone wants to consider a proper patch. The simplify_history first-scan logic never actually turned merges into simple commits unless they were TREESAME to a relevant/interesting parent. Anything where the TREESAME parent was UNINTERESTING was retained as a merge, but had its TREESAME flag set, and that permitted later simplification. With the redefinition of the TREESAME flag, this merge commit is no longer TREESAME, and as the decoration logic to refine TREESAME isn't active for simplify_history, it doesn't get cleaned up (even if it would be in full history?) I think the answer may be to add an extra post-process step on the initial loop to handle this special case. Something like: case REV_TREE_SAME: if (!revs->simplify_history || !relevant_commit(p)) { /* Even if a merge with an uninteresting * side branch brought the entire change * we are interested in, we do not want * to lose the other branches of this * merge, so we just keep going. */ if (ts) ts->treesame[nth_parent] = 1; + /* But we note it for potential later simplification */ + if (!treesame_parent) + treesame_parent = p; continue; } ... After loop: + if (relevant_parents == 0 && revs->simplify_history && treesame_parent) { + treesame_parent->next = NULL;// Repeats code from loop - share somehow? + commit->parents = treesame_parent; + commit->object.flags |= TREESAME; + return; + } /* * TREESAME is straightforward for single-parent commits. For merge The other option would be to take off the " || !relevant_commit(p)" test, but I'm assuming that is still needed for other cases. Kevin
Re: Weird revision walk behaviour
On 23/05/2018 20:35, Jeff King wrote: On Wed, May 23, 2018 at 01:32:46PM -0400, Jeff King wrote: On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote: $ git log --oneline master..ba95710a3b -- ci/ ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 I keep some older builds around, and it does not reproduce with v1.6.6.3 (that's my usual goto for "old"). Bisecting turns up d0af663e42 (revision.c: Make --full-history consider more merges, 2013-05-16). It looks like an unintended change (the commit message claims that the non-full-history case shouldn't be affected). There's more discussion in the thread at: https://public-inbox.org/git/1366658602-12254-1-git-send-email-ke...@bracey.fi/ I haven't absorbed it all yet, but I'm adding Junio to the cc. In this case, we're hitting a merge commit which is not on master, but it has two parents which both are. Which, IIRC, means the merge commit is INTERESTING with two UNINTERESTING parents; and we are TREESAME to only one of them. The commit changing the logic of TREESAME you identified believes that those TREESAME changes for merges which were intended to improve fuller history modes shouldn't affect the simple history "because partially TREESAME merges are turned into normal commits". Clearly that didn't happen here. I think we need to look at why that isn't happening, and if it can be made to happen. The problem is that this commit is effectively the base of the graph - it's got a double-connection to the UNINTERESTING set, and maybe that prevented the simple history "follow 1 TREESAME" logic from kicking in. Maybe it won't follow 1 TREESAME to UNINTERESTING. I know there were quite a few changes later in the series to try to reconcile the simple and full history, for the cases where the simple history takes a weird path because of its love of TREESAME parents, hiding evil merges. But I believe the simple history behaviour was supposed to remain as-is - take first TREESAME always. Kevin
Re: Weird revision walk behaviour
On 23/05/2018 20:35, Jeff King wrote: There's more discussion in the thread at: https://public-inbox.org/git/1366658602-12254-1-git-send-email-ke...@bracey.fi/ I haven't absorbed it all yet, but I'm adding Junio to the cc. Just to ack that I've seen the discussion, but I can't identify the code's reasoning at the moment. My recollection is that I accepted while coming up with the algorithm that it might err slightly on the side of false positives in the display - there were some merge cases I was unable to fully distinguish whether or not the merge had lost a change it shouldn't have done, and if I was uncertain I'd rather show it than not. The first commit was not originally intended to alter behaviour for anything other than --full-history, but later in the chain there was specific consideration into tracking the path to the specified "bottom" commit. It may be that's part of what's happening here. Kevin
Re: Weird revision walk behaviour
On Wed, May 23, 2018 at 01:32:46PM -0400, Jeff King wrote: > On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote: > > > $ git log --oneline master..ba95710a3b -- ci/ > > ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 > > > > But as far as I can tell, there are no changes in the 'ci/' directory > > on any of the merge's parents: > > > > $ git log --oneline master..ea44c0a594^1 -- ci/ > > # Nothing. > > $ git log --oneline master..ea44c0a594^2 -- ci/ > > # Nothing! > > Hmm. That commit does touch "ci/" with respect to one of its parents. > It should get simplified away because it completely matches the other > parent, so it does sound like a bug. > > > This is not specific to the 'ci/' directory, it seems that any > > untouched directory does the trick: > > > > $ git log --oneline master..ea44c0a594 -- contrib/coccinelle/ t/lib-httpd/ > > ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 > > Both of those directories also differ between one parent. If you try it > with "contrib/remote-helpers", which does not, then the commit does not > appear. > > So it does seem like a bug where we should be simplifying away the merge > but are not (or I'm missing the corner case, too ;) ). > > > I get the same behavior with Git built from current master and from > > past releases as well (tried it as far back as v2.0.0). > > I keep some older builds around, and it does not reproduce with v1.6.6.3 > (that's my usual goto for "old"). Bisecting turns up d0af663e42 > (revision.c: Make --full-history consider more merges, 2013-05-16). It > looks like an unintended change (the commit message claims that the > non-full-history case shouldn't be affected). There's more discussion in the thread at: https://public-inbox.org/git/1366658602-12254-1-git-send-email-ke...@bracey.fi/ I haven't absorbed it all yet, but I'm adding Junio to the cc. -Peff
Re: Weird revision walk behaviour
On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote: > $ git log --oneline master..ba95710a3b -- ci/ > ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 > > But as far as I can tell, there are no changes in the 'ci/' directory > on any of the merge's parents: > > $ git log --oneline master..ea44c0a594^1 -- ci/ > # Nothing. > $ git log --oneline master..ea44c0a594^2 -- ci/ > # Nothing! Hmm. That commit does touch "ci/" with respect to one of its parents. It should get simplified away because it completely matches the other parent, so it does sound like a bug. > This is not specific to the 'ci/' directory, it seems that any > untouched directory does the trick: > > $ git log --oneline master..ea44c0a594 -- contrib/coccinelle/ t/lib-httpd/ > ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2 Both of those directories also differ between one parent. If you try it with "contrib/remote-helpers", which does not, then the commit does not appear. So it does seem like a bug where we should be simplifying away the merge but are not (or I'm missing the corner case, too ;) ). > I get the same behavior with Git built from current master and from > past releases as well (tried it as far back as v2.0.0). I keep some older builds around, and it does not reproduce with v1.6.6.3 (that's my usual goto for "old"). Bisecting turns up d0af663e42 (revision.c: Make --full-history consider more merges, 2013-05-16). It looks like an unintended change (the commit message claims that the non-full-history case shouldn't be affected). -Peff