Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Matthijs Kooijman
Hi folks,

while trying to parse git diff-tree output, I found out that in some
cases it appears to generate an incorrect diff (AFAICT). I orginally
found this in a 5-way merge commit in the Linux kernel, but managed to
reduce this to something a lot more managable (an ordinary 2-way merge
on a 6-line file).

To start with the wrong-ness, this is the diff generated:

$ git diff-tree -p -c HEAD
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --combined file
index 3404f54,0eab113..e8c8c18
--- a/file
+++ b/file
@@@ -1,7 -1,5 +1,6 @@@
 +LEFT
  BASE2
  BASE3
  BASE4
- BASE5
+ BASE5MODIFIED
  BASE6

Here, the header claims that the first head has 7 lines, but there really are
only 6 (5 lines of context and one delete line). The numbers for the others
heads are incorrect. In the original diff, the difference was bigger
(first head was stated to have 28 lines, while the output was similar to
the above).

To find out what's going on, we can look at the -m output, which is
correct (or look at the original file contents at the end of this mail).

$ git diff-tree -m -p HEAD
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --git a/file b/file
index 3404f54..e8c8c18 100644
--- a/file
+++ b/file
@@ -1,7 +1,6 @@
 LEFT
-BASE1
 BASE2
 BASE3
 BASE4
-BASE5
+BASE5MODIFIED
 BASE6
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --git a/file b/file
index 0eab113..e8c8c18 100644
--- a/file
+++ b/file
@@ -1,3 +1,4 @@
+LEFT
 BASE2
 BASE3
 BASE4

As you can see here, first head added LEFT, and the second head removed
BASE1 and modified BASE5. In the -c diff-tree output above, this removal of
BASE1 is not shown, but it is counted in the number of lines, causing this
breakage.


Note that to trigger this behaviour, the number of context lines between the
BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from
occuring. Also, the LEFT line introduced does not seem to be
essential, but there needed to be some change from both sides in order
to generate a diff at all.

I haven't looked into the code, though I might give that a go later.
Anyone got any clue why this is happening? Is this really a bug, or am I
misunderstanding here?

To recreate the above situation, you can use the following commands:

git init
cat  file EOF
BASE1
BASE2
BASE3
BASE4
BASE5
BASE6
EOF
git add file
git commit -m BASE
git checkout -b RIGHT
cat  file EOF
BASE2
BASE3
BASE4
BASE5MODIFIED
BASE6
EOF
git commit -m RIGHT file
git checkout -b LEFT master
cat  file EOF
LEFT
BASE1
BASE2
BASE3
BASE4
BASE5
BASE6
EOF
git commit -m LEFT file
git merge RIGHT
cat  file EOF
LEFT
BASE2
BASE3
BASE4
BASE5MODIFIED
BASE6
EOF
git add file
git commit --no-edit
git diff-tree -p -c HEAD


Gr.

Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Matthijs Kooijman
Hi folks,

 $ git diff-tree -p -c HEAD
 d945a51b6ca22e6e8e550c53980d026f11b05158
 diff --combined file
 index 3404f54,0eab113..e8c8c18
 --- a/file
 +++ b/file
 @@@ -1,7 -1,5 +1,6 @@@
  +LEFT
   BASE2
   BASE3
   BASE4
 - BASE5
 + BASE5MODIFIED
   BASE6

I found the spot in the code where this is going wrong, there is an
incorrectly set no_pre_delete flag for the context lines before each
hunk. Since a patch says more than a thousand words, here's what I think
will fix this problem:

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..d36bfcf 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -518,8 +518,11 @@ static int give_context(struct sline *sline, unsigned long 
cnt, int num_parent)
unsigned long k;
 
/* Paint a few lines before the first interesting line. */
-   while (j  i)
-   sline[j++].flag |= mark | no_pre_delete;
+   while (j  i) {
+   if (!(sline[j++].flag  mark))
+   sline[j++].flag |= no_pre_delete;
+   sline[j++].flag |= mark;
+   }
 
again:
/* we know up to i is to be included.  where does the

I'll see if I can write up a testcase and then submit this as a proper
patch, but I wanted to at least send this over now lest someone wastes
time coming to the same conclusion as I did.

Gr.

Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Junio C Hamano
Matthijs Kooijman matth...@stdin.nl writes:

 $ git diff-tree -p -c HEAD
 d945a51b6ca22e6e8e550c53980d026f11b05158
 diff --combined file
 index 3404f54,0eab113..e8c8c18
 --- a/file
 +++ b/file
 @@@ -1,7 -1,5 +1,6 @@@
  +LEFT
   BASE2
   BASE3
   BASE4
 - BASE5
 + BASE5MODIFIED
   BASE6

 Here, the header claims that the first head has 7 lines, but there really are
 only 6 (5 lines of context and one delete line). The numbers for the others
 heads are incorrect. In the original diff, the difference was bigger
 (first head was stated to have 28 lines, while the output was similar to
 the above).

The count and the output does look inconsistent.  The hunk header
claims that it is showing:

 - range 1,7 for the first parent but it should be 1,5 (2, 3, 4, 5 and 6) 
   to match the output.
 - range 1,5 for the second parent (left, 2, 3, 4, 5mod, and 6 -- correct)
 - range 1,6 for the result (left, 2, 3, 4, 5mod and 6 -- correct)

If we resurrect the loss of BASE1 from the output, then the
output should have shown:

  +LEFT
 - BASE1
   BASE2
   BASE3
   BASE4
 - BASE5
 + BASE5MODIFIED
   BASE6

which means the numbers shown for the first parent (1, 2, 3, 4, 5
and 6) should be 1,6.

 Note that to trigger this behaviour, the number of context lines between the
 BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from
 occuring.

I think the coalescing of two adjacent hunks into one is painting
leading lines interesting to show context but not worth showing
deletion before it incorrectly.

Does this patch fix the issue?

 combine-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..7359b84 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -533,7 +533,7 @@ static int give_context(struct sline *sline, unsigned long 
cnt, int num_parent)
k = find_next(sline, mark, j, cnt, 0);
j = adjust_hunk_tail(sline, all_mask, i, j);
 
-   if (k  j + context) {
+   if (k = j + context) {
/* k is interesting and [j,k) are not, but
 * paint them interesting because the gap is small.
 */
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Matthijs Kooijman
Hi Junio,

 I think the coalescing of two adjacent hunks into one is painting
 leading lines interesting to show context but not worth showing
 deletion before it incorrectly.
Yup, that seems to be the case.

 Does this patch fix the issue?

Yes, it fixes the issue. However, I think that this patch actually hides
the real problem (in a way that will always work with the current code,
though).

I had come up with a different fix myself (similar to the one I sent to
the list as a followup, but that one still had a bug), which I think
might be better. In any case, it includes a testcase for this bug which
seems good to include.

I'll send my patch as a followup in a minute, feel free to use it
entirely or only partially.

Gr.

Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Junio C Hamano
Matthijs Kooijman matth...@stdin.nl writes:

 Hi Junio,

 I think the coalescing of two adjacent hunks into one is painting
 leading lines interesting to show context but not worth showing
 deletion before it incorrectly.
 Yup, that seems to be the case.

 Does this patch fix the issue?

 Yes, it fixes the issue. However, I think that this patch actually hides
 the real problem (in a way that will always work with the current code,
 though).

Could you explain why you think it hides the real problem, and what
kind of future enhancement may break it?

This is *not* my usual rhetorical question Please explain yourself,
because I think you are wrong, but is I do not understand the
reasoning behind your statement, and I (and the reasoning behind my
patch) must be missing something important, so please enlighten me
by pointing out where I am wrong, so that I won't stick to my flawed
patch.

The painting with no_pre_delete is applied when we extend the common
context back to lines we _know_ otherwise not worth showing (because
there is no difference) only because we want to show them as the
context lines and we do not need to show deletions that come before
these common context.  By forcing (k == j + context) case, that is,
there are exactly context number of lines between the end of the
current hunk and the next hunk, which the old code would have showed
context lines at the beginning of the next hunk, to go back to the
again label, we are coalescing the two hunks that _should_ have
been shown together anyway, without painting the context lines
incorrectly with before this line, do not show deletion mark.

 I had come up with a different fix myself (similar to the one I sent to
 the list as a followup, but that one still had a bug), which I think
 might be better. In any case, it includes a testcase for this bug which
 seems good to include.

 I'll send my patch as a followup in a minute, feel free to use it
 entirely or only partially.

 Gr.

 Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Matthijs Kooijman
Hi Junio,

 Could you explain why you think it hides the real problem, and what
 kind of future enhancement may break it?
I think the differences is mostly in the locality of the fix. In my
proposed patch, the no_pre_delete flag is never set on an interesting
line because it is checked in the line before it. In your patch, it
never happens because the control flow guarantees the context lines
before each change must be uninteresting.

The net effect is of course identical, but I'm arguing that depending on
the control flow and some code a doze lines down is easier to break than
depending on a previous line.

Having said that: I'm not sure if the difference is significant enough
to convince me in either direction.



However, thinking about this a bit more (and getting sidetracked on a
completely separate issue/question), I wonder why the coalescing-hunks
code is there in the first place? e.g., why not leave out these lines?

if (k  j + context) {
/* k is interesting and [j,k) are not, but
 * paint them interesting because the gap is small.
 */
while (j  k)
sline[j++].flag |= mark;
i = k;
goto again;
}

If the context lines before and after each group of changes are
painted interesting, then these lines in between will also be painted
interesting. Of course, this could cause some lines to be painted as
interesting twice and it needs my fix for the no_pre_delete thing, but
it would work just as well?

However, I can imagine that this code is present to prevent painting
lines twice, which would of course be a bit of a performance loss. But
if this really was the motivation, why is the first if not something
like:

if (k = j + 2 * context) {

Since IIUC, the current code can still paint a few context lines twice
when they are exacly context lines apart, once by the paint before
and one by the paint after code (which is also what happens in my bug
example, I think). The above should fix that as well (the first part
of the test suite hasn't complained so far).

Gr.

Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lines missing from git diff-tree -p -c output?

2013-05-15 Thread Junio C Hamano
Matthijs Kooijman matth...@stdin.nl writes:

 Could you explain why you think it hides the real problem, and what
 kind of future enhancement may break it?
 I think the differences is mostly in the locality of the fix. In my
 proposed patch, the no_pre_delete flag is never set on an interesting
 line because it is checked in the line before it. In your patch, it
 never happens because the control flow guarantees the context lines
 before each change must be uninteresting.

 The net effect is of course identical, but I'm arguing that depending on
 the control flow and some code a doze lines down is easier to break than
 depending on a previous line.

Yeah, that sounds like a reasonable reasoning.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html