Re: [PATCH] fix segfault with git log -c --follow

2013-05-28 Thread Junio C Hamano
Clemens Buchacher  writes:

>> I wonder, just like we force recursive and disable external on the
>> copy before we use it to call diff_tree_sha1(), if we should disable
>> follow-renames on it.  "--follow" is an option that is given to the
>> history traversal part and it should not play any role in getting the
>> pairwise diff with all parents diff_tree_combined() does.
>
> Can't parse that last sentence.
>
> In any case, I don't think disabling follow-renames for diff_tree_sha1
> is a solution. The bug is in diff_tree_sha1 and its subfunctions,
> because they manipulate a data structure in a way that corrupts it.
> And they do so in an obfuscated and clearly unintentional manner. So
> we should not blame the caller for calling diff_tree_sha1 in a way
> that causes corruption.
>
>> Besides,
>> 
>>  - "--follow" hack lets us keep track of only one path; and
>
> Ok. Good to know it is considered a hack. The code is quite strange
> indeed.

The problem with --follow is that it only tracks one path globally.
In a history like this, suppose that a path X was renamed to Y a
long time ago, at commit B:

---o---A---B---C---o HEAD

and you start digging with "log --follow -c HEAD -- Y".  When
looking at C, because it and its parent B both have path Y, the
try-to-follow hack does not kick in, and when showing C we show the
change to Y (because that is the pathspec).

Then we look at B.  Because B has the path we are following, i.e. Y,
and its parent A does not, the try-to-follow hack kicks in: it swaps
the pathspec that is used globally for history traversal to X while
showing the difference between A's X and B's Y.  Then we dig further
to find A; by this point the global pathspec has been swapped and is
now X.
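A toy model of the single-strand case may make the global swap concrete. This is a sketch under simplifying assumptions, not git's code: struct commit_sketch and walk_linear() are hypothetical names, each commit is reduced to the one name the followed file has in its tree, and "rename detection" is just reading the parent's name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model (not git's struct commit): each commit
 * records only the path the followed file has in its own tree. */
struct commit_sketch {
    const char *name;                   /* label, e.g. "C" */
    const char *path;                   /* name of the file in this commit */
    const struct commit_sketch *parent; /* single parent; NULL at the root */
};

/* Walk a linear history from tip with ONE global pathspec. used[i]
 * receives the pathspec in effect when the i-th commit is reached.
 * If the commit has the followed path and its parent does not, the
 * global pathspec is swapped to the parent's path (a stand-in for
 * rename detection). Returns the number of commits visited. */
static size_t walk_linear(const struct commit_sketch *tip,
                          const char *start_path,
                          const char **used, size_t max)
{
    const char *pathspec = start_path;  /* the one global pathspec */
    size_t n = 0;

    for (const struct commit_sketch *c = tip; c && n < max; c = c->parent) {
        used[n++] = pathspec;
        if (c->parent && strcmp(c->path, pathspec) == 0 &&
            strcmp(c->parent->path, pathspec) != 0)
            pathspec = c->parent->path; /* "try to follow" kicks in */
    }
    return n;
}
```

With the history o(X)---A(X)---B(Y)---C(Y)---HEAD(Y) and a start path of Y, the walk reports Y for HEAD, C and B, and X for A and the root, which is the hand-over described above.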

That makes --follow a working hack for the simplest single strand of
pearls.  But if you have a mergy history, e.g.

---o---A---B---C-----------o HEAD
        \                 /
         D---E---F---G---H

it can break in interesting ways.  We are likely to have looked at H
before looking at B and used pathspec Y while inspecting H, but
after looking at B, the global pathspec is swapped to X, and then we
try to look at G, F, E and D, none of which may have renamed the
original X, so you would likely miss the change to the path Y you
wanted to follow.
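The same toy model, extended to the mergy history, shows how the single global pathspec leaks across branches. Again this is a hypothetical sketch: the visit order HEAD, C, H, B, G, F, E, D, A is assumed (it stands in for the date-ordered queue), only first parents are modelled, and every commit on the D..H strand is assumed to contain Y.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a commit knows the path of the followed file in
 * its own tree and, for this sketch, only its first parent. */
struct commit_sketch {
    const char *name;
    const char *path;
    const struct commit_sketch *parent;
};

/* Visit commits in the given order with ONE global pathspec, swapping it
 * exactly as in the linear case. Returns how many commits that contain
 * wanted_path were inspected while the global pathspec named something
 * else, i.e. commits whose changes to wanted_path would not be shown. */
static size_t count_missed(const struct commit_sketch *const *order,
                           size_t n, const char *start_path,
                           const char *wanted_path)
{
    const char *pathspec = start_path;
    size_t missed = 0;

    for (size_t i = 0; i < n; i++) {
        const struct commit_sketch *c = order[i];
        if (strcmp(c->path, wanted_path) == 0 &&
            strcmp(pathspec, wanted_path) != 0)
            missed++;                   /* filtered with the wrong pathspec */
        if (c->parent && strcmp(c->path, pathspec) == 0 &&
            strcmp(c->parent->path, pathspec) != 0)
            pathspec = c->parent->path; /* global swap affects every branch */
    }
    return missed;
}
```

For that order the function returns 4: G, F, E and D are all inspected while the pathspec says X, so changes to Y on that strand are not shown.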

To fix this, we would need to keep "what path are we following" not
in the global revs->pathspec, but per traversal path that is
currently active.  For example, when we look at C and H, it is Y;
when we look at B, it is X; when we look at G, it is inherited from
H and still Y, not affected by the rename at B.  And then when we
look at A (we need topo-order traversal to do this), it needs to
notice that one child (i.e. B) has been following X while the other
(i.e. D) has been following Y, and merge the "I've been following
this path" information in a sensible way (e.g. look at its own tree
and see what is available, in this case X).
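A rough sketch of the per-path idea, assuming a topologically ordered array (children before parents) and a hypothetical renamed_from field that stands in for rename detection. Nothing here is an existing git API; the reconciliation rule follows the suggestion above (keep whichever candidate exists in the commit's own tree), and when that does not decide, the first proposal to arrive wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARENTS 2
#define MAX_TREE 2

/* Hypothetical toy model; none of these names exist in git. */
struct commit_sketch {
    const char *name;
    const char *tree[MAX_TREE];  /* paths present in this commit, NULL-padded */
    const char *renamed_from;    /* old name if this commit renamed the
                                    followed file, else NULL */
    int parents[MAX_PARENTS];    /* indices into the array, -1 = none */
    const char *following;       /* per-commit state, filled in by the walk */
};

static int tree_has(const struct commit_sketch *c, const char *path)
{
    for (int i = 0; i < MAX_TREE && c->tree[i]; i++)
        if (strcmp(c->tree[i], path) == 0)
            return 1;
    return 0;
}

/* commits[] must be topologically ordered (children before parents) with
 * the tip at index 0; parent indices are assumed to be valid. */
static void follow_per_path(struct commit_sketch *commits, size_t n,
                            const char *start_path)
{
    for (size_t i = 0; i < n; i++)
        commits[i].following = NULL;
    commits[0].following = start_path;

    for (size_t i = 0; i < n; i++) {
        struct commit_sketch *c = &commits[i];
        if (!c->following)
            continue;            /* not reachable from the tip */
        /* What this line of history asks its parents to follow. */
        const char *proposal = c->renamed_from ? c->renamed_from : c->following;
        for (int k = 0; k < MAX_PARENTS; k++) {
            if (c->parents[k] < 0)
                continue;
            struct commit_sketch *p = &commits[c->parents[k]];
            if (!p->following)
                p->following = proposal;      /* first child to arrive */
            else if (strcmp(p->following, proposal) != 0 &&
                     !tree_has(p, p->following) && tree_has(p, proposal))
                p->following = proposal;      /* reconcile via own tree */
        }
    }
}
```

On the example history with start path Y, G, F, E and D keep following Y while A ends up following X, whichever of B and D reaches it first.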
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix segfault with git log -c --follow

2013-05-28 Thread Clemens Buchacher
On Tue, May 28, 2013 at 10:22:17AM -0700, Junio C Hamano wrote:
> Clemens Buchacher  writes:
> 
> > In diff_tree_combined we make a copy of diffopts. In
> > try_to_follow_renames, called via diff_tree_sha1, we free and
> > re-initialize diffopts->pathspec->items. Since we did not make a deep
> > copy of diffopts in diff_tree_combined, the original diffopts does not
> > get the update. By the time we return from diff_tree_combined,
> > rev->diffopt->pathspec->items points to an invalid memory address. We
> > get a segfault next time we try to access that pathspec.
> 
> I am not quite sure if I follow.  Do you mean
> 
>   diff_tree_combined()
> - makes a shallow copy of rev->diffopt
> - calls diff_tree_sha1()
>   diff_tree_sha1()
>   - tries to follow rename and clobbers diffopt

Right.

> - tries to use the shallow copy of original rev->diffopt
>   that no longer is valid, which is a problem

diff_tree_combined does not try to use it right away. It returns, but
rev->diffopt is now invalid, and the next time we do any kind of diff
with it, we have a problem.
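A minimal C sketch of the hazard described above, assuming simplified stand-in structs: pathspec_sketch, diffopt_sketch and the helper functions are hypothetical, not git's. A plain struct assignment copies the items pointer, so a callee that frees and re-allocates items through the copy leaves the original pointing at freed memory; giving the copy its own items avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; git's real structs have many more fields. */
struct pathspec_sketch {
    int nr;
    char **items;
};

struct diffopt_sketch {
    struct pathspec_sketch pathspec;
    int follow_renames;
};

static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (!p)
        abort();
    return memcpy(p, s, len);
}

static void clear_pathspec(struct pathspec_sketch *ps)
{
    for (int i = 0; i < ps->nr; i++)
        free(ps->items[i]);
    free(ps->items);
    ps->items = NULL;
    ps->nr = 0;
}

/* Mimics the "free and re-initialize" step attributed to
 * try_to_follow_renames(): the old items array is released. */
static void reset_pathspec(struct pathspec_sketch *ps, const char *path)
{
    clear_pathspec(ps);
    ps->items = malloc(sizeof(*ps->items));
    if (!ps->items)
        abort();
    ps->items[0] = dup_str(path);
    ps->nr = 1;
}

/* The bug in miniature (do NOT do this):
 *     struct diffopt_sketch copy = orig;        shares orig.pathspec.items
 *     reset_pathspec(&copy.pathspec, "X");      frees the shared array
 *     orig.pathspec.items[0] ...                use after free
 * A deep copy gives the callee its own items to clobber. */
static void copy_diffopt_deep(struct diffopt_sketch *dst,
                              const struct diffopt_sketch *src)
{
    *dst = *src;                 /* scalar fields */
    dst->pathspec.items = NULL;
    dst->pathspec.nr = 0;
    if (src->pathspec.nr > 0) {
        dst->pathspec.items = malloc(src->pathspec.nr * sizeof(char *));
        if (!dst->pathspec.items)
            abort();
        for (int i = 0; i < src->pathspec.nr; i++)
            dst->pathspec.items[i] = dup_str(src->pathspec.items[i]);
        dst->pathspec.nr = src->pathspec.nr;
    }
}
```

Deep-copying is only one way out; the reply that follows argues instead that the callee should not corrupt the structure in the first place.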

> I wonder, just like we force recursive and disable external on the
> copy before we use it to call diff_tree_sha1(), if we should disable
> follow-renames on it.  "--follow" is an option that is given to the
> history traversal part and it should not play any role in getting the
> pairwise diff with all parents diff_tree_combined() does.

Can't parse that last sentence.

In any case, I don't think disabling follow-renames for diff_tree_sha1
is a solution. The bug is in diff_tree_sha1 and its subfunctions,
because they manipulate a data structure in a way that corrupts it.
And they do so in an obfuscated and clearly unintentional manner. So
we should not blame the caller for calling diff_tree_sha1 in a way
that causes corruption.

> Besides,
> 
>  - "--follow" hack lets us keep track of only one path; and

Ok. Good to know it is considered a hack. The code is quite strange
indeed.

>  - "-c" and "--cc" make sense only when dealing with a merge commit
>    and the path in the child may have come from different path in
>    parents,

Sorry, I don't get it.

> so I am not sure if allowing combination of "--follow -c/--cc" makes
> much sense in the first place.

My use case came up with this history:

1. Code gets added to file A.
2. File A gets renamed to B in a different branch.
3. The branches get merged, and code from (1) is removed in the merge.

Later I wondered why the code from (1) was gone from B, even though I
felt certain it had been added before. I also remembered that B had
been renamed at some point. So I ran git log -p --follow B, and it
nicely showed the diff where the code was added, but no diff where
the code was removed.

The reason, of course, is that the code was removed in the merge, and
that diff is not shown. And -c is what I usually use to show diffs
for merge commits.

And if the pairwise diff can also deal with file renames, I think it
absolutely makes sense to show a three-way diff as well.

I can't tell how far the code is from supporting anything like that.

Cheers,
Clemens


Re: [PATCH] fix segfault with git log -c --follow

2013-05-28 Thread Junio C Hamano
Clemens Buchacher  writes:

> In diff_tree_combined we make a copy of diffopts. In
> try_to_follow_renames, called via diff_tree_sha1, we free and
> re-initialize diffopts->pathspec->items. Since we did not make a deep
> copy of diffopts in diff_tree_combined, the original diffopts does not
> get the update. By the time we return from diff_tree_combined,
> rev->diffopt->pathspec->items points to an invalid memory address. We
> get a segfault next time we try to access that pathspec.

I am not quite sure if I follow.  Do you mean

diff_tree_combined()
- makes a shallow copy of rev->diffopt
- calls diff_tree_sha1()
  diff_tree_sha1()
  - tries to follow rename and clobbers diffopt
- tries to use the shallow copy of original rev->diffopt
  that no longer is valid, which is a problem

I wonder, just like we force recursive and disable external on the
copy before we use it to call diff_tree_sha1(), if we should disable
follow-renames on it.  "--follow" is an option that is given to the
history traversal part and it should not play any role in getting
the pairwise diff with all parents diff_tree_combined() does.
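The suggestion amounts to one more adjustment on the copy. In this sketch the flag names SK_RECURSIVE, SK_ALLOW_EXTERNAL and SK_FOLLOW_RENAMES and the function prepare_pairwise_opts() are hypothetical stand-ins for git's diff option flags; the point is only that clearing the follow bit on the copy keeps the pairwise diffs away from the rename-following path that, per the patch description, rewrites the shared pathspec, while the caller's options stay untouched.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for git's diff option flags. */
enum {
    SK_RECURSIVE      = 1u << 0,
    SK_ALLOW_EXTERNAL = 1u << 1,
    SK_FOLLOW_RENAMES = 1u << 2
};

struct diffopt_flags_sketch {
    unsigned flags;
};

/* Build the options used for the pairwise diffs against each parent.
 * Only the returned copy is adjusted; *opt is left as the traversal
 * configured it. */
static struct diffopt_flags_sketch
prepare_pairwise_opts(const struct diffopt_flags_sketch *opt)
{
    struct diffopt_flags_sketch copy = *opt;
    copy.flags |= SK_RECURSIVE;                   /* always recurse */
    copy.flags &= ~(unsigned)SK_ALLOW_EXTERNAL;   /* no external diff */
    copy.flags &= ~(unsigned)SK_FOLLOW_RENAMES;   /* --follow is for traversal */
    return copy;
}
```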

Besides,

 - "--follow" hack lets us keep track of only one path; and

 - "-c" and "--cc" make sense only when dealing with a merge commit
   and the path in the child may have come from different path in
   parents,

so I am not sure if allowing combination of "--follow -c/--cc" makes
much sense in the first place.