Re: [PATCH] fix segfault with git log -c --follow
Clemens Buchacher writes:

>> I wonder, just like we force recursive and disable external on the
>> copy before we use it to call diff_tree_sha1(), if we should disable
>> follow-renames on it.  "--follow" is an option that is given to the
>> history traversal part and it should not play any role in getting the
>> pairwise diff with all parents diff_tree_combined() does.
>
> Can't parse that last sentence.
>
> In any case, I don't think disabling diff_tree_sha1 is a solution. The
> bug is in diff_tree_sha1 and its subfunctions, because they manipulate
> a data structure such that it becomes corrupt. And they do so in an
> obfuscated and clearly unintentional manner. So we should not blame the
> user for calling diff_tree_sha1 in such a way that it causes corruption.
>
>> Besides,
>>
>>  - "--follow" hack lets us keep track of only one path; and
>
> Ok. Good to know it is considered a hack. The code is quite strange
> indeed.

The problem with --follow is that it only tracks one path globally.

In a history like this, suppose that a path X was renamed to Y long
ago at commit B:

    ---o---A---B---C---o HEAD

and you start digging with "log --follow -c HEAD -- Y".

When looking at C, because it and its parent B both have path Y, the
try-to-follow hack does not kick in, and when trying to show C, we
will show the change in Y (because that is the pathspec).

Then we look at B.  Because B has the path we are following, i.e. Y,
and its parent A does not, the try-to-follow hack kicks in, and it
mangles the pathspec that is used globally for history traversal to X
while showing the difference between A's X and B's Y.  Then we dig
further to find A; at this point the global pathspec has been swapped
and is now X.

That makes --follow a working hack for the simplest single strand of
pearls.  But if you have a mergy history, e.g.

    ---o---A---B---C-----------o HEAD
            \                 /
             D---E---F---G---H

it can break in interesting ways.
We are likely to have looked at H before looking at B and used
pathspec Y while inspecting H, but after looking at B, the global
pathspec is swapped to X, and then we try to look at G, F, E and D,
none of which may have renamed the original X, so you would likely
miss the changes to the path Y you wanted to follow.

To fix this, we would need to keep "what path are we following" not
in the global revs->pathspec, but per traversal path that is currently
active (e.g. when we look at C and H, it is Y; when we look at B, it
is X; when we look at G, it is inherited from H and still Y, not
affected by the rename at B).  And then when we look at A (we need
topo-order traversal to do this), it needs to notice that one child
(i.e. B) has been following X while the other (i.e. D) has been
following Y, and merge the "I've been following this path" information
in a sensible way (e.g. look at its own tree and see what is available,
in this case X).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
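The difference between one global pathspec and a followed path kept
per commit can be sketched with a toy model.  Everything here is an
assumption made for illustration: the commit table, the one-letter
"trees", the helper names, and the premise that the side branch also
carries the path as Y from D onwards.  It is not how revision.c works.

```c
#include <assert.h>
#include <string.h>

#define NCOMMITS 10
#define NONE (-1)

/* Toy commit: "tree" lists the one-letter paths present in it. */
struct commit {
	const char *name;
	const char *tree;
	int parent[2];		/* indices into history[], NONE if absent */
};

/* Listed children-before-parents, in the order the walk visits them.
 * Assumption: X became Y at B and, independently, at D. */
static const struct commit history[NCOMMITS] = {
	{ "HEAD", "Y", { 1, 2 } },	/* 0: merge of C and H */
	{ "C", "Y", { 3, NONE } },	/* 1 */
	{ "H", "Y", { 4, NONE } },	/* 2 */
	{ "B", "Y", { 8, NONE } },	/* 3 */
	{ "G", "Y", { 5, NONE } },	/* 4 */
	{ "F", "Y", { 6, NONE } },	/* 5 */
	{ "E", "Y", { 7, NONE } },	/* 6 */
	{ "D", "Y", { 8, NONE } },	/* 7 */
	{ "A", "X", { 9, NONE } },	/* 8 */
	{ "o", "X", { NONE, NONE } },	/* 9 */
};

static int has_path(const struct commit *c, char path)
{
	return strchr(c->tree, path) != NULL;
}

/* Toy rename detection: if the parent lacks the path, assume it was
 * renamed from the first path the parent's tree holds. */
static char path_in_parent(const struct commit *parent, char path)
{
	return has_path(parent, path) ? path : parent->tree[0];
}

/* One pathspec shared by the whole walk; used[i] records which path
 * commit i was inspected with. */
static void walk_global(char start, char used[NCOMMITS])
{
	char pathspec = start;

	assert(start != '\0');
	for (int i = 0; i < NCOMMITS; i++) {
		const struct commit *c = &history[i];
		int p = c->parent[0];

		used[i] = pathspec;
		if (p != NONE && has_path(c, pathspec))
			pathspec = path_in_parent(&history[p], pathspec);
	}
}

/* The followed path is handed from each child to its parents. */
static void walk_per_commit(char start, char followed[NCOMMITS])
{
	assert(start != '\0');
	memset(followed, 0, NCOMMITS);
	followed[0] = start;
	for (int i = 0; i < NCOMMITS; i++) {
		if (!followed[i])
			continue;
		for (int k = 0; k < 2; k++) {
			int p = history[i].parent[k];
			char cand;

			if (p == NONE)
				continue;
			cand = path_in_parent(&history[p], followed[i]);
			/* Children may disagree: keep a path that the
			 * parent's own tree actually has. */
			if (!followed[p] || !has_path(&history[p], followed[p]))
				followed[p] = cand;
		}
	}
}
```

Calling walk_global('Y', used) and walk_per_commit('Y', followed) on
this table, the global walk inspects G with X, a path G does not have,
while the per-commit walk keeps Y for G and arrives at X for A.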
Re: [PATCH] fix segfault with git log -c --follow
On Tue, May 28, 2013 at 10:22:17AM -0700, Junio C Hamano wrote:
> Clemens Buchacher writes:
>
> > In diff_tree_combined we make a copy of diffopts. In
> > try_to_follow_renames, called via diff_tree_sha1, we free and
> > re-initialize diffopts->pathspec->items. Since we did not make a deep
> > copy of diffopts in diff_tree_combined, the original diffopts does not
> > get the update. By the time we return from diff_tree_combined,
> > rev->diffopt->pathspec->items points to an invalid memory address. We
> > get a segfault next time we try to access that pathspec.
>
> I am not quite sure if I follow.  Do you mean
>
>         diff_tree_combined()
>         - makes a shallow copy of rev->diffopt
>         - calls diff_tree_sha1()
>           diff_tree_sha1()
>           - tries to follow rename and clobbers diffopt

Right.

>         - tries to use the shallow copy of original rev->diffopt
>           that no longer is valid, which is a problem

diff_tree_combined does not try to use it right away. It does return,
but rev->diffopt is now invalid and the next time we do any kind of diff
with it, we have a problem.

> I wonder, just like we force recursive and disable external on the
> copy before we use it to call diff_tree_sha1(), if we should disable
> follow-renames on it.  "--follow" is an option that is given to the
> history traversal part and it should not play any role in getting the
> pairwise diff with all parents diff_tree_combined() does.

Can't parse that last sentence.

In any case, I don't think disabling diff_tree_sha1 is a solution. The
bug is in diff_tree_sha1 and its subfunctions, because they manipulate
a data structure such that it becomes corrupt. And they do so in an
obfuscated and clearly unintentional manner. So we should not blame the
user for calling diff_tree_sha1 in such a way that it causes corruption.

> Besides,
>
>  - "--follow" hack lets us keep track of only one path; and

Ok. Good to know it is considered a hack. The code is quite strange
indeed.
>  - "-c" and "--cc" make sense only when dealing with a merge commit
>    and the path in the child may have come from different path in
>    parents,

Sorry, I don't get it.

> so I am not sure if allowing combination of "--follow -c/--cc" makes
> much sense in the first place.

My use case came up with this history:

 1. Code gets added to file A.
 2. File A gets renamed to B in a different branch.
 3. The branches get merged, and code from (1) is removed in the merge.

Later I wonder why code from (1) is gone from B even though I felt
certain it had been added before. I also remember that B was renamed at
some point. So I do git log -p --follow B, and it nicely shows the diff
where the code was added, but no diff where the code is removed. The
reason is, of course, that the code was removed in the merge and that
diff is not shown. And -c is usually what I use to enable showing diffs
in merge commits.

And if the pairwise diff can also deal with file renames, I think it
absolutely does make sense to show also a three-way diff. I can't tell
how far away the code is from supporting anything like that.

Cheers,
Clemens
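The shallow-copy hazard discussed in this message can be reproduced in
miniature.  This is a sketch under stated assumptions, not Git code:
the struct layouts, follow_rename() and both combined_*() functions
are hypothetical stand-ins, and a toy allocator replaces malloc/free
so that the stale pointer can be inspected without undefined
behaviour.  The copy-back repair is only one possible fix and is not
necessarily what the patch under discussion does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Toy allocator: slots are never really freed, only marked dead. */
struct item {
	const char *path;
	bool alive;
};
static struct item pool[POOL_SIZE];
static size_t pool_used;

static struct item *item_new(const char *path)
{
	struct item *it;

	assert(pool_used < POOL_SIZE);
	it = &pool[pool_used++];
	it->path = path;
	it->alive = true;
	return it;
}

static void item_release(struct item *it)
{
	it->alive = false;
}

/* Hypothetical stand-ins for the pathspec and the diff options. */
struct pathspec {
	struct item *items;
};
struct diff_opts {
	struct pathspec pathspec;
};

/* Plays the role of try_to_follow_renames(): frees and re-initializes
 * the items of whichever option block it is handed. */
static void follow_rename(struct diff_opts *opt, const char *new_path)
{
	item_release(opt->pathspec.items);
	opt->pathspec.items = item_new(new_path);
}

/* Shallow copy only.  Returns true if the caller's options are left
 * pointing at released items. */
static bool combined_shallow(struct diff_opts *orig)
{
	struct diff_opts copy = *orig;	/* shares items with *orig */

	follow_rename(&copy, "X");
	return !orig->pathspec.items->alive;
}

/* One possible repair: hand the updated pathspec back to the caller. */
static bool combined_copy_back(struct diff_opts *orig)
{
	struct diff_opts copy = *orig;

	follow_rename(&copy, "X");
	orig->pathspec = copy.pathspec;
	return !orig->pathspec.items->alive;
}
```

With an option block whose pathspec holds one item for "Y",
combined_shallow() reports that the caller is left with a stale
pointer, while combined_copy_back() leaves the caller pointing at the
live item for "X".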
Re: [PATCH] fix segfault with git log -c --follow
Clemens Buchacher writes:

> In diff_tree_combined we make a copy of diffopts. In
> try_to_follow_renames, called via diff_tree_sha1, we free and
> re-initialize diffopts->pathspec->items. Since we did not make a deep
> copy of diffopts in diff_tree_combined, the original diffopts does not
> get the update. By the time we return from diff_tree_combined,
> rev->diffopt->pathspec->items points to an invalid memory address. We
> get a segfault next time we try to access that pathspec.

I am not quite sure if I follow.  Do you mean

        diff_tree_combined()
        - makes a shallow copy of rev->diffopt
        - calls diff_tree_sha1()
          diff_tree_sha1()
          - tries to follow rename and clobbers diffopt
        - tries to use the shallow copy of original rev->diffopt
          that no longer is valid, which is a problem

I wonder, just like we force recursive and disable external on the
copy before we use it to call diff_tree_sha1(), if we should disable
follow-renames on it.  "--follow" is an option that is given to the
history traversal part and it should not play any role in getting the
pairwise diff with all parents diff_tree_combined() does.

Besides,

 - "--follow" hack lets us keep track of only one path; and

 - "-c" and "--cc" make sense only when dealing with a merge commit
   and the path in the child may have come from different path in
   parents,

so I am not sure if allowing combination of "--follow -c/--cc" makes
much sense in the first place.
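The suggestion to turn follow-renames off on the private copy, next to
forcing recursive and disabling external diff, might look roughly like
this.  It is a sketch only: the struct, its field names and both
functions are hypothetical and do not mirror the real diff options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option block; the field names are illustrative only. */
struct diff_opts {
	bool recursive;
	bool allow_external;
	bool follow_renames;
	const char *path;	/* the single path being followed */
};

/* Stand-in for the pairwise diff: with follow enabled it rewrites
 * the followed path in the options it was given. */
static void pairwise_diff(struct diff_opts *opt)
{
	if (opt->follow_renames)
		opt->path = "X";
}

/* Combined diff against num_parents parents, run on an adjusted
 * private copy.  Returns true if a pairwise diff changed the path. */
static bool combined_diff(const struct diff_opts *orig, int num_parents)
{
	struct diff_opts copy = *orig;

	assert(num_parents > 0);
	copy.recursive = true;
	copy.allow_external = false;
	copy.follow_renames = false;	/* the extra step being suggested */

	for (int i = 0; i < num_parents; i++)
		pairwise_diff(&copy);
	return copy.path != orig->path;
}
```

With follow_renames set in the caller's options, combined_diff()
leaves both the caller's flag and the followed path untouched, because
the pairwise diffs only ever see the copy with the flag cleared.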