Re: git diff-tree -r -C output inexact sometimes

2012-09-20 Thread Jeff King
On Thu, Sep 20, 2012 at 11:20:31PM -0400, Cristian Tibirna wrote:

> Running the attached script produces a git repository in which a large
> number of files were renamed, and in which many of the renamed files
> (in this particular case, all of them) have the same content but different names.
> 
> The commit data from the renaming operation (last commit in the script-
> generated history) is inexactly rendered by the command 
> 
> git diff-tree -r -C master
> 
> The logical result is correctly produced by the more restricted command
> 
> git diff-tree -r -M master
> 
> IMO for this particular last commit both the above commands should return the 
> same result.

Interesting. I get the same results from both commands. But I did have
to munge your script, as my "rename" command does not seem to work like
the one you expect in your script. So I may have misinterpreted the
intent of it.

However, I would not be surprised if one could construct a situation in
which "-C" and "-M" produced different results. Since the content of all
the files is the same, git has to make a guess about which files match
up based on their filenames. The current heuristic is very stupid and
just tries to match basenames (e.g., moving "foo/Makefile" to
"bar/Makefile" is a better match than moving the same content to
"bar/foo.c"). But in this case, the basenames don't match at all.
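
As a rough illustration of the basename heuristic described above (a
sketch in Python, not git's actual C code; the function name
basename_score is made up for this example):

```python
import os


def basename_score(src: str, dst: str) -> int:
    """Return 1 if both paths end in the same filename, else 0."""
    return int(os.path.basename(src) == os.path.basename(dst))


# Same content moved to a path with the same basename scores higher ...
print(basename_score("foo/Makefile", "bar/Makefile"))  # 1
# ... than the same content moved to an unrelated name.
print(basename_score("foo/Makefile", "bar/foo.c"))     # 0
```

When none of the basenames match, every candidate pair scores 0 and the
heuristic cannot tell them apart.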

By using "-C", we will typically have more rename sources available, and
we may therefore process the possible pairs in a different order. Since
our name heuristic is largely useless, our results depend on that order.
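
To show why the order matters, here is a toy matcher (again a Python
sketch under assumptions, not git's implementation). It assumes a
greedy loop that prefers sources not yet used, adds the basename score,
and lets the first candidate win a tie. The names pick_sources and
basename_score are hypothetical.

```python
import os


def basename_score(src, dst):
    """1 if both paths end in the same filename, else 0."""
    return int(os.path.basename(src) == os.path.basename(dst))


def pick_sources(sources, destinations):
    """Greedily pair each destination with its best-scoring source.

    Assumed scoring: +1 for a source not used yet, +1 for a matching
    basename. On a tie, the source seen first wins.
    """
    used = set()
    result = {}
    for dst in destinations:
        best, best_score = None, -1
        for src in sources:
            score = int(src not in used) + basename_score(src, dst)
            if score > best_score:  # strict: earlier candidates win ties
                best, best_score = src, score
        used.add(best)
        result[dst] = best
    return result


dsts = ["new/x1", "new/x2"]
# Identical content, no matching basenames: only the order decides.
print(pick_sources(["old/a", "old/b"], dsts))
# {'new/x1': 'old/a', 'new/x2': 'old/b'}
print(pick_sources(["old/b", "old/a"], dsts))
# {'new/x1': 'old/b', 'new/x2': 'old/a'}
```

Both pairings are equally valid as far as this scoring is concerned, so
anything that changes the candidate order changes the reported renames.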

I think the real solution is to improve the name heuristic. Something
like an edit distance would make more sense (though I think it is not as
simple as an edit distance across the whole pathname, as moving a
basename across directories should probably be preferred to changing the
filename inside a directory).
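
One possible shape for such a heuristic, purely as a sketch: score the
basename and the directory separately and weight the basename more
heavily. The function name_similarity and the 0.7 weight are
assumptions made up for this example, and difflib's similarity ratio
stands in for a true edit distance.

```python
import os
from difflib import SequenceMatcher


def name_similarity(src, dst, basename_weight=0.7):
    """Score two paths in [0, 1]; a kept basename counts for more
    than a kept directory (the weight is an arbitrary assumption)."""
    src_dir, src_base = os.path.split(src)
    dst_dir, dst_base = os.path.split(dst)
    # difflib's ratio is used here in place of a real edit distance.
    base_sim = SequenceMatcher(None, src_base, dst_base).ratio()
    dir_sim = SequenceMatcher(None, src_dir, dst_dir).ratio()
    return basename_weight * base_sim + (1 - basename_weight) * dir_sim


# Moving a basename across directories ...
moved = name_similarity("foo/Makefile", "bar/Makefile")
# ... should beat changing the filename inside the same directory.
renamed = name_similarity("foo/Makefile", "foo/main.c")
print(moved > renamed)  # True
```

Unlike the all-or-nothing basename check, this gives partial credit to
similar names, so identical files with near-matching names could still
be paired sensibly.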

Largely I think nobody has cared much because this only comes up when
you move multiple identical files. Quite often there is a minor
difference even between very similar files, and that is enough to come
up with sane results.

-Peff


git diff-tree -r -C output inexact sometimes

2012-09-20 Thread Cristian Tibirna
Hello

A colleague of mine discovered an inconsistency in the functioning of 

git diff-tree -r -C

in specific conditions. As tenuous as these conditions might seem (once you 
run the script in attachment and analyse its output), please rest assured that 
it comes from a real-life case.

Running the attached script produces a git repository in which a large
number of files were renamed, and in which many of the renamed files
(in this particular case, all of them) have the same content but different names.

The commit data from the renaming operation (last commit in the script-
generated history) is inexactly rendered by the command 

git diff-tree -r -C master

The logical result is correctly produced by the more restricted command

git diff-tree -r -M master

IMO for this particular last commit both the above commands should return the 
same result.

Note that reducing i or j in the generator script attached below makes the bug 
disappear.

Thanks a lot for your attention.

-- 
Cristian Tibirna
KDE developer .. tibi...@kde.org .. http://www.kde.org


generate_git_tree.sh
Description: application/shellscript