When the source file is empty, the calculation of the merge score
results in a division by zero.  In the situation:

     == preimage ==             == postimage ==

     F (empty file)             F (a large file)
                                E (a new empty file)

it does not make sense to consider F->E as a rename, so it is better not
to break the pre- and post-image of F.

Bail out early in this case to avoid hitting the divide-by-zero.  This
causes the merge score to be left at zero.

Signed-off-by: John Keeping <j...@keeping.me.uk>
---
On Tue, Apr 02, 2013 at 03:41:10PM -0700, Junio C Hamano wrote:
> John Keeping <j...@keeping.me.uk> writes:
> 
> > The message for commit 6dd4b66 (Fix diffcore-break total breakage)
> > indicates that "don't bother to break small files" is wrong in some
> > cases, but it I wonder if "don't bother to break empty files" is okay.
> 
> This has a rather subtle ramifications, and we would need to think
> carefully.  "break" does two very different things, and the criteria
> you would want to use to decide to "break" a filepair depends on
> which one you are interested in.
> 
> The very original motivation of "break" was to show a patch that
> rewrites an existing file completely in a way different from the
> usual "diff -u" output, which will try to minimize the output by
> finding (rare) lines that happen to be the same between the preimage
> and postimage, intersparsed in many lines of "-" (deletion) and "+"
> (addition).  Such a change is often easier to understand when shown
> as a single block of "-" (deletion) of the entire original contents,
> followed by a single block of "+" (addition) of the entire new
> contents.
> 
> A totally separate motivation of "break" is the one Linus talks
> about in the log message of the said commit.  A path filename.h was
> moved to filename_32.h, and a new (and much smaller) filename.h was
> introduced, that "#include"s filename_32.h.  "diff -M" that pairs
> deleted files with created files to compute renames in such a case
> would not consider the original filename.h as a possible source of
> filename_32.h that was created.  You want to break modification of
> filename.h into (virtual) deletion and addition of filename.h.
> 
> For the purpose of the former, you would want not to break a file
> too aggressively.  If you started from a file with 1000 lines in it,
> and deleted 910 lines and added 10 lines to result in a file with
> 100 lines, you still have 90 lines of shared contents between the
> preimage and the postimage, and you do not want to show it as
> "delete 1000 lines and add 100 lines".  You would want to base your
> decision on how much common material exists between the two.
> 
> For the purpose of the latter, however, it is better to err on the
> side of breaking than not breaking.  After breaking a suspicious
> modification into addition and deletion, if rename comparison does
> not find a suitable destination for the virtual deletion, the broken
> halves will be merged back, so breaking too little can hurt by
> missing potential renames, but breaking too much will not.  You do
> want to break the "900 deleted out of 1000 and then added 10" case,
> as that 900 lines may have gone to another file that was created in
> the same commit.  "But we have 90 lines of common material" does not
> matter for the decision to break.
> 
> In today's code, the return value of should_break() is about the
> latter, while the score it stores in *merge_score is about the
> former.

Thanks for this explanation.  Following it through, it seems that
bailing out early when the destination is empty will do the wrong thing,
whereas letting the code carry on should result in a merge score of
MAX_SCORE and the function returning 1.

So it seems that the patch you proposed is the best way to handle this.

 diffcore-break.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/diffcore-break.c b/diffcore-break.c
index 44f8678..1d9e530 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -68,6 +68,9 @@ static int should_break(struct diff_filespec *src,
        if (max_size < MINIMUM_BREAK_SIZE)
                return 0; /* we do not break too small filepair */
 
+       if (!src->size)
+               return 0; /* we do not let empty files get renamed */
+
        if (diffcore_count_changes(src, dst,
                                   &src->cnt_data, &dst->cnt_data,
                                   0,
-- 
1.8.2.540.gf023cfe

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to