Re: [PATCH v8] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible

2013-10-22 Thread Yoshioka Tsuneo
Hello Junio

Thank you for your comment.

 but this patch will show the
 source and the destination paths, both of which are truncated even
 more severely, because it always has to spend display columns for an
 extra ... (to show truncation of the source side),  =  (to show
 that it is a rename), and {,} pair (again to show that it is a
 rename). 
To be more accurate, renaming output dose not always contains { or }
if there is no common part in source and destination paths,  although
probably there are enough large possibility to include { or }.
And, in the original patch, { or } is not kept, but changed to be kept
based Thomas Rast's feedback below.
(So, there was no  possibility to have {… = …} in the original patch.)

On Oct 13, 2013, at 11:29 PM, Thomas Rast t...@thomasrast.ch wrote:
 Note that in the test, the generated line looks like this:
 
 {..._does_not_fit_in_a_single_line = .../path1  | 0
 
 I don't want to go all bikesheddey, but I think it's somewhat
 unfortunate that the elided parts do not correspond to each other.  In
 particular, I think the closing brace should not be omitted.  Perhaps
 something like this would be ideal (making it up on the spot, don't
 count characters):
 
 {...a_single_line = ..._as_the_first}/path1  | 0



And, it might be a bit nicer for me if the patch can be rejected(or ignored as 
other patches)
from the beginning if the concept does not fit anyway.
# Though I know we can know more after seeing the implementation, anyway :-)
# And, my original explanation about the patch might be not enough.

Thanks !

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 22, 2013, at 9:09 PM, Junio C Hamano gits...@pobox.com wrote:

 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 Also, I guess Junio might be suspicious to the idea to keep arrow(=) 
 itself, maybe ?
 
 I think there is no single right solution to this issue, and it
 has to boils down to the taste.
 
 When you are viewing diff --stat -M output in wide-enough medium,
 you are seeing three pieces of information: what the source path
 was, what the destination path will be, and what amount of change is
 made with the change. When the output width is too narrow to show
 these paths, with the current code, you see truncated destination
 path, possibly without the source path, but this patch will show the
 source and the destination paths, both of which are truncated even
 more severely, because it always has to spend display columns for an
 extra ... (to show truncation of the source side),  =  (to show
 that it is a rename), and {,} pair (again to show that it is a
 rename).  If the destination does not fit, the output before this
 patch would have thrown these away as part of left-truncation, to
 show the destination path as maximally as possible.  We do not have
 even half the width of the current truncated to be destination
 only output for each path.
 
 I am afraid that in the cases where the patch makes a difference,
 what happens would be that you can no longer tell what source or
 destination paths really are, because the leading directory part
 gets truncated too much, and if we didn't have this patch, at least
 you can tell what destination path is affected.  We would trade the
 guessability of at least one path (the destination) with just a
 single bit of information (an unidentifiable path got renamed to
 another unidentifiable path).
 
 I am not yet convinced that it is a good trade-off.  Especially
 given the diffstat output is not about files but more about
 contents, between an output in the extreme case the version after
 the patch needs to produce
 
   {... = ...}/controller/Makefile | 7 +++
 
 that tells us 7 lines were updated in the procedure to build some
 unknown controller by copying or renaming from the build procedure
 of some other unknown controller, and the output the current code
 would give to the same rename
 
   .}/fooGadget/controller/Makefile | 7 +++
 
 that tells us 7 lines were updated in the build procedure for the
 foo Gadget, I think the latter contains more useful information,
 even though it does lose one bit of information (there was a rename
 involved in producing this final path) compared to the version with
 the patch.
 
 So you are correct to say that I am still skeptical.
 
 In any case, the output from diff --stat -M should match the
 output from apply --stat -M, I think.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible

2013-10-19 Thread Yoshioka Tsuneo
Hello Thomas

 Can you briefly describe what you changed in v7 and v8, both compared to
 earlier versions and between v7 and v8?
On v7, sfx's basename part is tried to kept. On v7, whole sfx part is tried 
to kept.
For example, in case below:
   parent_path{sourceDirectory = 
DestinationDirectory}path1/path2//longlongFilename.txt
 On v7, this can be like:
   …{...ceDirectory = …onDirectory}.../longlongFilename.txt
On v8, it will be like:
   …{...irectory = …irectory}path1/path2/longlongFilename.txt


This change is based on the review from Junio below.
(I myself is not sure what is the better way.)

On Oct 17, 2013, at 10:29 PM, Junio C Hamano gits...@pobox.com wrote:
 I am not sure if distributing the burden of truncation equally to
 three parts so that the resulting pieces are of similar lengths is
 really a good idea.  Between these two
 
   {...SourceDirectory = ...nationDirectory}...ileThatWasMoved 
   {...ceDirectory = ...ionDirectory}nameOfTheFileThatWasMoved
 
 that attempt to show that the file nameOfTheFileThatWasMoved was
 moved from the longSourceDirectory to the DestinationDirectory, the
 latter is much more informative, I would think.


On Oct 18, 2013, at 1:38 AM, Junio C Hamano gits...@pobox.com wrote:
 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 In the [PATCH v7], I changed to keep filename part of suffix to handle
 above case, but not always keep directory part because I feel totally
 keeping all part of long suffix including directory name may cause output 
 like:
…{… = …}…ongPath1/LongPath2/nameOfTheFileThatWasMoved 
 And, above may be worse than:
   ...{...ceDirectory = …ionDirectory}.../nameOfTheFileThatWasMoved
 I think.
 
 I am not sure if I agree.
 
 Losing LongPath2 part may be more significant data loss than losing
 a single bit that says the change is a rename, as the latter may not
 quite tell us what these two directories were anyway.


Also, I guess Junio might be suspicious to the idea to keep arrow(=) itself, 
maybe ?
=
(From What's cooking in git.git (Oct 2013, #04; Fri, 18))
- diff.c: keep arrow(=) on show_stats()'s shortened filename part to make 
rename visible

Attempts to give more weight on the fact that a filepair represents
a rename than showing substring of the actual path when diffstat
lines are not wide enough.

I am not sure if that is solving a right problem, though.
=

Thanks!

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 19, 2013, at 9:24 AM, Thomas Rast t...@thomasrast.ch wrote:

 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 git diff -M --stat can detect rename and show renamed file name like
 foofoofoo = barbarbar.
 
 Before this commit, this output is shortened always by omitting left most
 part like ...foo = barbarbar. So, if the destination filename is too long,
 source filename putting left or arrow can be totally omitted like
 ...barbarbar, without including any of foofoofoo =.
 In such a case where arrow symbol is omitted, there is no way to know
 whether the file is renamed or existed in the original.
 
 Make sure there is always an arrow, like ...foo = ...bar.
 
 The output can contain curly braces('{','}') for grouping.
 So, in general, the output format is pfx{mid_a = mid_b}sfx
 
 To keep arrow(=), try to omit pfx as long as possible at first
 because later part or changing part will be the more important part.
 If it is not enough, shorten mid_a, mid_b trying to have the same
 maximum length.
 If it is not enough yet, omit sfx.
 
 Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
 Test-added-by: Thomas Rast tr...@inf.ethz.ch
 ---
 
 Can you briefly describe what you changed in v7 and v8, both compared to
 earlier versions and between v7 and v8?
 
 It would be very nice if you could always include such a patch
 changelog after the --- above.  git-am will ignore the text between
 --- and the diff, so you can write comments for the reviewers there
 without creating noise in the commit message.
 
 Also, please keep reviewers in the Cc list for future discussion/patches
 so that they will see them.
 
 -- 
 Thomas Rast
 t...@thomasrast.ch

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible

2013-10-18 Thread Yoshioka Tsuneo


git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar.

Before this commit, this output is shortened always by omitting left most
part like ...foo = barbarbar. So, if the destination filename is too long,
source filename putting left or arrow can be totally omitted like
...barbarbar, without including any of foofoofoo =.
In such a case where arrow symbol is omitted, there is no way to know
whether the file is renamed or existed in the original.

Make sure there is always an arrow, like ...foo = ...bar.

The output can contain curly braces('{','}') for grouping.
So, in general, the output format is pfx{mid_a = mid_b}sfx

To keep arrow(=), try to omit pfx as long as possible at first
because later part or changing part will be the more important part.
If it is not enough, shorten mid_a, mid_b trying to have the same
maximum length.
If it is not enough yet, omit sfx.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
Test-added-by: Thomas Rast tr...@inf.ethz.ch
---
 diff.c | 176 ++---
 t/t4001-diff-rename.sh |  12 
 2 files changed, 166 insertions(+), 22 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..69c3e17 100644
--- a/diff.c
+++ b/diff.c
@@ -1258,11 +1258,13 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 }
 
-static char *pprint_rename(const char *a, const char *b)
+static void find_common_prefix_suffix(const char *a, const char *b,
+   struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx)
 {
const char *old = a;
const char *new = b;
-   struct strbuf name = STRBUF_INIT;
int pfx_length, sfx_length;
int pfx_adjust_for_slash;
int len_a = strlen(a);
@@ -1272,10 +1274,9 @@ static char *pprint_rename(const char *a, const char *b)
int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
if (qlen_a || qlen_b) {
-   quote_c_style(a, name, NULL, 0);
-   strbuf_addstr(name,  = );
-   quote_c_style(b, name, NULL, 0);
-   return strbuf_detach(name, NULL);
+   quote_c_style(a, a_mid, NULL, 0);
+   quote_c_style(b, b_mid, NULL, 0);
+   return;
}
 
/* Find common prefix */
@@ -1322,17 +1323,138 @@ static char *pprint_rename(const char *a, const char 
*b)
if (b_midlen  0)
b_midlen = 0;
 
-   strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 7);
-   if (pfx_length + sfx_length) {
-   strbuf_add(name, a, pfx_length);
+   strbuf_add(pfx, a, pfx_length);
+   strbuf_add(a_mid, a + pfx_length, a_midlen);
+   strbuf_add(b_mid, b + pfx_length, b_midlen);
+   strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
+}
+
+/*
+ * Omit each parts to fix in name_width.
+ * Formatted string is pfx{a_mid = b_mid}sfx.
+ * At first, omit pfx as long as possible.
+ * If it is not enough, omit a_mid, b_mid trying to set the length of
+ * those 2 parts(including ...) to the same.
+ * If it is not enough yet, omit sfx.
+ * Ex:
+ * foofoofoo = barbarbar
+ *   will be like
+ * ...foo = ...bar.
+ * long_parent{foofoofoo = barbarbar}path/filename
+ *   will be like
+ * ...parent{...foofoo = ...barbar}path/filename
+ */
+static void rename_omit(struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx,
+   int name_width)
+{
+   static const char arrow[] =  = ;
+   static const char dots[] = ...;
+   int use_curly_braces = (pfx-len  0) || (sfx-len  0);
+   size_t name_len;
+   size_t max_part_len = 0;
+   size_t remainder_part_len = 0;
+   size_t left, right;
+   size_t max_sfx_len;
+   size_t sfx_len;
+
+   name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + strlen(arrow)
+   + (use_curly_braces ? 2 : 0);
+
+   if (name_len = name_width)
+   return; /* everything fits in name_width */
+
+   if (use_curly_braces) {
+   if (strlen(dots) + (name_len - pfx-len) = name_width) {
+   /*
+* Just omitting left of '{' is enough
+* Ex: ...aaa{foofoofoo = bar}file
+*/
+   strbuf_splice(pfx, 0, name_len - name_width + 
strlen(dots), dots, strlen(dots));
+   return;
+   } else {
+   if (pfx-len  strlen(dots)) {
+   /*
+* Just omitting left of '{' is not enough
+* name will be ...{SOMETHING}SOMETHING
+*/
+   strbuf_reset(pfx);
+  

Re: [PATCH v6] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-18 Thread Yoshioka Tsuneo
Hello Junio

 In the [PATCH v7], I changed to keep filename part of suffix to handle
 above case, but not always keep directory part because I feel totally
 keeping all part of long suffix including directory name may cause output 
 like:
…{… = …}…ongPath1/LongPath2/nameOfTheFileThatWasMoved 
 And, above may be worse than:
   ...{...ceDirectory = …ionDirectory}.../nameOfTheFileThatWasMoved
 I think.
 
 I am not sure if I agree.
 
 Losing LongPath2 part may be more significant data loss than losing
 a single bit that says the change is a rename, as the latter may not
 quite tell us what these two directories were anyway.
I'm not sure which is the better in general.
But anyway, I don't have strong opinion about this.
So, I just changed to keep the all of the sfx part(lator than '}').
I just sent the updated patch as [PATCH v8].

Thanks !

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 18, 2013, at 1:38 AM, Junio C Hamano gits...@pobox.com wrote:

 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 In the [PATCH v7], I changed to keep filename part of suffix to handle
 above case, but not always keep directory part because I feel totally
 keeping all part of long suffix including directory name may cause output 
 like:
…{… = …}…ongPath1/LongPath2/nameOfTheFileThatWasMoved 
 And, above may be worse than:
   ...{...ceDirectory = …ionDirectory}.../nameOfTheFileThatWasMoved
 I think.
 
 I am not sure if I agree.
 
 Losing LongPath2 part may be more significant data loss than losing
 a single bit that says the change is a rename, as the latter may not
 quite tell us what these two directories were anyway.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] diffcore-rename.c: Estimate filename similarity for rename detection

2013-10-17 Thread Yoshioka Tsuneo

On rename detection like command git diff -M ..., rename is detected
based on file similarities. This file similarities are calculated based
on the contents of file. And, if the similarities of contents are the
same, filename is taken into account.
But, the similarity of filename is calculated just whether the basename
is the same or not, and always returns just one or zero.
So, for example, if there are multiple same files in the diff-ing commits,
the result of rename detection is almost random, without taking into account
the similarity of filename.

Calculate filename similarities, and use the result to compare file similarity
in case contents similarities are the same.
Use Sorensen-Dice coefficient of bigrams in strings to calculate filename
similarities because it take into account all part of the filenames, and time
complexity is O(N), assuming N is the length of filenames.
---
 diffcore-rename.c | 81 +++
 1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..355ea6d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -96,26 +96,51 @@ static struct diff_rename_src *register_rename_src(struct 
diff_filepair *p)
return (rename_src[first]);
 }
 
-static int basename_same(struct diff_filespec *src, struct diff_filespec *dst)
+static int estimate_filename_similarity(struct diff_filespec *src, struct 
diff_filespec *dst)
 {
-   int src_len = strlen(src-path), dst_len = strlen(dst-path);
-   while (src_len  dst_len) {
-   char c1 = src-path[--src_len];
-   char c2 = dst-path[--dst_len];
-   if (c1 != c2)
-   return 0;
-   if (c1 == '/')
-   return 1;
+   /*
+* Calculate similarity between src path and dst path using
+* Sorensen-Dice coefficient of bigrams in strings
+*/
+   const char *src_path = src-path;
+   const char *dst_path = dst-path;
+   size_t src_len = strlen(src_path);
+   size_t dst_len = strlen(dst_path);
+   static uint16_t src_bigram[256][256];
+   int i;
+   int bigrams_number = 0;
+   int similarity;
+
+   for (i=0; isrc_len; i++) {
+   int c1 = ((unsigned char*)src_path)[i];
+   int c2 = ((unsigned char*)src_path)[i + 1];
+   src_bigram[c1][c2] ++;
+   }
+   for (i=0; idst_len; i++) {
+   int c1 = ((unsigned char*)dst_path)[i];
+   int c2 = ((unsigned char*)dst_path)[i + 1];
+   if (src_bigram[c1][c2]  0) {
+   src_bigram[c1][c2] --;
+   bigrams_number ++;
+   }
+   }
+   similarity = MAX_SCORE * 2 * bigrams_number / (src_len + dst_len);
+
+/* Clean up src_bigram */
+   for (i=0; isrc_len; i++) {
+   int c1 = ((unsigned char*)src_path)[i];
+   int c2 = ((unsigned char*)src_path)[i + 1];
+   src_bigram[c1][c2] = 0;
}
-   return (!src_len || src-path[src_len - 1] == '/') 
-   (!dst_len || dst-path[dst_len - 1] == '/');
+
+   return similarity;
 }
 
 struct diff_score {
int src; /* index in rename_src */
int dst; /* index in rename_dst */
unsigned short score;
-   short name_score;
+   unsigned short name_score;
 };
 
 static int estimate_similarity(struct diff_filespec *src,
@@ -228,7 +253,7 @@ static void record_rename_pair(int dst_index, int 
src_index, int score)
  */
 static int score_compare(const void *a_, const void *b_)
 {
-   const struct diff_score *a = a_, *b = b_;
+   struct diff_score *a = (struct diff_score *)a_, *b = (struct diff_score 
*)b_;
 
/* sink the unused ones to the bottom */
if (a-dst  0)
@@ -236,8 +261,23 @@ static int score_compare(const void *a_, const void *b_)
else if (b-dst  0)
return -1;
 
-   if (a-score == b-score)
+   if (a-score == b-score){
+   if(a-score == 0)
+   return 0;
+   /* Calculate name_score only when both score is the same */
+   if(a-name_score == USHRT_MAX){
+   struct diff_filespec *two = rename_dst[a-dst].two;
+   struct diff_filespec *one = rename_src[a-src].p-one;
+   a-name_score =  estimate_filename_similarity(one, two);
+   }
+   if(b-name_score == USHRT_MAX){
+   struct diff_filespec *two = rename_dst[b-dst].two;
+   struct diff_filespec *one = rename_src[b-src].p-one;
+   b-name_score = estimate_filename_similarity(one, two);
+   }
+
return b-name_score - a-name_score;
+   }
 
return b-score - a-score;
 }
@@ -282,7 +322,7 @@ static int find_identical_files(struct file_similarity *src,
score = 

[PATCH v2] diffcore-rename.c: Estimate filename similarity for rename detection

2013-10-17 Thread Yoshioka Tsuneo

On rename detection like command git diff -M ..., rename is detected
based on file similarities. This file similarities are calculated based
on the contents of file. And, if the similarities of contents are the
same, filename is taken into account.
But, the similarity of filename is calculated just whether the basename
is the same or not, and always returns just one or zero.
So, for example, if there are multiple same files in the diff-ing commits,
the result of rename detection is almost random, without taking into account
the similarity of filename.

Calculate filename similarities, and use the result to compare file similarity
in case contents similarities are the same.
Use Sorensen-Dice coefficient of bigrams in strings to calculate filename
similarities because it take into account all part of the filenames, and time
complexity is O(N), assuming N is the length of filenames.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diffcore-rename.c | 81 +++
 1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..355ea6d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -96,26 +96,51 @@ static struct diff_rename_src *register_rename_src(struct 
diff_filepair *p)
return (rename_src[first]);
 }
 
-static int basename_same(struct diff_filespec *src, struct diff_filespec *dst)
+static int estimate_filename_similarity(struct diff_filespec *src, struct 
diff_filespec *dst)
 {
-   int src_len = strlen(src-path), dst_len = strlen(dst-path);
-   while (src_len  dst_len) {
-   char c1 = src-path[--src_len];
-   char c2 = dst-path[--dst_len];
-   if (c1 != c2)
-   return 0;
-   if (c1 == '/')
-   return 1;
+   /*
+* Calculate similarity between src path and dst path using
+* Sorensen-Dice coefficient of bigrams in strings
+*/
+   const char *src_path = src-path;
+   const char *dst_path = dst-path;
+   size_t src_len = strlen(src_path);
+   size_t dst_len = strlen(dst_path);
+   static uint16_t src_bigram[256][256];
+   int i;
+   int bigrams_number = 0;
+   int similarity;
+
+   for (i=0; isrc_len; i++) {
+   int c1 = ((unsigned char*)src_path)[i];
+   int c2 = ((unsigned char*)src_path)[i + 1];
+   src_bigram[c1][c2] ++;
+   }
+   for (i=0; idst_len; i++) {
+   int c1 = ((unsigned char*)dst_path)[i];
+   int c2 = ((unsigned char*)dst_path)[i + 1];
+   if (src_bigram[c1][c2]  0) {
+   src_bigram[c1][c2] --;
+   bigrams_number ++;
+   }
+   }
+   similarity = MAX_SCORE * 2 * bigrams_number / (src_len + dst_len);
+
+/* Clean up src_bigram */
+   for (i=0; isrc_len; i++) {
+   int c1 = ((unsigned char*)src_path)[i];
+   int c2 = ((unsigned char*)src_path)[i + 1];
+   src_bigram[c1][c2] = 0;
}
-   return (!src_len || src-path[src_len - 1] == '/') 
-   (!dst_len || dst-path[dst_len - 1] == '/');
+
+   return similarity;
 }
 
 struct diff_score {
int src; /* index in rename_src */
int dst; /* index in rename_dst */
unsigned short score;
-   short name_score;
+   unsigned short name_score;
 };
 
 static int estimate_similarity(struct diff_filespec *src,
@@ -228,7 +253,7 @@ static void record_rename_pair(int dst_index, int 
src_index, int score)
  */
 static int score_compare(const void *a_, const void *b_)
 {
-   const struct diff_score *a = a_, *b = b_;
+   struct diff_score *a = (struct diff_score *)a_, *b = (struct diff_score 
*)b_;
 
/* sink the unused ones to the bottom */
if (a-dst  0)
@@ -236,8 +261,23 @@ static int score_compare(const void *a_, const void *b_)
else if (b-dst  0)
return -1;
 
-   if (a-score == b-score)
+   if (a-score == b-score){
+   if(a-score == 0)
+   return 0;
+   /* Calculate name_score only when both score is the same */
+   if(a-name_score == USHRT_MAX){
+   struct diff_filespec *two = rename_dst[a-dst].two;
+   struct diff_filespec *one = rename_src[a-src].p-one;
+   a-name_score =  estimate_filename_similarity(one, two);
+   }
+   if(b-name_score == USHRT_MAX){
+   struct diff_filespec *two = rename_dst[b-dst].two;
+   struct diff_filespec *one = rename_src[b-src].p-one;
+   b-name_score = estimate_filename_similarity(one, two);
+   }
+
return b-name_score - a-name_score;
+   }
 
return b-score - a-score;
 }
@@ -282,7 +322,7 @@ static int 

[PATCH v7] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible

2013-10-17 Thread Yoshioka Tsuneo

git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar.

Before this commit, this output is shortened always by omitting left most
part like ...foo = barbarbar. So, if the destination filename is too long,
source filename putting left or arrow can be totally omitted like
...barbarbar, without including any of foofoofoo =.
In such a case where arrow symbol is omitted, there is no way to know
whether the file is renamed or existed in the original.

Make sure there is always an arrow, like ...foo = ...bar.

The output can contain curly braces('{','}') for grouping.
So, in general, the output format is pfx{mid_a = mid_b}sfx

To keep arrow(=), try to omit pfx as long as possible at first
because later part or changing part will be the more important part.
If it is not enough, shorten mid_a, mid_b, and sfx trying to
have the same maximum length, but as long as filename part of sfx
is kept.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
Test-added-by: Thomas Rast tr...@inf.ethz.ch
---
 diff.c | 184 +++--
 t/t4001-diff-rename.sh |  12 
 2 files changed, 174 insertions(+), 22 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..cdf59c0 100644
--- a/diff.c
+++ b/diff.c
@@ -1258,11 +1258,13 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 }
 
-static char *pprint_rename(const char *a, const char *b)
+static void find_common_prefix_suffix(const char *a, const char *b,
+   struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx)
 {
const char *old = a;
const char *new = b;
-   struct strbuf name = STRBUF_INIT;
int pfx_length, sfx_length;
int pfx_adjust_for_slash;
int len_a = strlen(a);
@@ -1272,10 +1274,9 @@ static char *pprint_rename(const char *a, const char *b)
int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
if (qlen_a || qlen_b) {
-   quote_c_style(a, name, NULL, 0);
-   strbuf_addstr(name,  = );
-   quote_c_style(b, name, NULL, 0);
-   return strbuf_detach(name, NULL);
+   quote_c_style(a, a_mid, NULL, 0);
+   quote_c_style(b, b_mid, NULL, 0);
+   return;
}
 
/* Find common prefix */
@@ -1322,17 +1323,146 @@ static char *pprint_rename(const char *a, const char 
*b)
if (b_midlen  0)
b_midlen = 0;
 
-   strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 7);
-   if (pfx_length + sfx_length) {
-   strbuf_add(name, a, pfx_length);
+   strbuf_add(pfx, a, pfx_length);
+   strbuf_add(a_mid, a + pfx_length, a_midlen);
+   strbuf_add(b_mid, b + pfx_length, b_midlen);
+   strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
+}
+
+/*
+ * Omit each parts to fix in name_width.
+ * Formatted string is pfx{a_mid = b_mid}sfx.
+ * At first, omit pfx as long as possible.
+ * If it is not enough, omit a_mid, b_mid, sfx by tring to set the 
length of
+ * those 3 parts(including ...) to the same, but keeping filename part of 
sfx.
+ * Ex:
+ * foofoofoo = barbarbar
+ *   will be like
+ * ...foo = ...bar.
+ * long_parent{foofoofoo = barbarbar}path/filename
+ *   will be like
+ * ...parent{...foofoo = ...barbar}.../filename
+ */
+static void rename_omit(struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx,
+   int name_width)
+{
+   static const char arrow[] =  = ;
+   static const char dots[] = ...;
+   int use_curly_braces = (pfx-len  0) || (sfx-len  0);
+   size_t name_len;
+   size_t max_part_len = 0;
+   size_t remainder_part_len = 0;
+   size_t left, right;
+   size_t sfx_minlen;
+   char *sfx_last_slash;
+   size_t max_sfx_len;
+
+   name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + strlen(arrow)
+   + (use_curly_braces ? 2 : 0);
+
+   if (name_len = name_width)
+   return; /* everything fits in name_width */
+
+   if (use_curly_braces) {
+   if (strlen(dots) + (name_len - pfx-len) = name_width) {
+   /*
+* Just omitting left of '{' is enough
+* Ex: ...aaa{foofoofoo = bar}file
+*/
+   strbuf_splice(pfx, 0, name_len - name_width + 
strlen(dots), dots, strlen(dots));
+   return;
+   } else {
+   if (pfx-len  strlen(dots)) {
+   /*
+* Just omitting left of '{' is not enough
+* name will be ...{SOMETHING}SOMETHING
+*/
+   

Re: [PATCH v6] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-17 Thread Yoshioka Tsuneo
Hello Junio

Thank you very much for the reviewing.
I try to fix the issues, and posted the updated patch as [PATCH v7].

 I am not sure if distributing the burden of truncation equally to
 three parts so that the resulting pieces are of similar lengths is
 really a good idea.  Between these two
 
   {...SourceDirectory = ...nationDirectory}...ileThatWasMoved 
   {...ceDirectory = ...ionDirectory}nameOfTheFileThatWasMoved
 
 that attempt to show that the file nameOfTheFileThatWasMoved was
 moved from the longSourceDirectory to the DestinationDirectory, the
 latter is much more informative, I would think.
In the [PATCH v7], I changed to keep filename part of suffix to handle
above case, but not always keep directory part because I feel totally
keeping all part of long suffix including directory name may cause output like:
…{… = …}…ongPath1/LongPath2/nameOfTheFileThatWasMoved 
And, above may be worse than:
   ...{...ceDirectory = …ionDirectory}.../nameOfTheFileThatWasMoved
I think.

Thank you !

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 17, 2013, at 10:29 PM, Junio C Hamano gits...@pobox.com wrote:

 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 git diff -M --stat can detect rename and show renamed file name like
 foofoofoo = barbarbar.
 Before this commit, this output is shortened always by omitting left most
 part like ...foo = barbarbar. So, if the destination filename is too long,
 source filename putting left or arrow can be totally omitted like
 ...barbarbar, without including any of foofoofoo =.
 In such a case where arrow symbol is omitted, there is no way to know
 whether the file is renamed or existed in the original.
 Make sure there is always an arrow, like ...foo = ...bar.
 The output can contain curly braces('{','}') for grouping.
 So, in general, the output format is pfx{mid_a = mid_b}sfx
 To keep arrow(=), try to omit pfx as long as possible at first
 because later part or changing part will be the more important part.
 If it is not enough, shorten mid_a, mid_b, and sfx trying to
 have the maximum length the same because those will be equally important.
 
 I somehow find this solid wall of text extremely hard to
 read. Adding a blank line as a paragraph break may make it easier to
 read, perhaps.
 
 Also it is customary in our history to omit the full-stop from the
 patch title on the Subject: line.
 
 +name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + strlen(arrow)
 ++ (use_curly_braces ? 2 : 0);
 +
 +if (name_len = name_width) {
 +/* Everthing fits in name_width */
 +return;
 +}
 
 Logic up to this point seems good; drop {} around a single statement
 return;, i.e.
 
   if (name_len = name_width)
   return; /* everything fits */
 
 +} else {
 +if (pfx-len  strlen(dots)) {
 +/*
 + * Just omitting left of '{' is not enough
 + * name will be ...{SOMETHING}SOMETHING
 + */
 +strbuf_reset(pfx);
 +strbuf_addstr(pfx, dots);
 +}
 
 (mental note) ... otherwise, i.e. with a short common prefix, the
 final result will be ab{SOMETHING}SOMETHING, which is also fine
 for the purpose of the remainder of this function.
 
 +}
 +}
 +
 +/* available length for a_mid, b_mid and sfx */
 +len = name_width - strlen(arrow) - (use_curly_braces ? 2 : 0);
 +
 +/* a_mid, b_mid, sfx will be have the same max, including 
 ellipsis(...). */
 +part_length[0] = a_mid-len;
 +part_length[1] = b_mid-len;
 +part_length[2] = sfx-len;
 +
 +qsort(part_length, sizeof(part_length)/sizeof(part_length[0]), 
 sizeof(part_length[0])
 +  , compare_size_t_descending_order);
 
 In our code, comma does not come at the beginning of continued
 line.
 
 +if (part_length[1] + part_length[1] + part_length[2] = len) {
 +/*
 + * {...foofoo = barbar}file
 + * There is only one omitted part.
 + */
 +max_part_len = len - part_length[1] - part_length[2];
 
 It would be clearer to explicitly set remainder to zero here, and
 omit the initialization of the variable.  That would make what the
 three parts of if/elseif/else do more consistent.
 
 +} else if (part_length[2] + part_length[2] + part_length[2] = len) {
 +/*
 + * {...foofoo = ...barbar}file
 + * There are 2 omitted parts.
 + */
 +max_part_len = (len - part_length[2]) / 2;
 +remainder_part_len = (len - part_length[2]) - max_part_len * 2;
 +} else {
 +/*
 + * {...ofoo = ...rbar}...file
 + * There are 3 omitted parts.
 + */
 +max_part_len = len / 3;
 +remainder_part_len = len - (max_part_len) * 3

Re: [PATCH v5] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-16 Thread Yoshioka Tsuneo
Hello Junio, Keshav

Thank you very much for your detailed reviewing.
I just tried to update the patch and posted as [PATCH v6].

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 16, 2013, at 1:54 AM, Junio C Hamano gits...@pobox.com wrote:

 Yoshioka Tsuneo yoshiokatsu...@gmail.com writes:
 
 git diff -M --stat can detect rename and show renamed file name like
 foofoofoo = barbarbar. But if destination filename is long, the line
 is shortened like ...barbarbar so there is no way to know whether the
 file is renamed or existed in the source commit.
 
 Is destination filename more special than the source filename?
 Perhaps s/if destination filename is/if filenames are/?
 
   Note: I do not want you to reroll using the suggested
   wording without explanation; it may be possible that I am
   missing something obvious and do not understand why you
   singled out destination, in which case I'd rather see it
   explained better in the log message than the potentially
   suboptimal suggestion I made in the review without
   understanding the issue. Of course, it is possible that you
   want to do the same when source is overlong, in which case
   you can just say Yeah, you're right; will reroll.
 
The above applies to all the other comments in this message.
 
 Also s/source commit/original/.  You may not be comparing two
 commits after all.
 
 Make sure there is always an arrow, like ...foo = ...bar.
 The output can contains curly braces('{','}') for grouping.
 
 s/contains/contain/;
 
 So, in general, the outpu format is pfx{mid_a = mid_b}sfx
 
 s/outpu/t/;
 
 To keep arrow(=), try to omit pfx as long as possible at first
 because later part or changing part will be the more important part.
 If it is not enough, shorten mid_a, mid_b, and sfx trying to
 have the maximum length the same because those will be equaly important.
 
 A sound reasoning.
 
 Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
 Test-added-by: Thomas Rast tr...@inf.ethz.ch
 ---
 diff.c | 187 
 +++--
 t/t4001-diff-rename.sh |  12 
 2 files changed, 177 insertions(+), 22 deletions(-)
 
 diff --git a/diff.c b/diff.c
 index a04a34d..cf50807 100644
 --- a/diff.c
 +++ b/diff.c
 @@ -1258,11 +1258,12 @@ static void fn_out_consume(void *priv, char *line, 
 unsigned long len)
  }
 }
 
 -static char *pprint_rename(const char *a, const char *b)
 +static void pprint_rename_find_common_prefix_suffix(const char *a, const 
 char *b
 +
 , struct strbuf *pfx, struct strbuf *a_mid
 +
 , struct strbuf *b_mid, struct strbuf *sfx)
 
 What kind of line splitting is this?
 
 I think the real issue is that the function name is overly long, but
 aside from that,
 
 - comma comes at the end of the line, not at the beginning of the
   next line;
 
 - the second and subsequent lines are indented, but not more than
   the usual line width (align with the first letter inside the
   opening parenthesis of the first line);
 
 - a_mid and b_mid are more alike than pfx and a_mid.
 
 so I would expect to see it more like:
 
 static void abbrev_rename(const char *a, const char *b,
 struct strbuf *pfx,
 struct strbuf *a_mid, struct strbuf *b_mid,
 struct strbuf *sfx)
 
 Note that the suggested name does not say pprint, because in your
 version of this file, the code around here is no longer doing any
 printing.  The caller does so after using this function to decide
 how to abbreviate renames, so naming the helper function after what
 it does (e.g. abbreviate renames) is more appropriate.
 
 {
  const char *old = a;
  const char *new = b;
 -struct strbuf name = STRBUF_INIT;
  int pfx_length, sfx_length;
  int pfx_adjust_for_slash;
  int len_a = strlen(a);
 @@ -1272,10 +1273,9 @@ static char *pprint_rename(const char *a, const char 
 *b)
  int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
  if (qlen_a || qlen_b) {
 -quote_c_style(a, name, NULL, 0);
 -strbuf_addstr(name,  = );
 -quote_c_style(b, name, NULL, 0);
 -return strbuf_detach(name, NULL);
 +quote_c_style(a, a_mid, NULL, 0);
 +quote_c_style(b, b_mid, NULL, 0);
 +return;
  }
 
  /* Find common prefix */
 @@ -1321,18 +1321,151 @@ static char *pprint_rename(const char *a, const 
 char *b)
  a_midlen = 0;
  if (b_midlen  0)
  b_midlen = 0;
 +
 
 Trailing whitespace (there are many others you added to this file; I
 won't bother to point out all of them).
 
 +strbuf_add(pfx, a, pfx_length);
 +strbuf_add(a_mid, a + pfx_length, a_midlen);
 +strbuf_add(b_mid, b

[PATCH v6] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-16 Thread Yoshioka Tsuneo

git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar.
Before this commit, this output is shortened always by omitting left most
part like ...foo = barbarbar. So, if the destination filename is too long,
source filename putting left or arrow can be totally omitted like
...barbarbar, without including any of foofoofoo =.
In such a case where arrow symbol is omitted, there is no way to know
whether the file is renamed or existed in the original.
Make sure there is always an arrow, like ...foo = ...bar.
The output can contain curly braces('{','}') for grouping.
So, in general, the output format is pfx{mid_a = mid_b}sfx
To keep arrow(=), try to omit pfx as long as possible at first
because later part or changing part will be the more important part.
If it is not enough, shorten mid_a, mid_b, and sfx trying to
have the maximum length the same because those will be equally important.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
Test-added-by: Thomas Rast tr...@inf.ethz.ch
---
 diff.c | 180 +++--
 t/t4001-diff-rename.sh |  12 
 2 files changed, 170 insertions(+), 22 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..afe6a36 100644
--- a/diff.c
+++ b/diff.c
@@ -1258,11 +1258,13 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 }
 
-static char *pprint_rename(const char *a, const char *b)
+static void find_common_prefix_suffix(const char *a, const char *b,
+   struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx)
 {
const char *old = a;
const char *new = b;
-   struct strbuf name = STRBUF_INIT;
int pfx_length, sfx_length;
int pfx_adjust_for_slash;
int len_a = strlen(a);
@@ -1272,10 +1274,9 @@ static char *pprint_rename(const char *a, const char *b)
int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
if (qlen_a || qlen_b) {
-   quote_c_style(a, name, NULL, 0);
-   strbuf_addstr(name,  = );
-   quote_c_style(b, name, NULL, 0);
-   return strbuf_detach(name, NULL);
+   quote_c_style(a, a_mid, NULL, 0);
+   quote_c_style(b, b_mid, NULL, 0);
+   return;
}
 
/* Find common prefix */
@@ -1322,17 +1323,142 @@ static char *pprint_rename(const char *a, const char 
*b)
if (b_midlen  0)
b_midlen = 0;
 
-   strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 7);
-   if (pfx_length + sfx_length) {
-   strbuf_add(name, a, pfx_length);
+   strbuf_add(pfx, a, pfx_length);
+   strbuf_add(a_mid, a + pfx_length, a_midlen);
+   strbuf_add(b_mid, b + pfx_length, b_midlen);
+   strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
+}
+
+static int compare_size_t_descending_order(const void *left, const void *right)
+{
+   size_t left_val = *(size_t *)left;
+   size_t right_val = *(size_t *)right;
+   return (int)(right_val - left_val);
+}
+
+/*
+ * Omit each parts to fix in name_width.
+ * Formatted string is pfx{a_mid = b_mid}sfx.
+ * At first, omit pfx as long as possible.
+ * If it is not enough, omit a_mid, b_mid, sfx by tring to set the 
length of
+ * those 3 parts(including ...) to the same.
+ * Ex:
+ * foofoofoo = barbarbar
+ *   will be like
+ * ...foo = ...bar.
+ * long_parent{foofoofoo = barbarbar}longfilename
+ *   will be like
+ * ...parent{...foofoo = ...barbar}...lename
+ */
+static void rename_omit(struct strbuf *pfx,
+   struct strbuf *a_mid, struct strbuf *b_mid,
+   struct strbuf *sfx,
+   int name_width)
+{
+   static const char arrow[] =  = ;
+   static const char dots[] = ...;
+   int use_curly_braces = (pfx-len  0) || (sfx-len  0);
+   size_t name_len;
+   size_t len;
+   size_t part_length[3];
+   size_t max_part_len = 0;
+   size_t remainder_part_len = 0;
+
+   name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + strlen(arrow)
+   + (use_curly_braces ? 2 : 0);
+
+   if (name_len = name_width) {
+   /* Everthing fits in name_width */
+   return;
+   }
+
+   if (use_curly_braces) {
+   if (strlen(dots) + (name_len - pfx-len) = name_width) {
+   /*
+* Just omitting left of '{' is enough
+* Ex: ...aaa{foofoofoo = bar}file
+*/
+   strbuf_splice(pfx, 0, name_len - name_width + 
strlen(dots), dots, strlen(dots));
+   return;
+   } else {
+   if (pfx-len  strlen(dots)) {
+   /*
+* Just omitting left of '{' is not 

Re: [PATCH v3] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-15 Thread Yoshioka Tsuneo
Hello Duy

Thank you very much your suggestion.
As you suggested, I tried to reuse intermediate result of pprint_rename(), 
instead of
parsing the output again.
I just posted the new patch as PATCH v4

Thanks !

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 14, 2013, at 10:04 PM, Duy Nguyen pclo...@gmail.com wrote:

 On Sun, Oct 13, 2013 at 3:48 AM, Yoshioka Tsuneo
 yoshiokatsu...@gmail.com wrote:
 git diff -M --stat can detect rename and show renamed file name like
 foofoofoo = barbarbar, but if destination filename is long the line
 is shortened like ...barbarbar so there is no way to know whether the
 file is renamed or existed in the source commit.
 This commit makes it visible like ...foo = ...bar.
 
 Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
 ---
 diff.c | 58 +++---
 1 file changed, 51 insertions(+), 7 deletions(-)
 
 diff --git a/diff.c b/diff.c
 index a04a34d..3aeaf3e 100644
 --- a/diff.c
 +++ b/diff.c
 @@ -1643,13 +1643,57 @@ static void show_stats(struct diffstat_t *data, 
 struct diff_options *options)
len = name_width;
name_len = strlen(name);
if (name_width  name_len) {
 -   char *slash;
 -   prefix = ...;
 -   len -= 3;
 -   name += name_len - len;
 -   slash = strchr(name, '/');
 -   if (slash)
 -   name = slash;
 +   char *arrow = strstr(name,  = );
 +   if (arrow) {
 
 This looks iffy. What if  =  is part of the path name?
 file-is_renamed would be a more reliable sign. In that case I think
 you just need an ellipsis version of pprint_rename() (i.e. drop the
 result of previous pprint_rename() on the floor and create a new
 string with ... and  =  in your pprint_ellipsis_rename or
 something)
 
 +   int prefix_len = (name_width - 4) / 2;
 +   int f_omit;
 +   int f_brace = 0;
 +   char *pre_arrow = alloca(name_width + 10);
 +   char *post_arrow = arrow + 4;
 +   char *prefix_buf = alloca(name_width + 10);
 +   char *pre_arrow_slash = NULL;
 +
 +   if (arrow - name  prefix_len) {
 +   prefix_len = (int)(arrow - name);
 +   f_omit = 0;
 +   } else {
 +   prefix_len -= 3;
 +   f_omit = 1;
 +   if (name[0] == '{') {
 +   prefix_len -= 1;
 +   f_brace = 1;
 +   }
 +   }
 +   prefix_len = ((prefix_len = 0) ? prefix_len 
 : 0);
 +   strncpy(pre_arrow, arrow - prefix_len, 
 prefix_len);
 +   pre_arrow[prefix_len] = '\0';
 +   pre_arrow_slash = strchr(pre_arrow, '/');
 +   if (f_omit  pre_arrow_slash)
 +   pre_arrow = pre_arrow_slash;
 +   sprintf(prefix_buf, %s%s%s = , (f_brace ? 
 { : ), (f_omit ? ... : ), pre_arrow);
 +   prefix = prefix_buf;
 +
 +   if (strlen(post_arrow)  name_width - 
 strlen(prefix)) {
 +   char *post_arrow_slash = NULL;
 +
 +   post_arrow += strlen(post_arrow) - 
 (name_width - strlen(prefix) - 3);
 +   strcat(prefix_buf, ...);
 +   post_arrow_slash = 
 strchr(post_arrow, '/');
 +   if (post_arrow_slash)
 +   post_arrow = 
 post_arrow_slash;
 +   name = post_arrow;
 +   name_len = (int) (name_width - 
 strlen(prefix));
 +   }
 +   len -= strlen(prefix);
 +   } else {
 +   char *slash = NULL;
 +   prefix = ...;
 +   len -= 3;
 +   name += name_len - len;
 +   slash = strchr(name, '/');
 +   if (slash)
 +   name = slash;
 +   }
}
 
if (file-is_binary) {
 --
 1.8.4.475.g867697c
 
 
 --
 To unsubscribe from this list

[PATCH v5] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-15 Thread Yoshioka Tsuneo

git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar. But if destination filename is long, the line
is shortened like ...barbarbar so there is no way to know whether the
file is renamed or existed in the source commit.
Make sure there is always an arrow, like ...foo = ...bar.
The output can contains curly braces('{','}') for grouping.
So, in general, the outpu format is pfx{mid_a = mid_b}sfx
To keep arrow(=), try to omit pfx as long as possible at first
because later part or changing part will be the more important part.
If it is not enough, shorten mid_a, mid_b, and sfx trying to
have the maximum length the same because those will be equaly important.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
Test-added-by: Thomas Rast tr...@inf.ethz.ch
---
 diff.c | 187 +++--
 t/t4001-diff-rename.sh |  12 
 2 files changed, 177 insertions(+), 22 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..cf50807 100644
--- a/diff.c
+++ b/diff.c
@@ -1258,11 +1258,12 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
}
 }
 
-static char *pprint_rename(const char *a, const char *b)
+static void pprint_rename_find_common_prefix_suffix(const char *a, const char 
*b
+   
, struct strbuf *pfx, struct strbuf *a_mid
+   
, struct strbuf *b_mid, struct strbuf *sfx)
 {
const char *old = a;
const char *new = b;
-   struct strbuf name = STRBUF_INIT;
int pfx_length, sfx_length;
int pfx_adjust_for_slash;
int len_a = strlen(a);
@@ -1272,10 +1273,9 @@ static char *pprint_rename(const char *a, const char *b)
int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
if (qlen_a || qlen_b) {
-   quote_c_style(a, name, NULL, 0);
-   strbuf_addstr(name,  = );
-   quote_c_style(b, name, NULL, 0);
-   return strbuf_detach(name, NULL);
+   quote_c_style(a, a_mid, NULL, 0);
+   quote_c_style(b, b_mid, NULL, 0);
+   return;
}
 
/* Find common prefix */
@@ -1321,18 +1321,151 @@ static char *pprint_rename(const char *a, const char 
*b)
a_midlen = 0;
if (b_midlen  0)
b_midlen = 0;
+   
+   strbuf_add(pfx, a, pfx_length);
+   strbuf_add(a_mid, a + pfx_length, a_midlen);
+   strbuf_add(b_mid, b + pfx_length, b_midlen);
+   strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
+}
+
+/*
+ * Omit each parts to fix in name_width.
+ * Formatted string is pfx{a_mid = b_mid}sfx.
+ * At first, omit pfx as long as possible.
+ * If it is not enough, omit a_mid, b_mid, sfx by tring to set the 
length of
+ * those 3 parts(including ...) to the same.
+ * Ex:
+ * foofoofoo = barbarbar
+ *   will be like
+ * ...foo = ...bar.
+ * long_parent{foofoofoo = barbarbar}longfilename
+ *   will be like
+ * ...parent{...foofoo = ...barbar}...lename
+ */
+static void pprint_rename_omit(struct strbuf *pfx, struct strbuf *a_mid, 
struct strbuf *b_mid
+  , struct strbuf 
*sfx, int name_width)
+{
+
+#define ARROW  = 
+#define ELLIPSIS ...
+#define swap(a, b) myswap((a), (b), sizeof(a))
+   
+#define myswap(a, b, size) do {\
+unsigned char mytmp[size]; \
+memcpy(mytmp, a, size);   \
+memcpy(a, b, size);  \
+memcpy(b, mytmp, size);   \
+} while (0)
+
+   int use_curly_braces = (pfx-len  0) || (sfx-len  0);
+   size_t name_len;
+   size_t len;
+   size_t part_lengths[4];
+   size_t max_part_len = 0;
+   size_t remainder_part_len = 0;
+   int i, j;
+
+   name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + strlen(ARROW)
+   + (use_curly_braces ? 2 : 0);
+   
+   if (name_len = name_width) {
+   /* Everthing fits in name_width */
+   return;
+   }
+   
+   if (use_curly_braces) {
+   if (strlen(ELLIPSIS) + (name_len - pfx-len) = name_width) {
+   /*
+Just omitting left of '{' is enough
+Ex: ...aaa{foofoofoo = bar}file
+*/
+   strbuf_splice(pfx, name_len - pfx-len, name_width - 
(name_len - pfx-len), ELLIPSIS, strlen(ELLIPSIS));
+   return;
+   } else {
+   if (pfx-len  strlen(ELLIPSIS)) {
+   /*
+Just omitting left of '{' is not enough
+name will be ...{SOMETHING}SOMETHING
+*/
+   strbuf_reset(pfx);
+  

Re: [PATCH v4] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-15 Thread Yoshioka Tsuneo
Hello Felipe

Thank you for pointing out the style issue again.
I just fixed it and posted as [PATCH v5].

Thanks!

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 15, 2013, at 1:07 PM, Felipe Contreras felipe.contre...@gmail.com 
wrote:

 On Tue, Oct 15, 2013 at 4:45 AM, Yoshioka Tsuneo
 yoshiokatsu...@gmail.com wrote:
 
 git diff -M --stat can detect rename and show renamed file name like
 foofoofoo = barbarbar. But if destination filename is long, the line
 is shortened like ...barbarbar so there is no way to know whether the
 file is renamed or existed in the source commit.
 Make sure there is always an arrow, like ...foo = ...bar.
 The output can contains curly braces('{','}') for grouping.
 So, in general, the outpu format is pfx{mid_a = mid_b}sfx
 To keep arrow(=), try to omit pfx as long as possible at first
 because later part or changing part will be the more important part.
 If it is not enough, shorten mid_a, mid_b, and sfx trying to
 have the maximum length the same because those will be equaly important.
 
 This has similar style issues as v1.
 
 -static char *pprint_rename(const char *a, const char *b)
 +static void pprint_rename_find_common_prefix_suffix(const char *a, const 
 char *b, struct strbuf *pfx, struct strbuf *a_mid, struct strbuf *b_mid, 
 struct strbuf *sfx)
 {
const char *old = a;
const char *new = b;
 -   struct strbuf name = STRBUF_INIT;
int pfx_length, sfx_length;
int pfx_adjust_for_slash;
int len_a = strlen(a);
 @@ -1272,10 +1271,9 @@ static char *pprint_rename(const char *a, const char 
 *b)
int qlen_b = quote_c_style(b, NULL, NULL, 0);
 
if (qlen_a || qlen_b) {
 -   quote_c_style(a, name, NULL, 0);
 -   strbuf_addstr(name,  = );
 -   quote_c_style(b, name, NULL, 0);
 -   return strbuf_detach(name, NULL);
 +   quote_c_style(a, a_mid, NULL, 0);
 +   quote_c_style(b, b_mid, NULL, 0);
 +   return;
}
 
/* Find common prefix */
 @@ -1321,18 +1319,149 @@ static char *pprint_rename(const char *a, const 
 char *b)
a_midlen = 0;
if (b_midlen  0)
b_midlen = 0;
 +
 +   strbuf_add(pfx, a, pfx_length);
 +   strbuf_add(a_mid, a + pfx_length, a_midlen);
 +   strbuf_add(b_mid, b + pfx_length, b_midlen);
 +   strbuf_add(sfx, a + len_a - sfx_length, sfx_length);
 +}
 
 -   strbuf_grow(name, pfx_length + a_midlen + b_midlen + sfx_length + 
 7);
 -   if (pfx_length + sfx_length) {
 -   strbuf_add(name, a, pfx_length);
 +/*
 + * Omit each parts to fix in name_width.
 + * Formatted string is pfx{a_mid = b_mid}sfx.
 + * At first, omit pfx as long as possible.
 + * If it is not enough, omit a_mid, b_mid, sfx by tring to set the 
 length of
 + * those 3 parts(including ...) to the same.
 + * Ex:
 + * foofoofoo = barbarbar
 + *   will be like
 + * ...foo = ...bar.
 + * long_parent{foofoofoo = barbarbar}longfilename
 + *   will be like
 + * ...parent{...foofoo = ...barbar}...lename
 + */
 +static void pprint_rename_omit(struct strbuf *pfx, struct strbuf *a_mid, 
 struct strbuf *b_mid, struct strbuf *sfx, int name_width)
 
 Seems like this line needs to be broken.
 
 +{
 +
 +#define ARROW  = 
 +#define ELLIPSIS ...
 +#define swap(a,b) myswap((a),(b),sizeof(a))
 
 I'm not entirely sure, but I think this should be:
 
 #define swap(a, b) myswap((a), (b), sizeof(a))
 
 +
 +#define myswap(a, b, size) do {\
 +unsigned char mytmp[size]; \
 +memcpy(mytmp, a, size);   \
 +memcpy(a, b, size);  \
 +memcpy(b, mytmp, size);   \
 +} while (0)
 +
 +   int use_curly_braces = (pfx-len  0) || (sfx-len  0);
 +   size_t name_len;
 +   size_t len;
 +   size_t part_lengths[4];
 +   size_t max_part_len = 0;
 +   size_t remainder_part_len = 0;
 +   int i, j;
 +
 +   name_len = pfx-len + a_mid-len + b_mid-len + sfx-len + 
 strlen(ARROW) + (use_curly_braces?2:0);
 +
 +   if (name_len = name_width){
 
 if () {
 
 +   /* Everthing fits in name_width */
 +   return;
 +   }
 +
 +   if(use_curly_braces){
 
 Ditto.
 
 +   if(strlen(ELLIPSIS) + (name_len - pfx-len) = name_width){
 
 Ditto.
 
 +   /*
 +Just omitting left of '{' is enough
 +Ex: ...aaa{foofoofoo = bar}file
 +*/
 +   strbuf_splice(pfx, name_len - pfx-len, name_width - 
 (name_len - pfx-len), ELLIPSIS, strlen(ELLIPSIS));
 +   return;
 +   }else{
 
 } else {
 
 +   if (pfx-len  strlen(ELLIPSIS)) {
 +   /*
 +Just omitting left of '{' is not enough
 +name will be ...{SOMETHING}SOMETHING

[PATCH v3] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-12 Thread Yoshioka Tsuneo
git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar, but if destination filename is long the line
is shortened like ...barbarbar so there is no way to know whether the
file is renamed or existed in the source commit.
This commit makes it visible like ...foo = ...bar.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diff.c | 58 +++---
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..3aeaf3e 100644
--- a/diff.c
+++ b/diff.c
@@ -1643,13 +1643,57 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
len = name_width;
name_len = strlen(name);
if (name_width  name_len) {
-   char *slash;
-   prefix = ...;
-   len -= 3;
-   name += name_len - len;
-   slash = strchr(name, '/');
-   if (slash)
-   name = slash;
+   char *arrow = strstr(name,  = );
+   if (arrow) {
+   int prefix_len = (name_width - 4) / 2;
+   int f_omit;
+   int f_brace = 0;
+   char *pre_arrow = alloca(name_width + 10);
+   char *post_arrow = arrow + 4;
+   char *prefix_buf = alloca(name_width + 10);
+   char *pre_arrow_slash = NULL;
+
+   if (arrow - name  prefix_len) {
+   prefix_len = (int)(arrow - name);
+   f_omit = 0;
+   } else {
+   prefix_len -= 3;
+   f_omit = 1;
+   if (name[0] == '{') {
+   prefix_len -= 1;
+   f_brace = 1;
+   }
+   }
+   prefix_len = ((prefix_len = 0) ? prefix_len : 
0);
+   strncpy(pre_arrow, arrow - prefix_len, 
prefix_len);
+   pre_arrow[prefix_len] = '\0';
+   pre_arrow_slash = strchr(pre_arrow, '/');
+   if (f_omit  pre_arrow_slash)
+   pre_arrow = pre_arrow_slash;
+   sprintf(prefix_buf, %s%s%s = , (f_brace ? 
{ : ), (f_omit ? ... : ), pre_arrow);
+   prefix = prefix_buf;
+
+   if (strlen(post_arrow)  name_width - 
strlen(prefix)) {
+   char *post_arrow_slash = NULL;
+
+   post_arrow += strlen(post_arrow) - 
(name_width - strlen(prefix) - 3);
+   strcat(prefix_buf, ...);
+   post_arrow_slash = strchr(post_arrow, 
'/');
+   if (post_arrow_slash)
+   post_arrow = post_arrow_slash;
+   name = post_arrow;
+   name_len = (int) (name_width - 
strlen(prefix));
+   }
+   len -= strlen(prefix);
+   } else {
+   char *slash = NULL;
+   prefix = ...;
+   len -= 3;
+   name += name_len - len;
+   slash = strchr(name, '/');
+   if (slash)
+   name = slash;
+   }
}
 
if (file-is_binary) {
-- 
1.8.4.475.g867697c


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-12 Thread Yoshioka Tsuneo
git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar, but if destination filename is long the line
is shortened like ...barbarbar so there is no way to know whether the
file is renamed or existed in the source commit.
This commit makes it visible like ...foo = ...bar.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diff.c | 58 +++---
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..3aeaf3e 100644
--- a/diff.c
+++ b/diff.c
@@ -1643,13 +1643,57 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
len = name_width;
name_len = strlen(name);
if (name_width  name_len) {
-   char *slash;
-   prefix = ...;
-   len -= 3;
-   name += name_len - len;
-   slash = strchr(name, '/');
-   if (slash)
-   name = slash;
+   char *arrow = strstr(name,  = );
+   if (arrow) {
+   int prefix_len = (name_width - 4) / 2;
+   int f_omit;
+   int f_brace = 0;
+   char *pre_arrow = alloca(name_width + 10);
+   char *post_arrow = arrow + 4;
+   char *prefix_buf = alloca(name_width + 10);
+   char *pre_arrow_slash = NULL;
+
+   if (arrow - name  prefix_len) {
+   prefix_len = (int)(arrow - name);
+   f_omit = 0;
+   } else {
+   prefix_len -= 3;
+   f_omit = 1;
+   if (name[0] == '{') {
+   prefix_len -= 1;
+   f_brace = 1;
+   }
+   }
+   prefix_len = ((prefix_len = 0) ? prefix_len : 
0);
+   strncpy(pre_arrow, arrow - prefix_len, 
prefix_len);
+   pre_arrow[prefix_len] = '\0';
+   pre_arrow_slash = strchr(pre_arrow, '/');
+   if (f_omit  pre_arrow_slash)
+   pre_arrow = pre_arrow_slash;
+   sprintf(prefix_buf, %s%s%s = , (f_brace ? 
{ : ), (f_omit ? ... : ), pre_arrow);
+   prefix = prefix_buf;
+
+   if (strlen(post_arrow)  name_width - 
strlen(prefix)) {
+   char *post_arrow_slash = NULL;
+
+   post_arrow += strlen(post_arrow) - 
(name_width - strlen(prefix) - 3);
+   strcat(prefix_buf, ...);
+   post_arrow_slash = strchr(post_arrow, 
'/');
+   if (post_arrow_slash)
+   post_arrow = post_arrow_slash;
+   name = post_arrow;
+   name_len = (int) (name_width - 
strlen(prefix));
+   }
+   len -= strlen(prefix);
+   } else {
+   char *slash = NULL;
+   prefix = ...;
+   len -= 3;
+   name += name_len - len;
+   slash = strchr(name, '/');
+   if (slash)
+   name = slash;
+   }
}
 
if (file-is_binary) {
-- 
1.8.4.475.g867697c


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [spf:guess,mismatch] [PATCH v2] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-12 Thread Yoshioka Tsuneo
Hello

On Oct 12, 2013, at 8:35 AM, Keshav Kini keshav.k...@gmail.com wrote:

 Sam Vilain s...@vilain.net writes:
 
 On 10/11/2013 06:07 AM, Yoshioka Tsuneo wrote:
 +   prefix_len = ((prefix_len = 0) ? prefix_len : 
 0);
 +   strncpy(pre_arrow, arrow - prefix_len, 
 prefix_len);
 +   pre_arrow[prefix_len] = '¥0';
 
 
 This seems to be an encoding mistake; was this supposed to be an ASCII
 arrow?
 
 That's supposed to be a null terminator character, '\0'. See
 https://en.wikipedia.org/wiki/Yen_symbol#Code_page_932
Thank you for pointing out the encoding issue.
It looks, I need to change encoding of pbcopy command to
en_US.UTF-8 from C on Mac OS X.
I just sent updated patch as PATCH v3.

Thanks !

---
Tsuneo Yoshioka (吉岡 恒夫)
yoshiokatsu...@gmail.com




On Oct 12, 2013, at 8:35 AM, Keshav Kini keshav.k...@gmail.com wrote:

 Sam Vilain s...@vilain.net writes:
 
 On 10/11/2013 06:07 AM, Yoshioka Tsuneo wrote:
 +   prefix_len = ((prefix_len = 0) ? prefix_len : 
 0);
 +   strncpy(pre_arrow, arrow - prefix_len, 
 prefix_len);
 +   pre_arrow[prefix_len] = '¥0';
 
 
 This seems to be an encoding mistake; was this supposed to be an ASCII
 arrow?
 
 That's supposed to be a null terminator character, '\0'. See
 https://en.wikipedia.org/wiki/Yen_symbol#Code_page_932
 
 -Keshav
 
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-11 Thread Yoshioka Tsuneo
git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar, but if destination filename is long the line
is shortened like ...barbarbar so there is no way to know whether the
file is renamed or existed in the source commit.
This commit makes it visible like ...foo = ...bar.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diff.c | 54 +++---
 1 file changed, 47 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..9b55546 100644
--- a/diff.c
+++ b/diff.c
@@ -1643,13 +1643,53 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
len = name_width;
name_len = strlen(name);
if (name_width  name_len) {
-   char *slash;
-   prefix = ...;
-   len -= 3;
-   name += name_len - len;
-   slash = strchr(name, '/');
-   if (slash)
-   name = slash;
+   char *arrow = strstr(name,  = );
+   if(arrow){
+   int prefix_len = (name_width- 4) / 2;
+   int f_omit;
+   char *pre_arrow = alloca(name_width + 10);
+   char *post_arrow = arrow + 4;
+   char *prefix_buf = alloca(name_width + 10);
+   char *pre_arrow_slash = NULL;
+
+   if(arrow - name  prefix_len){
+   prefix_len = (int)(arrow - name);
+   f_omit = 0;
+   }else{
+   prefix_len -= 3;
+   f_omit = 1;
+   }
+   strncpy(pre_arrow, arrow - prefix_len, 
prefix_len);
+   pre_arrow[prefix_len] = '¥0';
+   pre_arrow_slash = strchr(pre_arrow, '/');
+   if(f_omit  pre_arrow_slash){
+   pre_arrow = pre_arrow_slash;
+   }
+   sprintf(prefix_buf, %s%s = , (f_omit ? ... 
: ), pre_arrow);
+   prefix = prefix_buf;
+
+   if(strlen(post_arrow)  name_width - 
strlen(prefix)){
+   char *post_arrow_slash = NULL;
+
+   post_arrow += strlen(post_arrow) - 
(name_width - strlen(prefix) - 3);
+   strcat(prefix_buf, ...);
+   post_arrow_slash = strchr(post_arrow, 
'/');
+   if(post_arrow_slash){
+   post_arrow = post_arrow_slash;
+   }
+   name = post_arrow;
+   name_len = (int) (name_width - 
strlen(prefix));
+   }
+   len -= strlen(prefix);
+   }else{
+   char *slash = NULL;
+   prefix = ...;
+   len -= 3;
+   name += name_len - len;
+   slash = strchr(name, '/');
+   if (slash)
+   name = slash;
+   }
}
 
if (file-is_binary) {
-- 
1.8.4.475.g867697c


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] diff.c: keep arrow(=) on show_stats()'s shortened filename part to make rename visible.

2013-10-11 Thread Yoshioka Tsuneo
git diff -M --stat can detect rename and show renamed file name like
foofoofoo = barbarbar, but if destination filename is long the line
is shortened like ...barbarbar so there is no way to know whether the
file is renamed or existed in the source commit.
This commit makes it visible like ...foo = ...bar.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diff.c | 58 +++---
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index a04a34d..3aeaf3e 100644
--- a/diff.c
+++ b/diff.c
@@ -1643,13 +1643,57 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
len = name_width;
name_len = strlen(name);
if (name_width  name_len) {
-   char *slash;
-   prefix = ...;
-   len -= 3;
-   name += name_len - len;
-   slash = strchr(name, '/');
-   if (slash)
-   name = slash;
+   char *arrow = strstr(name,  = );
+   if (arrow) {
+   int prefix_len = (name_width - 4) / 2;
+   int f_omit;
+   int f_brace = 0;
+   char *pre_arrow = alloca(name_width + 10);
+   char *post_arrow = arrow + 4;
+   char *prefix_buf = alloca(name_width + 10);
+   char *pre_arrow_slash = NULL;
+
+   if (arrow - name  prefix_len) {
+   prefix_len = (int)(arrow - name);
+   f_omit = 0;
+   } else {
+   prefix_len -= 3;
+   f_omit = 1;
+   if (name[0] == '{') {
+   prefix_len -= 1;
+   f_brace = 1;
+   }
+   }
+   prefix_len = ((prefix_len = 0) ? prefix_len : 
0);
+   strncpy(pre_arrow, arrow - prefix_len, 
prefix_len);
+   pre_arrow[prefix_len] = '¥0';
+   pre_arrow_slash = strchr(pre_arrow, '/');
+   if (f_omit  pre_arrow_slash)
+   pre_arrow = pre_arrow_slash;
+   sprintf(prefix_buf, %s%s%s = , (f_brace ? 
{ : ), (f_omit ? ... : ), pre_arrow);
+   prefix = prefix_buf;
+
+   if (strlen(post_arrow)  name_width - 
strlen(prefix)) {
+   char *post_arrow_slash = NULL;
+
+   post_arrow += strlen(post_arrow) - 
(name_width - strlen(prefix) - 3);
+   strcat(prefix_buf, ...);
+   post_arrow_slash = strchr(post_arrow, 
'/');
+   if (post_arrow_slash)
+   post_arrow = post_arrow_slash;
+   name = post_arrow;
+   name_len = (int) (name_width - 
strlen(prefix));
+   }
+   len -= strlen(prefix);
+   } else {
+   char *slash = NULL;
+   prefix = ...;
+   len -= 3;
+   name += name_len - len;
+   slash = strchr(name, '/');
+   if (slash)
+   name = slash;
+   }
}
 
if (file-is_binary) {
-- 
1.8.4.475.g867697c



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] diffcore-delta: optimize renames and copies detection (git diff -M/-C)

2013-10-09 Thread Yoshioka Tsuneo
diffcore_count_changes() can return -1 when src_copied is greater than
delta_limit, without counting all the src_copied.
By that, performance of diff -M/-C can be improved.

Signed-off-by: Tsuneo Yoshioka yoshiokatsu...@gmail.com
---
 diffcore-delta.c  | 11 ---
 diffcore-rename.c |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/diffcore-delta.c b/diffcore-delta.c
index 7cf431d..0a9290e 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -173,7 +173,7 @@ int diffcore_count_changes(struct diff_filespec *src,
 {
struct spanhash *s, *d;
struct spanhash_top *src_count, *dst_count;
-   unsigned long sc, la;
+   unsigned long sc, not_sc, la;
 
src_count = dst_count = NULL;
if (src_count_p)
@@ -190,7 +190,7 @@ int diffcore_count_changes(struct diff_filespec *src,
if (dst_count_p)
*dst_count_p = dst_count;
}
-   sc = la = 0;
+   sc = not_sc = la = 0;
 
s = src_count-data;
d = dst_count-data;
@@ -214,8 +214,13 @@ int diffcore_count_changes(struct diff_filespec *src,
la += dst_cnt - src_cnt;
sc += src_cnt;
}
-   else
+   else{
sc += dst_cnt;
+   not_sc += (src_cnt - dst_cnt);
+   if(delta_limit != 0  not_sc  delta_limit){
+   return -1;
+   }
+   }
s++;
}
while (d-cnt) {
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..d52b2c8 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -181,7 +181,7 @@ static int estimate_similarity(struct diff_filespec *src,
return 0;
 
delta_limit = (unsigned long)
-   (base_size * (MAX_SCORE-minimum_score) / MAX_SCORE);
+   (max_size * (MAX_SCORE-minimum_score) / MAX_SCORE);
if (diffcore_count_changes(src, dst,
   src-cnt_data, dst-cnt_data,
   delta_limit,
-- 
1.7.12.4 (Apple Git-37)


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html