Re: GNU diff and git diff - difference on myers algorithm?

2015-07-16 Thread Luis R. Rodriguez
On Fri, Jun 12, 2015 at 11:52 AM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 OK wells I'm curious about more research / effort when trying to
 evaluate a diff with two seprate but adjoining preprocessor directives
 and if anyone has implemented an optimizaiton option to let the diff
 generator join them.

 For example, to let it infer that:

 --- a/test.c
 +++ b/test.c
 @@ -10,8 +10,6 @@ int main(int argc, char *argv[])

  #ifdef FOO
 a = 4;
 -#endif /* FOO */
 -#ifdef FOO
 a = 5;
  #endif /* FOO */

 is possible.

Anyone familiar if any tool exists today that would optimize this? Is
anyone working on it? Would git be a good place for such a thing? I'd
consider it as an option to optimize a diff. This for example is
extremely useful for us working with Coccinelle where we have a tool
writing code for us, while such an optimization might be useful to
Coccinelle it would seem like a rather generic feature, its just not
clear to me where to give such a tool a proper home.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-07-16 Thread Jacob Keller
On Thu, Jul 16, 2015 at 9:22 PM, Jacob Keller jacob.kel...@gmail.com wrote:
 On Thu, Jul 16, 2015 at 12:07 PM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
 On Fri, Jun 12, 2015 at 11:52 AM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
 OK wells I'm curious about more research / effort when trying to
 evaluate a diff with two seprate but adjoining preprocessor directives
 and if anyone has implemented an optimizaiton option to let the diff
 generator join them.

 For example, to let it infer that:

 --- a/test.c
 +++ b/test.c
 @@ -10,8 +10,6 @@ int main(int argc, char *argv[])

  #ifdef FOO
 a = 4;
 -#endif /* FOO */
 -#ifdef FOO
 a = 5;
  #endif /* FOO */

 is possible.

 Anyone familiar if any tool exists today that would optimize this? Is
 anyone working on it? Would git be a good place for such a thing? I'd
 consider it as an option to optimize a diff. This for example is
 extremely useful for us working with Coccinelle where we have a tool
 writing code for us, while such an optimization might be useful to
 Coccinelle it would seem like a rather generic feature, its just not
 clear to me where to give such a tool a proper home.

  Luis

 I do not understand exactly what would be optimized in this case?

 In any regards, that's not a diff transformation, that is a code
 transformation, and I would suggest starting with Coccinelle and
 seeing if you can get that to do what you want.
 http://coccinelle.lip6.fr/

 Regards,
 Jake

I misread your comment. Coccinelle is a tool that could probably be
coerced into doing this, but this is not a diff optimization unless I
am completely understanding it. This is a code transformation concept,
and doesn't have much to do with finding differences or code changes.
My above comment is correct, but I don't think this belongs in diff
parsing, but rather as part of something like Coccinelle

Regards,
Jake
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-07-16 Thread Jacob Keller
On Thu, Jul 16, 2015 at 12:07 PM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 On Fri, Jun 12, 2015 at 11:52 AM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
 OK wells I'm curious about more research / effort when trying to
 evaluate a diff with two seprate but adjoining preprocessor directives
 and if anyone has implemented an optimizaiton option to let the diff
 generator join them.

 For example, to let it infer that:

 --- a/test.c
 +++ b/test.c
 @@ -10,8 +10,6 @@ int main(int argc, char *argv[])

  #ifdef FOO
 a = 4;
 -#endif /* FOO */
 -#ifdef FOO
 a = 5;
  #endif /* FOO */

 is possible.

 Anyone familiar if any tool exists today that would optimize this? Is
 anyone working on it? Would git be a good place for such a thing? I'd
 consider it as an option to optimize a diff. This for example is
 extremely useful for us working with Coccinelle where we have a tool
 writing code for us, while such an optimization might be useful to
 Coccinelle it would seem like a rather generic feature, its just not
 clear to me where to give such a tool a proper home.

  Luis

I do not understand exactly what would be optimized in this case?

In any regards, that's not a diff transformation, that is a code
transformation, and I would suggest starting with Coccinelle and
seeing if you can get that to do what you want.
http://coccinelle.lip6.fr/

Regards,
Jake
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-06-12 Thread Luis R. Rodriguez
On Tue, Jun 9, 2015 at 1:25 AM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Luis,

 On 2015-06-08 20:34, Luis R. Rodriguez wrote:
 Based on a cursory review of the git code I get the impression that
 GNU diff and git 'diff' do not share any code for the possible diff
 algorithms.

 Indeed, Git's diff machinery is based[*1*] ofn libxdiff[*2*], not on GNU diff.

Great thanks for the confirmation. Interesting point by Linus on that
commit that changed from forking to use GNU diff to libxdiff:

  generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to validate the
   libxdiff output by just comparing it against GNU diff.

Indeed, simple example is using different starting context lines, or
different number of context lines. I do however wonder if under
certain parameters they *need* to be equal, or if this has been
studied exactly before.

 I'm in particularly curious more about the default myers
 algorithm.

 Are you looking for a freely available implementation of the Myers algorithm? 
 Or are you interested in understanding it?

I was trying to determine the above, of possibilities of differences.
Now granted there are the diffs using different output layouts
should differ, and as I note above if you modify the conext preference
you might end up with slightly different diffs, but I am also curious
to know if anyone has done research to see whether or not two hunks
which are slightly different are functionally equivalent. More on the
reasoning behind this below.

 Please note that Myers' algorithm is just one first step in most diff 
 implementations (and that other diff algorithms have become popular, in 
 particular because comparing strings can be accelerated by hashing the text 
 lines first, and those hashes can also be used to identify matching pairs of 
 unique lines, giving rise to yet another huge performance boost for typical 
 uses).

Awesome.

 The reason why Myers' algorithm is not sufficient for diff implementations is 
 that it only optimizes the edit distance, i.e. the amount of added/removed 
 lines, while patches should be readable, too, i.e. prefer *consecutive* edits 
 to disjunct ones.

Indeed, this is along the lines of what I am looking for but with some
other tweaks considered, more on this below.

 Just to mention one post-processing technique that is so useful that I 
 implemented it for Git[*3*]: the patience diff algorithm of Bram Cohen (of 
 BitTorrent fame) finds matching pairs of unique lines -- think of a function 
 from which another function is refactored, for example, intuitively you want 
 the diff to keep the signature of the original function as a context line.

Indeed.

 Disclaimer: While it is true that Gene and I shared an office for one month, 
 and that I am once again working in the same institute as he does, all my 
 knowledge about this algorithm stems from my reading his paper and 
 implementing the algorithm in Java for use in JGit[*3*].

:)

 I can take time to do a precise code review of the
 algorithms used on both GNU diff and git but if someone can already
 vet for any differences that'd be appreciated as it would save time.

 Again, I am curious what your goal is? I am sure I can support your quest 
 better when I understand what the purpose of this code review should be.

OK wells I'm curious about more research / effort when trying to
evaluate a diff with two seprate but adjoining preprocessor directives
and if anyone has implemented an optimizaiton option to let the diff
generator join them.

For example, to let it infer that:

--- a/test.c
+++ b/test.c
@@ -10,8 +10,6 @@ int main(int argc, char *argv[])

 #ifdef FOO
a = 4;
-#endif /* FOO */
-#ifdef FOO
a = 5;
 #endif /* FOO */

is possible.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GNU diff and git diff - difference on myers algorithm?

2015-06-09 Thread Johannes Schindelin
Hi Luis,

On 2015-06-08 20:34, Luis R. Rodriguez wrote:
 Based on a cursory review of the git code I get the impression that
 GNU diff and git 'diff' do not share any code for the possible diff
 algorithms.

Indeed, Git's diff machinery is based[*1*] ofn libxdiff[*2*], not on GNU diff.

 I'm in particularly curious more about the default myers
 algorithm.

Are you looking for a freely available implementation of the Myers algorithm? 
Or are you interested in understanding it?

Please note that Myers' algorithm is just one first step in most diff 
implementations (and that other diff algorithms have become popular, in 
particular because comparing strings can be accelerated by hashing the text 
lines first, and those hashes can also be used to identify matching pairs of 
unique lines, giving rise to yet another huge performance boost for typical 
uses).

The reason why Myers' algorithm is not sufficient for diff implementations is 
that it only optimizes the edit distance, i.e. the amount of added/removed 
lines, while patches should be readable, too, i.e. prefer *consecutive* edits 
to disjunct ones.

Just to mention one post-processing technique that is so useful that I 
implemented it for Git[*3*]: the patience diff algorithm of Bram Cohen (of 
BitTorrent fame) finds matching pairs of unique lines -- think of a function 
from which another function is refactored, for example, intuitively you want 
the diff to keep the signature of the original function as a context line.

Disclaimer: While it is true that Gene and I shared an office for one month, 
and that I am once again working in the same institute as he does, all my 
knowledge about this algorithm stems from my reading his paper and implementing 
the algorithm in Java for use in JGit[*3*].

 I can take time to do a precise code review of the
 algorithms used on both GNU diff and git but if someone can already
 vet for any differences that'd be appreciated as it would save time.

Again, I am curious what your goal is? I am sure I can support your quest 
better when I understand what the purpose of this code review should be.

Ciao,
Johannes

Footnote *1*: 
https://github.com/git/git/commit/3443546f6ef57fe28ea5cca232df8e400bfc3883
Footnote *2*: http://www.xmailserver.org/xdiff-lib.html
Footnote *3*: https://github.com/git/git/blob/master/xdiff/xpatience.c
Footnote *4*: 
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/diff/MyersDiff.java
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


GNU diff and git diff - difference on myers algorithm?

2015-06-08 Thread Luis R. Rodriguez
Based on a cursory review of the git code I get the impression that
GNU diff and git 'diff' do not share any code for the possible diff
algorithms. I'm in particularly curious more about the default myers
algorithm. I can take time to do a precise code review of the
algorithms used on both GNU diff and git but if someone can already
vet for any differences that'd be appreciated as it would save time.

 Luis
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html