Re: Rename detection within in files WAS: [PATCH 2/6] t7408: merge short tests, factor out testing method

2016-08-07 Thread René Scharfe

Am 06.08.2016 um 01:26 schrieb Stefan Beller:

When moving code around, we usually get large chunks of text. If the contributor
is not 100% trustworthy, we need to review all the code without much intelectual
joy. Essentially the reviewer is just making sure the parts of the text are the
same.

I'd like to propose a new addition to the diff format that makes this use case
easier. The idea is to mark up lines that were just moved around in the file
instead of adding and removing them.

Currently we have 3 characters that
are allowed to start a line within a hunk:
' ' to indicate context
'+' to add a line
'-' to remove a line

I'd propose to add the following characters:
'*' which is the same as '+', but it indicates that the line was moved
 from somewhere else without change.
'X' The same as '-', with the addition that this line was moved to a different
 place without change.

The patch below uses these new '*' and 'X'. Each hunk that makes use of these
additions, is followed other sections, [moved-from, moved-to] that indicate
where the corresponding line is.


Interesting idea. It should be easy to convert the result into a regular 
unified diff for consumption with patch(1) or git am/apply by replacing 
the new flags with + and - and removing the moved-* hunks.


Your example ignores whitespace changes at the start of the line and 
within it, the added "-C $working_dir", s/expected/expect/; is this all 
intended?  Only a single blank line was moved verbatim.


The moved-from and moved-to hunks make this diff quite verbose.

If multiple lines from different sources are moved to the same hunk then 
you'd get multiple moved-from hunks following that single destination, 
right?  (Same with lines moved from a single hunk to multiple 
destinations and moved-to.)


But does it even warrent a new format? It's a display problem; the 
necessary information is already in the diffs we have today.  A 
graphical diff viewer could connect moved blocks with lines, like 
http://www.araxis.com/merge/ does in its side-by-side view.  A 
Thunderbird extension (or a bookmarklet or browser extendiion for 
webmail users) could do that for an email-based workflow.


Still, what about adding information about moved lines as an extended 
header (like that index line)?  Line numbers are included in hunk 
headers and can serve as orientation.  A reader would have to do some 
mental arithmetic (ugh), but incompatible format changes would be 
avoided.  For your example it should look something like this:


move from t/t7408-submodule-reference.sh:52,1
move to t/t7408-submodule-reference.sh:22,1



There are multiple things to tackle when going for such an addition:
* How to present this to the user (it's covered in this email)
* how to find the renamed lines algorithmically.
   (there are already approaches to that, e.g. 
https://github.com/stefanbeller/duplo
   which is http://duplo.sourceforge.net/ with no substantial additions)

Any comments welcome,

Thanks,
Stefan

---
  t/t7408-submodule-reference.sh | 50 +++---
  1 file changed, 15 insertions(+), 29 deletions(-), 6 moved lines

diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh
index afcc629..1416cbd 100755
--- a/t/t7408-submodule-reference.sh
+++ b/t/t7408-submodule-reference.sh
@@ -10,6 +10,16 @@ base_dir=$(pwd)

  U=$base_dir/UPLOAD_LOG

+test_alternate_usage()
+{
+   alternates_file=$1
+   working_dir=$2
+   test_line_count = 1 $alternates_file &&
*   echo "0 objects, 0 kilobytes" >expect &&
*   git -C $working_dir count-objects >current &&
*   diff expect current
+}
+


Post-image line 22.


  test_expect_success 'preparing first repository' '
test_create_repo A &&
(
@@ move-source 42,6 @@ test_expect_success 'that reference gets used with add' '
  test_expect_success 'that reference gets used with add' '
(
cd super/sub &&
X   echo "0 objects, 0 kilobytes" > expected &&
X   git count-objects > current &&
X   diff expected current
)
  '
@@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' '
)
  '

-test_expect_success 'submodule add --reference' '
+test_expect_success 'submodule add --reference uses alternates' '
(
cd super &&
git submodule add --reference ../B "file://$base_dir/A" sub &&
git commit -m B-super-added
-   )
-'
-


Pre-image line 52.


-test_expect_success 'after add: existence of info/alternates' '
-   test_line_count = 1 super/.git/modules/sub/objects/info/alternates
-'
-
-test_expect_success 'that reference gets used with add' '
-   (
-   cd super/sub &&
X   echo "0 objects, 0 kilobytes" > expected &&
X   git count-objects > current &&
X   diff expected current
-   )
-'
-
-test_expect_success 'cloning superproject' '
-   

Rename detection within in files WAS: [PATCH 2/6] t7408: merge short tests, factor out testing method

2016-08-06 Thread Stefan Beller
When moving code around, we usually get large chunks of text. If the contributor
is not 100% trustworthy, we need to review all the code without much intelectual
joy. Essentially the reviewer is just making sure the parts of the text are the
same.

I'd like to propose a new addition to the diff format that makes this use case
easier. The idea is to mark up lines that were just moved around in the file
instead of adding and removing them.

Currently we have 3 characters that
are allowed to start a line within a hunk:
' ' to indicate context
'+' to add a line
'-' to remove a line

I'd propose to add the following characters:
'*' which is the same as '+', but it indicates that the line was moved
from somewhere else without change.
'X' The same as '-', with the addition that this line was moved to a different
place without change.

The patch below uses these new '*' and 'X'. Each hunk that makes use of these
additions, is followed other sections, [moved-from, moved-to] that indicate
where the corresponding line is.

There are multiple things to tackle when going for such an addition:
* How to present this to the user (it's covered in this email)
* how to find the renamed lines algorithmically.
  (there are already approaches to that, e.g. 
https://github.com/stefanbeller/duplo
  which is http://duplo.sourceforge.net/ with no substantial additions)

Any comments welcome,

Thanks,
Stefan

---
 t/t7408-submodule-reference.sh | 50 +++---
 1 file changed, 15 insertions(+), 29 deletions(-), 6 moved lines

diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh
index afcc629..1416cbd 100755
--- a/t/t7408-submodule-reference.sh
+++ b/t/t7408-submodule-reference.sh
@@ -10,6 +10,16 @@ base_dir=$(pwd)

 U=$base_dir/UPLOAD_LOG

+test_alternate_usage()
+{
+   alternates_file=$1
+   working_dir=$2
+   test_line_count = 1 $alternates_file &&
*   echo "0 objects, 0 kilobytes" >expect &&
*   git -C $working_dir count-objects >current &&
*   diff expect current
+}
+
 test_expect_success 'preparing first repository' '
test_create_repo A &&
(
@@ move-source 42,6 @@ test_expect_success 'that reference gets used with add' '
 test_expect_success 'that reference gets used with add' '
(
cd super/sub &&
X   echo "0 objects, 0 kilobytes" > expected &&
X   git count-objects > current &&
X   diff expected current
)
 '
@@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' '
)
 '

-test_expect_success 'submodule add --reference' '
+test_expect_success 'submodule add --reference uses alternates' '
(
cd super &&
git submodule add --reference ../B "file://$base_dir/A" sub &&
git commit -m B-super-added
-   )
-'
-
-test_expect_success 'after add: existence of info/alternates' '
-   test_line_count = 1 super/.git/modules/sub/objects/info/alternates
-'
-
-test_expect_success 'that reference gets used with add' '
-   (
-   cd super/sub &&
X   echo "0 objects, 0 kilobytes" > expected &&
X   git count-objects > current &&
X   diff expected current
-   )
-'
-
-test_expect_success 'cloning superproject' '
-   git clone super super-clone
-'
-
@@ move-to 10,6 @@ test_alternate_usage
+   alternates_file=$1
+   working_dir=$2
+   test_line_count = 1 $alternates_file &&
*   echo "0 objects, 0 kilobytes" >expect &&
*   git -C $working_dir count-objects >current &&
*   diff expect current
+}
+
--
2.9.2.572.g9d9644e.dirty
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html