Re: [PATCH v5 5/9] patch-id: document new behaviour

2014-04-27 Thread Michael S. Tsirkin
On Thu, Apr 24, 2014 at 03:12:14PM -0700, Junio C Hamano wrote:
> Michael S. Tsirkin m...@redhat.com writes:
>
>>>> +--unstable::
>>>> +   Use a non-symmetrical sum of hashes, such that reordering
>>>
>>> What is a non-symmetrical sum?
>>
>> Non-symmetrical combination function is better?
>
> I do not think either is very good X-.
>
> The primary points to convey for --stable are:
>
>  - Two patches produced by comparing the same two trees with two
>    different settings for -O<orderfile> will result in the same
>    patch signature, thereby allowing the computed result to be used
>    as a key to index some metainformation about the change between
>    the two trees;
>
>  - It will produce a result different from what the plain vanilla
>    patch-id has always produced, even when used on a diff output
>    taken without any use of -O<orderfile>, thereby making existing
>    databases keyed by patch-ids unusable.
>
> The fact that we happened to use a patch-id that catches that
> somebody reordered the same patch into different file order and
> declares that they are two different changes is more a historical
> accident than a designed goal.
>
> I would even say that we would have used the stable version from
> the beginning if we thought that -O<orderfile> would be widely
> used when these two features both appeared.  Even though I was the
> guilty one who introduced it, I'd admit that -O<orderfile> has
> merely been a curiosity from its inception and has been a failed
> experiment, not in the sense that the feature does not work as
> advertised (it does), but in the sense that it is not widely used
> (evidenced by the lack of complaints on missing diff.orderfile for a
> long time) at all.  With -O<orderfile> being a failed experiment,
> the instability did not matter, so it has stuck.
>
> The only two things worth mentioning about --unstable, if our
> future direction is to see diff.orderfile and --stable a lot more
> widely used, are:
>
>  (1) it keeps producing the same patch-id as existing versions of
>      Git, so users with existing databases (who do not deal with
>      reordered patches) may want to use it; and perhaps
>
>  (2) it will not consider a patch taken with -O<orderfile> and
>      another without it from the same source to be the same patch.
>
> Mathematically speaking, mentioning non-symmetrical might be one way
> of expressing the latter point (2), but stressing that alone
> without mentioning (1) misses the point.  (2) is _not_ a designed
> feature, so it is not very interesting.  Unless you have an existing
> database, there is no reason to use --unstable.
>
> On the other hand (1) is a very relevant thing to mention, as we are
> talking about a feature that, if unused, may break existing users'
> data.

OK I did just that, pls take a look.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 5/9] patch-id: document new behaviour

2014-04-24 Thread Michael S. Tsirkin
Clarify that patch ID can now be a sum of hashes, not a hash.
Document how command line and config options affect the
behaviour.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Documentation/git-patch-id.txt | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 312c3b1..e21b79b 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 
 [verse]
-'git patch-id' < <patch>
+'git patch-id' [--stable | --unstable] < <patch>
 
 DESCRIPTION
 ---
-A patch ID is nothing but a SHA-1 of the diff associated with a patch, with
-whitespace and line numbers ignored.  As such, it's reasonably stable, but at
-the same time also reasonably unique, i.e., two patches that have the same patch
-ID are almost guaranteed to be the same thing.
+A patch ID is nothing but a sum of SHA-1 of the diff hunks associated with a
+patch, with whitespace and line numbers ignored.  As such, it's reasonably
+stable, but at the same time also reasonably unique, i.e., two patches that
+have the same patch ID are almost guaranteed to be the same thing.
 
 IOW, you can use this thing to look for likely duplicate commits.
 
@@ -27,6 +27,19 @@ This can be used to make a mapping from patch ID to commit ID.
 
 OPTIONS
 ---
+
+--stable::
+   Use a symmetrical sum of hashes as the patch ID.
+   With this option, reordering file diffs that make up a patch or
+   splitting a diff up to multiple diffs that touch the same path
+   does not affect the ID.
+   This is the default if patchid.stable is set to true.
+
+--unstable::
+   Use a non-symmetrical sum of hashes, such that reordering
+   or splitting the patch does affect the ID.
+   This is the default.
+
 <patch>::
    The diff to create the ID of.
 
-- 
MST



Re: [PATCH v5 5/9] patch-id: document new behaviour

2014-04-24 Thread Jonathan Nieder
Michael S. Tsirkin wrote:

>  Documentation/git-patch-id.txt | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)

Ah, there's the documentation.  Please squash this with the patch that
introduces the new behavior so they can be reviewed together more
easily (both now and later when people do archeology).

[...]
> +--stable::
> +   Use a symmetrical sum of hashes as the patch ID.
> +   With this option, reordering file diffs that make up a patch or
> +   splitting a diff up to multiple diffs that touch the same path
> +   does not affect the ID.
> +   This is the default if patchid.stable is set to true.

This doesn't explain to me why I would want to use --stable versus
--unstable.  Maybe an EXAMPLES section would help?

The only reason I can think of to use --unstable is for compatibility
with historical patch-ids.  Is there any other reason?

At this point in the series there is no patchid.stable configuration.

> +--unstable::
> +   Use a non-symmetrical sum of hashes, such that reordering

What is a non-symmetrical sum?

Thanks,
Jonathan


Re: [PATCH v5 5/9] patch-id: document new behaviour

2014-04-24 Thread Michael S. Tsirkin
On Thu, Apr 24, 2014 at 10:33:25AM -0700, Jonathan Nieder wrote:
> Michael S. Tsirkin wrote:
>
>>  Documentation/git-patch-id.txt | 23 ++-
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> Ah, there's the documentation.  Please squash this with the patch that
> introduces the new behavior so they can be reviewed together more
> easily (both now and later when people do archeology).
>
> [...]
>> +--stable::
>> +   Use a symmetrical sum of hashes as the patch ID.
>> +   With this option, reordering file diffs that make up a patch or
>> +   splitting a diff up to multiple diffs that touch the same path
>> +   does not affect the ID.
>> +   This is the default if patchid.stable is set to true.
>
> This doesn't explain to me why I would want to use --stable versus
> --unstable.  Maybe an EXAMPLES section would help?
>
> The only reason I can think of to use --unstable is for compatibility
> with historical patch-ids.  Is there any other reason?
>
> At this point in the series there is no patchid.stable configuration.
>
>> +--unstable::
>> +   Use a non-symmetrical sum of hashes, such that reordering
>
> What is a non-symmetrical sum?

Non-symmetrical combination function is better?

> Thanks,
> Jonathan


Re: [PATCH v5 5/9] patch-id: document new behaviour

2014-04-24 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

>>> +--unstable::
>>> +   Use a non-symmetrical sum of hashes, such that reordering
>>
>> What is a non-symmetrical sum?
>
> Non-symmetrical combination function is better?

I do not think either is very good X-.

The primary points to convey for --stable are:

 - Two patches produced by comparing the same two trees with two
   different settings for -O<orderfile> will result in the same
   patch signature, thereby allowing the computed result to be used
   as a key to index some metainformation about the change between
   the two trees;

 - It will produce a result different from what the plain vanilla
   patch-id has always produced, even when used on a diff output
   taken without any use of -O<orderfile>, thereby making existing
   databases keyed by patch-ids unusable.
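
Both points can be illustrated with a small sketch.  This is not
Git's actual patch-id code: the helper names (file_hash,
symmetric_id, chained_id), the sample diffs, and the choice of
addition modulo 2**160 are assumptions made only for illustration.
Each per-file diff is hashed with SHA-1, and the results are
combined either by addition (order-independent) or by feeding
everything into one running hash (order-dependent).

```python
import hashlib

def file_hash(file_diff: str) -> int:
    """SHA-1 of one per-file diff, as an integer."""
    return int.from_bytes(hashlib.sha1(file_diff.encode()).digest(), "big")

def symmetric_id(file_diffs):
    """Add per-file hashes modulo 2**160; addition commutes, so order is irrelevant."""
    total = 0
    for d in file_diffs:
        total = (total + file_hash(d)) % (1 << 160)
    return f"{total:040x}"

def chained_id(file_diffs):
    """Feed all diffs into one running SHA-1; the result depends on order."""
    h = hashlib.sha1()
    for d in file_diffs:
        h.update(d.encode())
    return h.hexdigest()

# Two hypothetical per-file diffs belonging to the same patch.
a = "--- a/foo.c\n+++ b/foo.c\n-old\n+new\n"
b = "--- a/bar.h\n+++ b/bar.h\n-x\n+y\n"

# Reordering the files leaves the symmetric ID unchanged ...
same_symmetric = symmetric_id([a, b]) == symmetric_id([b, a])
# ... but changes the chained ID.
same_chained = chained_id([a, b]) == chained_id([b, a])
# Even without reordering, the two schemes produce different IDs.
differs_from_chained = symmetric_id([a, b]) != chained_id([a, b])

print(same_symmetric, same_chained, differs_from_chained)
```

The last comparison corresponds to the second point: switching the
combination function changes every ID, reordered or not.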

The fact that we happened to use a patch-id that catches that
somebody reordered the same patch into different file order and
declares that they are two different changes is more a historical
accident than a designed goal.

I would even say that we would have used the stable version from
the beginning if we thought that -O<orderfile> would be widely
used when these two features both appeared.  Even though I was the
guilty one who introduced it, I'd admit that -O<orderfile> has
merely been a curiosity from its inception and has been a failed
experiment, not in the sense that the feature does not work as
advertised (it does), but in the sense that it is not widely used
(evidenced by the lack of complaints on missing diff.orderfile for a
long time) at all.  With -O<orderfile> being a failed experiment,
the instability did not matter, so it has stuck.

The only two things worth mentioning about --unstable, if our
future direction is to see diff.orderfile and --stable a lot more
widely used, are:

 (1) it keeps producing the same patch-id as existing versions of
     Git, so users with existing databases (who do not deal with
     reordered patches) may want to use it; and perhaps

 (2) it will not consider a patch taken with -O<orderfile> and
     another without it from the same source to be the same patch.

Mathematically speaking, mentioning non-symmetrical might be one way
of expressing the latter point (2), but stressing that alone
without mentioning (1) misses the point.  (2) is _not_ a designed
feature, so it is not very interesting.  Unless you have an existing
database, there is no reason to use --unstable.

On the other hand (1) is a very relevant thing to mention, as we are
talking about a feature that, if unused, may break existing users'
data.
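
To make point (1) concrete, here is a rough sketch of a database
keyed by patch ID.  It relies on "git patch-id" printing one
"<patch-id> <commit-id>" pair per line; the function name load_index
and all hex values below are made up for illustration and do not come
from a real repository.

```python
def load_index(patch_id_output: str) -> dict:
    """Map patch ID -> commit ID from lines of '<patch-id> <commit-id>'."""
    index = {}
    for line in patch_id_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        patch_id, commit_id = line.split()
        index[patch_id] = commit_id
    return index

# Made-up IDs standing in for a database recorded with the existing algorithm.
legacy_id = "a" * 40
commit_id = "1" * 40
index = load_index(f"{legacy_id} {commit_id}\n")

# Made-up ID standing in for what a different algorithm would compute
# for the very same patch.
stable_id = "b" * 40

hit = index.get(legacy_id)   # found: same algorithm as when the index was built
miss = index.get(stable_id)  # None: the key no longer matches
print(hit, miss)
```

A lookup only succeeds when the ID is computed the same way as when
the index was built, which is why an existing database is the one
reason to keep using --unstable.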
