Re: move detection doesnt take filename into account

2014-07-09 Thread Jeff King
On Wed, Jul 09, 2014 at 03:18:43PM -0700, Junio C Hamano wrote:

> Jeff King writes:
>
> > I think the hash here does not collide in that way. It really is just
> > the last sixteen characters shoved into a uint32_t.
>
> All bytes overlap with their adjacent byte because they are shifted
> by on

Re: move detection doesnt take filename into account

2014-07-09 Thread Junio C Hamano
Jeff King writes:

> I think the hash here does not collide in that way. It really is just
> the last sixteen characters shoved into a uint32_t.

All bytes overlap with their adjacent byte because they are shifted by
only 2 bits, not 8 bits, when a new byte is brought in. We can say that
the topm
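
To make the 2-bit overlap concrete, here is a minimal Python sketch of a hash that shifts the running value down by 2 bits for each new byte, as described in this message. The names `name_hash_sketch` and `top_byte`, and the explicit 32-bit mask, are assumptions made for illustration; this is not a copy of git's code.

```python
WHITESPACE = b" \t\n\r\x0b\x0c"


def name_hash_sketch(name: str) -> int:
    """Hypothetical illustration: fold a name into 32 bits, 2 bits per byte."""
    h = 0
    for byte in name.encode("utf-8"):
        if byte in WHITESPACE:
            continue  # whitespace is skipped, per the quoted comment
        # The old value moves down by only 2 bits, so it still covers most
        # of the bits where the new byte (placed at bits 24..31) lands.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


def top_byte(h: int) -> int:
    """Return bits 24..31 of a 32-bit value."""
    return h >> 24


if __name__ == "__main__":
    # After "a" the top byte is ord("a") alone; after "ab" it is ord("b")
    # plus the upper six bits of ord("a"), because the two bytes overlap.
    print(top_byte(name_hash_sketch("a")), ord("a"))
    print(top_byte(name_hash_sketch("ab")), ord("b") + (ord("a") >> 2))
```

With an 8-bit shift each byte would keep its own slot; with a 2-bit shift the previous byte is added into the same bits as the new one, which is the overlap being pointed out.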

Re: move detection doesnt take filename into account

2014-07-09 Thread Jeff King
On Wed, Jul 09, 2014 at 08:51:07AM -0700, Junio C Hamano wrote:

> > The delta heuristics in pack-objects use pack_name_hash, which claims:
> >
> > /*
> >  * This effectively just creates a sortable number from the
> >  * last sixteen non-whitespace characters. Last characte

Re: move detection doesnt take filename into account

2014-07-09 Thread Junio C Hamano
Jeff King writes:

> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>
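
The quoted suggestion (score a candidate by the length of the matching tail) can be sketched in a few lines of Python. This is only an illustration of the idea as stated in the thread, not git's rename code; `tail_match_len` and `best_rename_target` are hypothetical names, and the comparison is per character rather than per byte.

```python
def tail_match_len(old: str, new: str) -> int:
    """Count how many trailing characters the two paths have in common."""
    n = 0
    for x, y in zip(reversed(old), reversed(new)):
        if x != y:
            break
        n += 1
    return n


def best_rename_target(old: str, candidates: list[str]) -> str:
    """Pick the candidate whose tail matches `old` the longest (hypothetical tie-break: first wins)."""
    return max(candidates, key=lambda c: tail_match_len(old, c))


if __name__ == "__main__":
    # The example from the message: 1.a and a/2.a share the two-character tail ".a".
    print(tail_match_len("1.a", "a/2.a"))
    print(best_rename_target("1.a", ["a/2.b", "a/2.a"]))
```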

Re: move detection doesnt take filename into account

2014-07-08 Thread Jeff King
On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:

> I didn't think it through but my gut feeling is that we could change
> the name similarity score to be the length of the tail part that
> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
> is a better match than to

Re: move detection doesnt take filename into account

2014-07-01 Thread Junio C Hamano
Elliot Wolk writes:

> On 07/01/2014 10:57 AM, Junio C Hamano wrote:
>> Robin Rosenberg writes:
>>
>>> I think it does, but based on filename suffix. E.g. here is a rename of
>>> three empty files with a suffix.
>>>
>>> 3 files changed, 0 insertions(+), 0 deletions(-)
>>> rename 1.a => 2.a (1

Re: move detection doesnt take filename into account

2014-07-01 Thread Elliot Wolk
thanks for the info! then i suppose my bug is a petition to have name
similarity instead use a different statistical matching algorithm.

On 07/01/2014 10:57 AM, Junio C Hamano wrote:
> Robin Rosenberg writes:
>> I think it does, but based on filename suffix. E.g. here is a rename of
>> three empty

Re: move detection doesnt take filename into account

2014-07-01 Thread Junio C Hamano
Robin Rosenberg writes:

> I think it does, but based on filename suffix. E.g. here is a rename of
> three empty files with a suffix.
>
> 3 files changed, 0 insertions(+), 0 deletions(-)
> rename 1.a => 2.a (100%)
> rename 1.b => 2.b (100%)
> rename 1.c => 2.c (100%)

This is not more than a c

Re: move detection doesnt take filename into account

2014-07-01 Thread Elliot Wolk
interesting that it considers suffixes {only suffixes following periods?}. this is insufficient, in my opinion. with all other things being equal, it ought to find the closest match {using smith-waterman or some such algorithm}. as a real-world use case, i have a repository with empty files t
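
For reference, a minimal Python sketch of what "closest match using smith-waterman" could look like for path names. It only illustrates the suggestion in this message and says nothing about how git actually behaves; the scoring values (match +2, mismatch -1, gap -1) and the names `smith_waterman_score` and `closest_name` are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score between two strings (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def closest_name(old: str, candidates: list[str]) -> str:
    """Pick the candidate path with the highest local-alignment score."""
    return max(candidates, key=lambda c: smith_waterman_score(old, c))


if __name__ == "__main__":
    # Hypothetical paths: two identical (e.g. empty) files moved in one commit.
    print(closest_name("foo/empty.txt", ["bar/other.md", "bar/empty.txt"]))
```

Unlike a pure suffix comparison, this rewards any long shared run in the path, at the cost of quadratic work per pair of names.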

Re: move detection doesnt take filename into account

2014-07-01 Thread Robin Rosenberg
- Original Message -
> From: "Elliot Wolk"
> To: git@vger.kernel.org
> Sent: Monday, 30 Jun 2014 8:38:18
> Subject: move detection doesnt take filename into account
>
> if you move two identical {e.g.: empty} files to two new locations in a
> single commit, the move detection p