Re: [WIP PATCH] Manual rename correction

2012-08-02 Thread Jeff King
On Thu, Aug 02, 2012 at 03:51:17PM -0700, Junio C Hamano wrote: > > On Wed, Aug 01, 2012 at 03:10:55PM -0700, Junio C Hamano wrote: > > ... > >> When you move porn/0001.jpg in the preimage to naughty/1.jpg in > >> the postimage, they both can hit "*.jpg contentid=jpeg" line in the > >> top-lev

Re: [WIP PATCH] Manual rename correction

2012-08-02 Thread Junio C Hamano
Jeff King writes: > On Wed, Aug 01, 2012 at 03:10:55PM -0700, Junio C Hamano wrote: > ... >> When you move porn/0001.jpg in the preimage to naughty/1.jpg in >> the postimage, they both can hit "*.jpg contentid=jpeg" line in the >> top-level .gitattribute file, and the contentid driver for jpe

Re: [WIP PATCH] Manual rename correction

2012-08-02 Thread Jeff King
On Thu, Aug 02, 2012 at 07:08:25PM +0700, Nguyen Thai Ngoc Duy wrote: > > I implemented (1a). Implementing (1b) would be easy, but for a full-on > > cache (especially for "-C"), I think the resulting size might be > > prohibitive. > > (1a) is good regardless rename overrides. Why don't you polish

Re: [WIP PATCH] Manual rename correction

2012-08-02 Thread Jeff King
On Wed, Aug 01, 2012 at 03:10:55PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > On Tue, Jul 31, 2012 at 11:01:27PM -0700, Junio C Hamano wrote: > > ... > >> As we still have the pathname in this codepath, I am wondering if we > >> would benefit from custom "content hash" that knows the

Re: [WIP PATCH] Manual rename correction

2012-08-02 Thread Nguyen Thai Ngoc Duy
On Thu, Aug 2, 2012 at 4:27 AM, Jeff King wrote: > Yes, if you go with a commit-based approach, you can do either notes or > in-commit messages. In other words, I would break the solutions down as: > > 1. Store sha1+sha1 -> score mapping (i.e., what I suggested). This is > fundamentally a g

Re: [WIP PATCH] Manual rename correction

2012-08-01 Thread Junio C Hamano
Jeff King writes: > On Tue, Jul 31, 2012 at 11:01:27PM -0700, Junio C Hamano wrote: > >> - As entries in rename cache that record high scores have names of >>"similar" blobs, pack-objects may be able to take advantage of >>this information. > > Yeah, although I suspect it is not as big a

Re: [WIP PATCH] Manual rename correction

2012-08-01 Thread Junio C Hamano
Jeff King writes: > On Tue, Jul 31, 2012 at 11:01:27PM -0700, Junio C Hamano wrote: > ... >> As we still have the pathname in this codepath, I am wondering if we >> would benefit from custom "content hash" that knows the nature of >> payload than the built-in similarity estimator, driven by the >

Re: [WIP PATCH] Manual rename correction

2012-08-01 Thread Jeff King
On Tue, Jul 31, 2012 at 11:01:27PM -0700, Junio C Hamano wrote: > > @@ -175,6 +177,11 @@ static int estimate_similarity(struct diff_filespec > > *src, > > if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE) > > return 0; > > > > + hashcpy(pair.one, src->sha1); >

Re: [WIP PATCH] Manual rename correction

2012-08-01 Thread Jeff King
On Wed, Aug 01, 2012 at 01:34:23PM +0700, Nguyen Thai Ngoc Duy wrote: > On Wed, Aug 1, 2012 at 1:09 PM, Junio C Hamano wrote: > > Nguyen Thai Ngoc Duy writes: > > > >> Yes. This is probably cosmetics only, but without path information, we > >> leave it to chance to decide which A to pair with B

Re: [WIP PATCH] Manual rename correction

2012-08-01 Thread Jeff King
On Wed, Aug 01, 2012 at 11:36:00AM +0700, Nguyen Thai Ngoc Duy wrote: > > That is orthogonal to the issue of what is being stored. I chose my > > mmap'd disk implementation because it is very fast, which makes it nice > > for a performance cache. But you could store the same thing in git-notes > >

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 1, 2012 at 1:09 PM, Junio C Hamano wrote: > Nguyen Thai Ngoc Duy writes: > >> Yes. This is probably cosmetics only, but without path information, we >> leave it to chance to decide which A to pair with B and C (in the >> A->B, A->C case above). Wrong path might lead to funny effects (

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Junio C Hamano
Nguyen Thai Ngoc Duy writes: > Yes. This is probably cosmetics only, but without path information, we > leave it to chance to decide which A to pair with B and C (in the > A->B, A->C case above). Wrong path might lead to funny effects (i'm > thinking of git log --follow). Isn't that why and ca

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Junio C Hamano
Jeff King writes: > @@ -175,6 +177,11 @@ static int estimate_similarity(struct diff_filespec *src, > if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE) > return 0; > > + hashcpy(pair.one, src->sha1); > + hashcpy(pair.two, dst->sha1); > + if (renam

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 1, 2012 at 9:01 AM, Jeff King wrote: > On Wed, Aug 01, 2012 at 08:10:12AM +0700, Nguyen Thai Ngoc Duy wrote: > >> > I do not think that is the right direction. Let's imagine that I have a >> > commit "A" and I annotate it (via notes or whatever) to say "between >> > A^^{tree} and A^{tr

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Jeff King
On Wed, Aug 01, 2012 at 08:10:12AM +0700, Nguyen Thai Ngoc Duy wrote: > > I do not think that is the right direction. Let's imagine that I have a > > commit "A" and I annotate it (via notes or whatever) to say "between > > A^^{tree} and A^{tree}, foo.c became bar.c". That will help me when > > doi

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 1, 2012 at 2:23 AM, Jeff King wrote: >> It is a good direction to go in, I would think, to give users a way >> to explicitly tell that "in comparison between these two trees, I >> know path B in the postimage corresponds to path A in the preimage". > > I do not think that is the right

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Jeff King
On Tue, Jul 31, 2012 at 01:20:42PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > A much better hint is to annotate pairs of sha1s, to say "do not bother > > doing inexact rename correlation on this pair; I promise that they have > > value N". > > Surely. And I suspect that the patch t

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Junio C Hamano
Jeff King writes: > A much better hint is to annotate pairs of sha1s, to say "do not bother > doing inexact rename correlation on this pair; I promise that they have > value N". Surely. And I suspect that the patch to the current codebase to do so would be much less impact if you go that way.

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Jeff King
On Tue, Jul 31, 2012 at 09:32:49AM -0700, Junio C Hamano wrote: > Nguyen Thai Ngoc Duy writes: > > > The above output is done with "git diff --manual-rename=foo A B" > > and "foo" contains (probably not in the best format though) > > > > -- 8< -- > > attr.c dir.c > > dir.c attr.c > > -- 8< -- >

Re: [WIP PATCH] Manual rename correction

2012-07-31 Thread Junio C Hamano
Nguyen Thai Ngoc Duy writes: > The above output is done with "git diff --manual-rename=foo A B" > and "foo" contains (probably not in the best format though) > > -- 8< -- > attr.c dir.c > dir.c attr.c > -- 8< -- > ... > Comments? It is a good direction to go in, I would think, to give users a wa