Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-24 Thread Jeff King
On Mon, Jul 23, 2018 at 06:50:46PM -0700, Junio C Hamano wrote:

>   Side Note: there are a few workflow elements I do want to
>   keep using but they currently *lose* the mapping info.  An
>   obvious one is
> 
> $ git checkout -b to/pic master &&
> ... review in MUA and then ...
> $ git am -s mbox &&
> ... review in tree, attempt to build, tweak, etc.
>   $ git format-patch --stdout master..to/pic >P &&
>   $ edit P &&
>   $ git reset --hard master &&
>   $ git am P
> 
>   which is far more versatile and efficient when doing certain
>   transformations on the series than running "rebase -i" and
>   reopening and editing the target files of the patches one by
>   one in each step.  But because format-patch does not
>   generate Message-Id header of the original one out of the
>   commit, the post-applypatch hook run by "am" at the end of
>   the steps would not have a chance to record that for the
>   newly created commit.
> 
>   For this one, I think I can use "format-patch --notes=amlog"
>   to produce the patch file and then teach post-applypatch
>   script to pay attention to the Notes annotation without
>   changing anything else to record the message id of the
>   original.

Yes. I wonder if it would make sense to teach format-patch/am a
micro-format to automatically handle this case. I.e., some
machine-readable way of passing the notes in the email message.
Of course it's easy to design a format that covers the relatively
restricted form of these amlog notes, and much harder to cover the
general case.

>   Other workflow elements that lose the notes need
>   to be identified and either a fix implemented or a
>   workaround found for each of them.  For example, I suspect
>   there is no workaround for "cherry-pick" and it would take a
>   real fix.

I think the existing notes.rewriteRef is probably a good match here. I
can definitely think of notes you wouldn't want to cherry-pick, but I'm
having trouble coming up with an example that should survive a rebase
but not a cherry-pick.

> And these (incomplete) reverse mapping entries get in the way to
> maintain and correct the forward mapping.  When a commit that got
> unreachable gets expired, I want "git notes prune" to remove notes
> on them, and I do not want to even think about what should happen to
> the entries in the notes tree that abuse the mechanism to map blobs
> that are otherwise *not* even reachable from the main history.

Right, I think the notes tree is a poor distribution method for that
reason.

> > Though personally, I do not know if there is much point in pushing it
> > out, given that receivers can reverse the mapping themselves.
> 
> Before this thread, I was planning to construct and publish the
> reverse mapping at the end of the day, but do so on a separate notes
> ref (see above---the hacky abuse gets in the way of maintaining and
> debugging the forward mapping, but a separate notes-ref that only
> contains hacks is less worrysome).  But I have changed my mind and
> decided not to generate or publish one.  It is sort of similar to
> the way the pack .idx is constructed only by the receiver [*1*].

Yes, the pack .idx was the same mental model I had when writing my
earlier message.

> > Or is there some argument that there is information in the reverse map
> > that _cannot_ be generated from the forward map?
> 
> I know there is no information loss (after all I was the only one
> who ran that experimental hack), but there is one objection that is
> still possible, even though I admit that is a weak argument.

I wondered if you might have a case like this (building as we go):

 - message-id M becomes commit X
   - we write the forward map X->M
   - we write the reverse map M->X
 - during a rewrite (e.g., --amend), commit X becomes commit Y
   - we write the forward map Y->M
   - we write the reverse map M->Y

The difference between that result and an inverted map created at the
end is that we know that M->Y is the final result. Whereas by looking at
the inverted map, we do not know which of M->X and M->Y is correct. In
fact they are _both_ correct. But only one of X and Y would eventually
get merged (both may make it into the repo's of people fetching from you
if we imagine that X is on "pu" and you push between the two steps).

So I think the inverted mapping is not actually one-to-one, and in
either case you'd want to retain all possible matches (pruning only when
a commit is eventually dropped from the forward mapping, which rewritten
things from "pu" would eventually do). And in that case it does not
matter if you generate it incrementally or all at once.

> If a plumbing "diff-{files,tree,index}" family had a sibling
> "diff-notes" to compare two notes-shaped trees while pretending that
> the object-name fan-out did not exist (i.e. instead, the trees being
> compared is 

Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-23 Thread Junio C Hamano
Jeff King  writes:

> If I understand the situation correctly, Junio is saying that he will
> continue to produce the amlog mapping, and that it contains sufficient
> information to produce the reverse mapping (which, as an aside, I did
> not even know existed -- I mostly want to go the other way, from digging
> in history to a mailing list conversation).

Yes, the reverse mapping in amlog was an experiment that did not
work well in the end.

When I use "git am" to make a commit out of a message, a
post-applypatch hook picks up the "Message-Id:" from the original
message and adds a git note to the resulting commit.  This is in
line with how the notes are meant to be used.  We have a commit
object, and a piece of information that we want to associate with
the commit object, which is not recorded as a part of the commit
object.  So we say "git notes add -m 'that piece of info' $commit"
(the message-id happens to be that piece of info in this example).

And with notes.rewriteRef, "git commit --amend" etc. would copy the
piece of info about the original commit to the rewritten commit.

Side Note: there are a few workflow elements I do want to
keep using but they currently *lose* the mapping info.  An
obvious one is

  $ git checkout -b to/pic master &&
  ... review in MUA and then ...
  $ git am -s mbox &&
  ... review in tree, attempt to build, tweak, etc.
  $ git format-patch --stdout master..to/pic >P &&
  $ edit P &&
  $ git reset --hard master &&
  $ git am P

which is far more versatile and efficient when doing certain
transformations on the series than running "rebase -i" and
reopening and editing the target files of the patches one by
one in each step.  But because format-patch does not
generate Message-Id header of the original one out of the
commit, the post-applypatch hook run by "am" at the end of
the steps would not have a chance to record that for the
newly created commit.

For this one, I think I can use "format-patch --notes=amlog"
to produce the patch file and then teach post-applypatch
script to pay attention to the Notes annotation without
changing anything else to record the message id of the
original.  Other workflow elements that lose the notes need
to be identified and either a fix implemented or a
workaround found for each of them.  For example, I suspect
there is no workaround for "cherry-pick" and it would take a
real fix.

A reverse mapping entry used to get created by post-applypatch to
map the blob that represents the notes text added to the $commit to
another text blob that contains the 40-hex of the commit object.
This is the experiment that did not work well.  As none of the later
integrator's work e.g. "commit --amend", "rebase", "cherry-pick",
etc. is about rewriting that blob, notes.rewriteRef mechanism would
not kick in, and that is understandasble.

And these (incomplete) reverse mapping entries get in the way to
maintain and correct the forward mapping.  When a commit that got
unreachable gets expired, I want "git notes prune" to remove notes
on them, and I do not want to even think about what should happen to
the entries in the notes tree that abuse the mechanism to map blobs
that are otherwise *not* even reachable from the main history.

A much more important task is to make sure that the forward mapping
that annotates invidual commits reachable from 'pu' and/or 'master' 
is maintained correctly by various tools.  From a correctly maintained
forward mapping, it should be straight forward to get a reverse mapping
if needed.

> Though personally, I do not know if there is much point in pushing it
> out, given that receivers can reverse the mapping themselves.

Before this thread, I was planning to construct and publish the
reverse mapping at the end of the day, but do so on a separate notes
ref (see above---the hacky abuse gets in the way of maintaining and
debugging the forward mapping, but a separate notes-ref that only
contains hacks is less worrysome).  But I have changed my mind and
decided not to generate or publish one.  It is sort of similar to
the way the pack .idx is constructed only by the receiver [*1*].

> Or is there some argument that there is information in the reverse map
> that _cannot_ be generated from the forward map?

I know there is no information loss (after all I was the only one
who ran that experimental hack), but there is one objection that is
still possible, even though I admit that is a weak argument.

If a plumbing "diff-{files,tree,index}" family had a sibling
"diff-notes" to compare two notes-shaped trees while pretending that
the object-name fan-out did not exist (i.e. instead, the trees being
compared is without a subtree and full of 40-hex filenames), then it
would be less cumbersome to incrementally update the 

Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-22 Thread Jeff King
On Sat, Jul 21, 2018 at 11:56:06PM +0200, Johannes Schindelin wrote:

> > The script that I feed a message from gmane or public-inbox when I need
> > to learn the set of commits that resulted from the message instead uses
> > "git grep $message-id notes/amlog".  And that is fast enough for my
> > purpose.
> 
> Awesome. You might want to make sure that Peff stops advertising the amlog
> notes, then, though.

Woah, what did I do now?

> > There is no good reason to abuse the notes mechanism to map a random
> > object-name looking string (i.e. hash result of message id), other
> > than the ease of "quick access" when somebody is making a lot of
> > inquiry, but that "database" does not have to be stored in notes.
> 
> Right. And it does not have to be stored anywhere, because nobody used it
> anyway, right?

If I understand the situation correctly, Junio is saying that he will
continue to produce the amlog mapping, and that it contains sufficient
information to produce the reverse mapping (which, as an aside, I did
not even know existed -- I mostly want to go the other way, from digging
in history to a mailing list conversation).

E.g., the script below builds and queries an incremental reverse
mapping.

-- >8 --
#!/usr/bin/perl

my $REF = 'refs/notes/amlog';
my $DBFILE = '.git/amlog.rev';

use DB_File;
my %h;
my $db = tie %h, 'DB_File', $DBFILE, O_CREAT|O_RDWR, 0644
  or die "unable to open/create $DBFILE: $!";

my $db_tip = $h{TIP};
chomp(my $rev_tip = `git rev-parse $REF`);
if (!defined $db_tip || $db_tip ne $rev_tip) {
  print STDERR "Updating reverse mapping...\n";
  # using -p here is quick and easy, since we know the
  # shape of the data. Using --raw and cat-file might be less
  # hacky, though.
  my @cmd = (qw(git log --format= --reverse -p), $rev_tip);
  push @cmd, "^$db_tip" if defined $db_tip;
  open(my $fh, "-|", @cmd);

  my $commit;
  while (<$fh>) {
if (m{^\+\+\+ b/([0-9a-f/]+)}) {
  $commit = $1;
  $commit =~ s/[^0-9a-f]//g;
} elsif (/^\+Message-Id: <(.*)>/i) {
  print STDERR "Imported $commit => $1\n";
  $h{$1} = $commit;
}
  }
  $h{TIP} = $rev_tip;
}

print "$h{$_} $_\n" for @ARGV;
-- >8 --

That stores it in a local dbm. But it could also build a git-notes tree
if you really want that.

And if I understand what is being said here:

> > It certainly does not belong to cycles worth spending by me *while*
> > I work during the say with various history reshaping tools to record
> > and/or update the reverse mapping and that is why my post-applypatch
> > hook no longer has the "reverse map" hack.
> > 
> > It is not like anybody (including me) needs realtime up-to-date
> > reverse mapping from amlog while I run my "commit --amend", "rebase
> > -i", etc. and the reverse map is constructable by reversing the
> > forward map, obviously, with a postprocessing.  And I think that is
> > a reasonably way forward if anybody wants to have a reverse mapping.
> > The postprocessing can be done either by me before pushing out the
> > amlog ref, or done by any consumer after fetching the amlog ref from
> > me.  If I did the postprocessing and refuse to use rewrite hook you
> > wouldn't even know ;-)

It is not "I refuse to push out a reverse mapping". It is "I could make
the reverse mapping before push-out, and you would not need to know or
care if I did it all at once, or using a rewrite hook".

Though personally, I do not know if there is much point in pushing it
out, given that receivers can reverse the mapping themselves.

Or is there some argument that there is information in the reverse map
that _cannot_ be generated from the forward map?

-Peff


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-21 Thread Johannes Schindelin
Hi,

On Fri, 20 Jul 2018, Stefan Beller wrote:

> +cc list
> On Fri, Jul 20, 2018 at 2:29 PM Junio C Hamano  wrote:
> > ... which means that it does not matter if I have an elaborate rewrite hook
> > that constantly updates the reverse mapping or if the reverse mapping is
> > made immediately before I push out. You wouldn't even be able to tell any
> > difference.
> >
> > And for that matter, it could even be made on the receiving end by you
> > after you fetch from me before you need the reverse map information.

I refuse to believe that the suggestion to go back to the equivalent of
pencil and paper is sincere. We are developing a state of the art source
code management tool here, not some hodge podge project of somebody who is
trying to teach themselves C.

The current state is that there is no reliable "paper trail" of code
contributions. The solution to that is absolutely not to abolish what
little of a paper trail we *do* have. The solution is to step up the game
and correct that automated record.

And I would have expected a lot better from the inventor of the pickaxe
options: why care so much about source code "archeology" on the one hand,
and then burning the library on the other hand? It just does not make
sense.

Ciao,
Dscho


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-21 Thread Johannes Schindelin
Hi Junio,

On Fri, 20 Jul 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > AFAICT there is at least one scenario where you run `rebase -i`, the notes
> > get updated, and of course the *reverse mapping* does *not* get updated:
> 
> It turns out that I never had a rewrite hook; the notes.rewriteref
> mechanism is the only thing that has been used to maintain amlog.
> 
> I've stopped populating the reverse mapping, by the way.

That's just great. I ask you to make my life easier by keeping the
information correct, and now you just drop it altogether? Just great.

Seriously, I am trying to *improve* something here, because I really do
care about contributors, and how hard we make it on them. I would not have
expected such a backlash against that.

> The script that I feed a message from gmane or public-inbox when I need
> to learn the set of commits that resulted from the message instead uses
> "git grep $message-id notes/amlog".  And that is fast enough for my
> purpose.

Awesome. You might want to make sure that Peff stops advertising the amlog
notes, then, though.

> There is no good reason to abuse the notes mechanism to map a random
> object-name looking string (i.e. hash result of message id), other
> than the ease of "quick access" when somebody is making a lot of
> inquiry, but that "database" does not have to be stored in notes.

Right. And it does not have to be stored anywhere, because nobody used it
anyway, right?

Well, I hate to break it to you: I just found a really excellent use case,
and you are making it very, very hard for me. Deliberately so. I don't
know how I deserve that.

> It certainly does not belong to cycles worth spending by me *while*
> I work during the say with various history reshaping tools to record
> and/or update the reverse mapping and that is why my post-applypatch
> hook no longer has the "reverse map" hack.
> 
> It is not like anybody (including me) needs realtime up-to-date
> reverse mapping from amlog while I run my "commit --amend", "rebase
> -i", etc. and the reverse map is constructable by reversing the
> forward map, obviously, with a postprocessing.  And I think that is
> a reasonably way forward if anybody wants to have a reverse mapping.
> The postprocessing can be done either by me before pushing out the
> amlog ref, or done by any consumer after fetching the amlog ref from
> me.  If I did the postprocessing and refuse to use rewrite hook you
> wouldn't even know ;-)

The idea that you publish the amlog notes just for your own use cases,
sounds a bit strange to me.

So to reiterate: the information you have in amlog is useful, if faulty.
Rather than "fixing" it by stopping the useful reverse-mapping, it would
make a ton more sense to instate that post-rewrite hook I already drafted
for you.

Besides, while you spent all of that time to make things harder for me,
you still did not look into the most worrisome of my findings: there are
apparently Message-Id mappings where *none* of the commits returned by
said `git grep` you mentioned above are valid. Not a single one. I will
dig out the mail for you on Monday, because I care that much, where I
provided one example of a Message-Id with two commits that match in amlog,
none of which is actually reachable from any of your public branches, and
I also provided the commit that *actually* corresponds to that Message-Id,
and it is not annotated.

So at least in this case *even you* should have a vested interest in
figuring out what goes wrong because even your own use case is affected by
it.

Ciao,
Dscho


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Stefan Beller
+cc list
On Fri, Jul 20, 2018 at 2:29 PM Junio C Hamano  wrote:
> ... which means that it does not matter if I have an elaborate rewrite hook
> that constantly updates the reverse mapping or if the reverse mapping is
> made immediately before I push out. You wouldn't even be able to tell any
> difference.
>
> And for that matter, it could even be made on the receiving end by you
> after you fetch from me before you need the reverse map information.


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Junio C Hamano
Stefan Beller  writes:

> On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano  wrote:
>
>> It is not like anybody (including me) needs realtime up-to-date
>
> I thought the same for a long time, but contributing to other projects
> showed me that this is not necessarily the case. Having a real time
> update, even if it would be just "your patch is labeled 'under discussion'"
> is beneficial as I would know where it is "in the system".

Well, you wouldn't have an access to the up-to-date amlog maintained
by me *UNTIL* I push it out at the end of the day.  So by
definition, you do not have real-time access to the up-to-date
state.

And also by definition, you do not *NEED* such an access, because
you won't see newly created or rewritten commits, whose originating
Message-Id is not in the copy of amlog you have (yet), until I push
the day's integration result out *AND* you fetch what I pushed out.



Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Stefan Beller
On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano  wrote:

> It is not like anybody (including me) needs realtime up-to-date

I thought the same for a long time, but contributing to other projects
showed me that this is not necessarily the case. Having a real time
update, even if it would be just "your patch is labeled 'under discussion'"
is beneficial as I would know where it is "in the system".

In a way I'd compare our contribution process to having an
incredible fine grained paper map. Most of the world moved
on to digital maps, that zoom in on-demand.
(C.f. spelling out "See banned.h for banned functions" in
Documentation/CodingGuidelines is a fine grained detail
that is not relevant for *most* of the contributions, but just
burdens the bearer of the paper map with weight; if this hint
is given dynamically by the compiler or build system at relevant
times, it is much better;

Regarding the real time aspect here, it is also very good
comparison to maps: While I know how to read paper maps
(or offline maps) and how to navigate my way, it sure is easier
to just follow the online up-to-date navigation service, that
tells me what to do. )


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Junio C Hamano
Johannes Schindelin  writes:

> AFAICT there is at least one scenario where you run `rebase -i`, the notes
> get updated, and of course the *reverse mapping* does *not* get updated:

It turns out that I never had a rewrite hook; the notes.rewriteref
mechanism is the only thing that has been used to maintain amlog.

I've stopped populating the reverse mapping, by the way.  The script
that I feed a message from gmane or public-inbox when I need to
learn the set of commits that resulted from the message instead uses
"git grep $message-id notes/amlog".  And that is fast enough for my
purpose.

There is no good reason to abuse the notes mechanism to map a random
object-name looking string (i.e. hash result of message id), other
than the ease of "quick access" when somebody is making a lot of
inquiry, but that "database" does not have to be stored in notes.
It certainly does not belong to cycles worth spending by me *while*
I work during the say with various history reshaping tools to record
and/or update the reverse mapping and that is why my post-applypatch
hook no longer has the "reverse map" hack.

It is not like anybody (including me) needs realtime up-to-date
reverse mapping from amlog while I run my "commit --amend", "rebase
-i", etc. and the reverse map is constructable by reversing the
forward map, obviously, with a postprocessing.  And I think that is
a reasonably way forward if anybody wants to have a reverse mapping.
The postprocessing can be done either by me before pushing out the
amlog ref, or done by any consumer after fetching the amlog ref from
me.  If I did the postprocessing and refuse to use rewrite hook you
wouldn't even know ;-)



Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Johannes Schindelin
Hi Junio,

On Thu, 19 Jul 2018, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Johannes Schindelin  writes:
> >
> >> I would like to ask you to reinstate the post-rewrite hook, as it still
> >> improves the situation over the current one.
> >
> > Without post-rewrite I seem to be getting correct amlog entries for
> > commits created by "git rebase"; do our rebase--am backend still
> > trigger post-applypatch hook in its "am" phase to apply the patches
> > created with "format-patch"?
> 
> That was a wrong line of thought that led to a dead end.  format-patch
> won't recreate Message-Id to its output from notes/amlog, so even if
> the "format-patch --stdout | am" pipeline inside rebase-am triggered
> the post-applypatch hook, it would not have a chance to carry the
> notes forward that way.
> 
> What was really happening was I have
> 
>   $ git config --list | grep amlog
>   notes.rewriteref=refs/notes/amlog
> 
> and that ought to be sufficient to carry "commit-to-original-msg-id"
> entries across rebases.  And it seems to correctly work.  I however
> suspect that "cherry-pick A..B" may lose the notes, but I haven't
> checked.

AFAICT there is at least one scenario where you run `rebase -i`, the notes
get updated, and of course the *reverse mapping* does *not* get updated:
you have a mapping both from commit to Message-Id *and crucially* from
Message-Id to commit. The automatic rewrite of commit notes in `rebase -i`
tackles only the commit notes, obviously, not the reverse.

Hence the post-rewrite hook I think I already suggested at least once in a
previous reply.

Ciao,
Dscho


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-19 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> I would like to ask you to reinstate the post-rewrite hook, as it still
>> improves the situation over the current one.
>
> Without post-rewrite I seem to be getting correct amlog entries for
> commits created by "git rebase"; do our rebase--am backend still
> trigger post-applypatch hook in its "am" phase to apply the patches
> created with "format-patch"?

That was a wrong line of thought that led to a dead end.  format-patch
won't recreate Message-Id to its output from notes/amlog, so even if
the "format-patch --stdout | am" pipeline inside rebase-am triggered
the post-applypatch hook, it would not have a chance to carry the
notes forward that way.

What was really happening was I have

$ git config --list | grep amlog
notes.rewriteref=refs/notes/amlog

and that ought to be sufficient to carry "commit-to-original-msg-id"
entries across rebases.  And it seems to correctly work.  I however
suspect that "cherry-pick A..B" may lose the notes, but I haven't
checked.






Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-12 Thread Junio C Hamano
Johannes Schindelin  writes:

> I would like to ask you to reinstate the post-rewrite hook, as it still
> improves the situation over the current one.

Without post-rewrite I seem to be getting correct amlog entries for
commits created by "git rebase"; do our rebase--am backend still
trigger post-applypatch hook in its "am" phase to apply the patches
created with "format-patch"?



Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-12 Thread Johannes Schindelin
Hi Junio,

On Wed, 11 Jul 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > To summarize, there are two commits recorded for that Message-Id, the
> > later one not mapped back, and neither is the correct commit that made it
> > into `master`.
> >
> > It would be nice to figure out what went wrong there, and how to fix it
> > for the future (and also to fix up the existing mis-mappings in `amlog`).
> 
> I think what happened is that I used to have post-rewrite, but
> because it did not solve the real issue of multiple commits existing
> for the same message ID (either because of amending, or because of
> running "am" multiple times while looking for the best base to
> contruct a topic branch for the series that contains it) *and* the
> one that will eventually used in the final history may not be the
> last one (e.g. I may "am" twice to see if an older base I use in my
> second attempt is a better one than the base I originally used, and
> the patches may even apply cleanly to the older history, but may
> turn out to need semantic adjustment, at which point I would discard
> that second attempt and use the old commit from the first attempt
> that built on a newer base), I stopped using it.
> 
> The mid-to-commit, for it to be relialble, needs to keep mapping for
> all the commits created from a single message, instead of being the
> last-one-survives mapping.  I just didn't have that much interest
> back when I decided it was not worth and dropped the post-rewrite, I
> think.

I would like to ask you to reinstate the post-rewrite hook, as it still
improves the situation over the current one.

Of course, it would be nice to get the automation into a shape where
the mappings in `refs/notes/amlog` of commits that hit `next` are fixed,
if necessary, to stop referring to commits that did not make it into
`next`.

Because the *concept* of `amlog` is quite useful, to put back at least
*some* of the information we lost by transiting Git commits via mails
without any connection to their original commits. It is still the most
annoying thing when I contribute patches myself.

Ciao,
Dscho


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-11 Thread Junio C Hamano
Johannes Schindelin  writes:

> To summarize, there are two commits recorded for that Message-Id, the
> later one not mapped back, and neither is the correct commit that made it
> into `master`.
>
> It would be nice to figure out what went wrong there, and how to fix it
> for the future (and also to fix up the existing mis-mappings in `amlog`).

I think what happened is that I used to have post-rewrite, but
because it did not solve the real issue of multiple commits existing
for the same message ID (either because of amending, or because of
running "am" multiple times while looking for the best base to
contruct a topic branch for the series that contains it) *and* the
one that will eventually used in the final history may not be the
last one (e.g. I may "am" twice to see if an older base I use in my
second attempt is a better one than the base I originally used, and
the patches may even apply cleanly to the older history, but may
turn out to need semantic adjustment, at which point I would discard
that second attempt and use the old commit from the first attempt
that built on a newer base), I stopped using it.

The mid-to-commit, for it to be relialble, needs to keep mapping for
all the commits created from a single message, instead of being the
last-one-survives mapping.  I just didn't have that much interest
back when I decided it was not worth and dropped the post-rewrite, I
think.



refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-09 Thread Johannes Schindelin
Hi Junio,

On Sun, 8 Jul 2018, Johannes Schindelin wrote:

> I just encoutered a problem with your `refs/notes/amlog` and I hope you
> can help me with that.
> 
> Concretely, I want GitGitGadget to be able to identify the commit that
> corresponds to a given mail that contained a patch (if it ever made it
> into `pu`), to automate all kinds of tedious things that I currently have
> to perform manually.
> 
> And here I hit a block: I am looking for the commit corresponding to
> aca087479b35cbcbd7c84c7ca3bcf556133d0548.1530274571.git.gitgitgad...@gmail.com
> 
> When I ask `git notes --ref=refs/notes/gitster-amlog show
> 4cec3986f017d84c8d6a2c4233d2eba4a3ffa60d` (the SHA-1 is the one
> corresponding to `Message-Id: <...>` for that mail), it insists on
> outputting
> 
>   5902152ab02291af4454f24a8ccaf2adddefc306
> 
> However, I cannot find that commit anywhere.
> 
> When I look for the commit in the same manual, tedious way that I want to
> automate, I find that it *is* in `pu`, but as
> 
>   5cf8e064747be2026bb23be37f84f2f0b2a31781
> 
> Even curiouser: when I now ask for the commit notes for both of those
> SHA-1s, I get back the correct, same Message-Id *for both of them*, which
> makes me think that it was recorded correctly, but then overwritten due to
> some process I don't understand.
> 
> Would you be able to shed light into this?

I think I reconstructed the culprit:

In https://github.com/git/git/commit/a7cddab6e8, your post-applypatch hook
added the note for commit 5902152ab02291af4454f24a8ccaf2adddefc306 that it
was generated from Message-Id:
,
and then https://github.com/git/git/commit/ff28c8f9283 added the note to
map that Message-Id back to that commit.

So far, so good!

But then, https://github.com/git/git/commit/81b08c718e9 indicates that you
ran an interactive rebase and amended the commit
5902152ab02291af4454f24a8ccaf2adddefc306 and the result was a new commit
5cf8e064747be2026bb23be37f84f2f0b2a31781 that was then also mapped to that
Message-Id.

And obviously, you lack a post-rewrite hook a la

```sh
refopt=--ref=refs/notes/amlog
while read old new rest
do
mid="$(git notes $refopt show $old 2>/dev/null)" &&
git notes $refopt set -m "$mid" $new
done
```

I was pretty happy to figure that out all on my own, and already on my way
to come up with that post-rewrite hook and a script to parse all of the
commits in refs/notes/amlog whose commit message contains `commit --amend`
to fix those problems, but before starting, I wanted to sanity check the
oldest such commit: https://github.com/git/git/commit/49bc3858e3c

You will be readily able to verify that it maps the commit
73bfebd43e14bcc1502577c0933b6a16ad540b99 to Message-Id:
<20170619175605.27864-3-phillip.w...@talktalk.net>, but that 7c1a3dcf23e
(which corresponds to that Message-Id) maps to
f64760904766db662badf1256923532b9e1a6ebd. So yes, there is the same
problem with this mapping, and we need to fix it.

*However*. Neither https://github.com/git/git/commit/73bfebd43e1 nor
https://github.com/git/git/commit/f6476090476 show any commit!

Does that mean that the patch with that Message-Id never made it into
`master` and was simply dropped and gc'ed at some stage?

Actually, no:
https://public-inbox.org/git/20170619175605.27864-3-phillip.w...@talktalk.net/
corresponds quite clearly to
https://github.com/git/git/commit/1ceb9dfab7e

Now, that commit message was clearly edited by you (I note the capital "A"
in Phillip's "Add" vs your lower-case "a" in "add"), but the patch
quite obviously made it into our code based in its original shape.

So I looked for the commit notes for that commit, but there aren't any!

To summarize, there are two commits recorded for that Message-Id, the
later one not mapped back, and neither is the correct commit that made it
into `master`.

It would be nice to figure out what went wrong there, and how to fix it
for the future (and also to fix up the existing mis-mappings in `amlog`).

However, at this stage I really have not enough information at my hands,
even with as much effort as I spent so far to figure out where my patch
went (which started this bug hunt). Could you kindly spend some time on
that? Otherwise, `amlog` is a lot less useful than it could otherwise be.

Thanks,
Dscho

P.S.: funny side note: it would appear that the rewritten notes all get
the author of the patch author, look e.g. at the author of
https://github.com/git/git/commit/81b08c718e97