Re: Git garden shears, was Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-25 Thread Johannes Schindelin
Hi Kuba,

On Wed, 21 Sep 2016, Jakub Narębski wrote:

> W dniu 11.09.2016 o 10:33, Johannes Schindelin napisał: 
> > On Fri, 9 Sep 2016, Jakub Narębski wrote:
> [...]
> 
> >> When preserving merges, there are (as far as I understand it), two
> >> problems:
> >>  - what it means to preserve changes (which change to pick,
> >>that is what is the mainline changes rebase is re-applying)
> >>  - what are parents of the merge commit (at least one parent
> >>would be usually rewritten)
> >>
> >> Maybe the internal (and perhaps also user-visible) representation
> >> of merge in instruction sheet could use the notation of filter-branch,
> >> that is 'map()'... it could also imply the mainline.
> >>
> >> That is the instruction in the internal instruction sheet could
> >> look like this:
> >>
> >>   merge -m 1 map(2fd4e1c6...) da39a3ee... \t Merge 'foo' into master  
> >>
> >>
> >> Note that it has nothing to do with this series!
> > 
> > Right. But I did solve that already. In the Git garden shears [*1*]
> > (essentially my New And Improved attempt at recreating branch structures
> > while rebasing), I generate and process scripts like this:
> > 
> > mark onto
> > 
> > # Branch: super-cool-feature
> > rewind onto
> > pick 1 feature
> > pick 2 documentation
> > mark super-cool-feature
> > 
> > # Branch: typo-fix
> > rewind onto
> > pick a fix a tyop
> 
> There probably should be there
> 
>   mark typo-fix

Correct. Sorry for the omission.

> > rewind onto
> > merge -C cafebabe super-cool-feature
> > merge -C babecafe typo-fix
> > 
> > cleanup super-cool-feature typo-fix
> > 
> > Of course this will change a little, still, once I get around to implement
> > this on top of the rebase--helper.
> 
> Do I understand it correctly that it is user-visible instruction sheet, and
> not the internal instruction sheet for sequencer?  This looks very nice
> and is well readable.

It is intended as that.

Currently I need a little trickery to make this work, as rebase -i does
not understand rewind nor merge. The trick is to re-use the shears.sh
script as editor that then populates the edit script, calls the real
editor, and then installs a temporary alias that gets called for all
custom commands via exec (turning e.g. "rewind abc" into "exec git .r
rewind abc").

The temporary alias point back to the shears script, too, of course.

In the end, I hope to teach the sequencer a variant of this dialect, as
well as the trick to generate such edit scripts.

> I guess that it needs to be pre-populated by Git based on topology of the
> branch being rebased.

Yes. The shears.sh script is in charge of that, and it has to perform a
couple of Git calls to do so.

> As I see, there are three basic topologies of non-linear branch to be
> rebased; all else is combination of thereof, or derivative:
> 
> 1. Merge commit without branching point, that is we need to go
>from the following situation
> 
>*---*---*---#---o---o---o<-- old base
>\\
> \\=a===b===M===c<-- branch being rebased
>   /
>  ...---x---x---x-/  <-- side branch
> 
>   to the following:
> 
>*---*---*---#---o---o---o
> \
>  \-a'--b'--M'--c' 
>   /
>  ...---x---x---x-/  

In other words: rebasing a merge commit merging non-rebased commits.

This is not yet supported by the shears script, as it would require logic
that is not only slow in shell script, but also convoluted. IOW this
feature waits for the sequencer to know how to run regular rebase -i
already.

> I think this case is the only one supported by `--preserve-merges`,
> but I may be mistaken - I never had the need to use this feature IRL.

No, -p would handle merges of non-rebased commits as well as merges of
to-be-rebased commits.

> 2. Branching point without accompanying merge commit, or in other words
>rebasing many branches tied together; a shrub if you will.  That is,
>we want to go from the following situation:
> 
>*---*---*---#---o---o---o   <-- old base
>\
> \--a---b---c   <-- branch being rebased
> \
>  \-1   <-- dependent branch
> 
>to the following one:
> 
>*---*---*---#---o---o---o
> \
>  \--a'--b'--c'
>  \
>   \-1'

This is outside the scope of rebase -i (and of the shears), as you are
talking about *parallel* rebases. I don't do that, nor does rebase -i,
rebase -p nor the shears.

> I don't think Git supports something like that out of the box, but it
> is not hard to create something like that "by hand". It is not much
> of a problem... unless you forget to rebase the 

Re: Git garden shears, was Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-21 Thread Jakub Narębski
Hello Dscho,

W dniu 11.09.2016 o 10:33, Johannes Schindelin napisał: 
> On Fri, 9 Sep 2016, Jakub Narębski wrote:
[...]

>> When preserving merges, there are (as far as I understand it), two
>> problems:
>>  - what it means to preserve changes (which change to pick,
>>that is what is the mainline changes rebase is re-applying)
>>  - what are parents of the merge commit (at least one parent
>>would be usually rewritten)
>>
>> Maybe the internal (and perhaps also user-visible) representation
>> of merge in instruction sheet could use the notation of filter-branch,
>> that is 'map()'... it could also imply the mainline.
>>
>> That is the instruction in the internal instruction sheet could
>> look like this:
>>
>>   merge -m 1 map(2fd4e1c6...) da39a3ee... \t Merge 'foo' into master  
>>
>>
>> Note that it has nothing to do with this series!
> 
> Right. But I did solve that already. In the Git garden shears [*1*]
> (essentially my New And Improved attempt at recreating branch structures
> while rebasing), I generate and process scripts like this:
> 
>   mark onto
> 
>   # Branch: super-cool-feature
>   rewind onto
>   pick 1 feature
>   pick 2 documentation
>   mark super-cool-feature
> 
>   # Branch: typo-fix
>   rewind onto
>   pick a fix a tyop

There probably should be there

mark typo-fix

> 
>   rewind onto
>   merge -C cafebabe super-cool-feature
>   merge -C babecafe typo-fix
> 
>   cleanup super-cool-feature typo-fix
> 
> Of course this will change a little, still, once I get around to implement
> this on top of the rebase--helper.

Do I understand it correctly that it is user-visible instruction sheet, and
not the internal instruction sheet for sequencer?  This looks very nice
and is well readable.

I guess that it needs to be pre-populated by Git based on topology of the
branch being rebased.

As I see, there are three basic topologies of non-linear branch to be
rebased; all else is combination of thereof, or derivative:

1. Merge commit without branching point, that is we need to go
   from the following situation

   *---*---*---#---o---o---o<-- old base
   \\
\\=a===b===M===c<-- branch being rebased
  /
 ...---x---x---x-/  <-- side branch

  to the following:

   *---*---*---#---o---o---o
\
 \-a'--b'--M'--c' 
  /
 ...---x---x---x-/  

I think this case is the only one supported by `--preserve-merges`,
but I may be mistaken - I never had the need to use this feature IRL.

2. Branching point without accompanying merge commit, or in other words
   rebasing many branches tied together; a shrub if you will.  That is,
   we want to go from the following situation:

   *---*---*---#---o---o---o   <-- old base
   \
\--a---b---c   <-- branch being rebased
\
 \-1   <-- dependent branch

   to the following one:

   *---*---*---#---o---o---o
\
 \--a'--b'--c'
 \
  \-1'

I don't think Git supports something like that out of the box, but it
is not hard to create something like that "by hand". It is not much
of a problem... unless you forget to rebase the second dependent branch.

3. Branching point with merge point, that is subbranch created and
   merged - an "eye" (it is not a loop in DAG):

   *---*---*---#---o---o---o <-- old base
   \
\--a---b---c---M---d <-- branch being rebased
\ /
 \-1---2-/ [ <-- possibly a branch ]

   All edges are directed edges, with arrows pointing from right to
   left; that is  *---*  is really  *<---*

   The expected result is:

   *---*---*---#---o---o---o
\
 \--a'--b'--c'--M'--d'
 \ /
  \-1'--2'/

I guess that is the main purpose of your git-garden-shears script,
isn't it?

> 
> For example, I am not so hot about the "merge -C ..." syntax. I'll
> probably split that into a "remerge  " and a new "merge
> " command (the latter asking interactively for the merge commit
> message).

There is also an additional complication in that merge commit message
may be *partially* automatically generated. First there is the subject
generated by 'git merge' ("Merge branch 'foo'") or 'git pull '.
It might have been translated, or extended.  Second there is a place
for branch cover letter. Third, subject to merge.log / merge.summary
there is a shortlog.

>From those shortlog should be surely updated to correspond to the
post-rebase state.  The first line could b

Git garden shears, was Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-11 Thread Johannes Schindelin
Hi Kuba,

On Fri, 9 Sep 2016, Jakub Narębski wrote:

> Hello Johannes,
> 
> W dniu 09.09.2016 o 17:12, Johannes Schindelin napisał:
> > On Thu, 1 Sep 2016, Junio C Hamano wrote: 
> >> Johannes Schindelin  writes:
> 
> >> I was sort of expecting that, when you do the preserve-merges mode
> >> of "rebase -i", you would need to jump around, doing "we have
> >> reconstructed the side branch on a new 'onto', let's give the result
> >> this temporary name ':1', and then switch to the trunk (which would
> >> call for 'reset ' instruction) and merge that thing (which
> >> would be 'merge :1' or perhaps called 'pick :1')", and at that point
> >> you no longer validate the object references upfront.
> > 
> > Except that is not how --preserve-merges works: it *still* uses the SHA-1s
> > as identifiers, even when the SHA-1 may have changed in the meantime.
> > 
> > That is part of why it was a bad design.
> 
> When preserving merges, there are (as far as I understand it), two
> problems:
>  - what it means to preserve changes (which change to pick,
>that is what is the mainline changes rebase is re-applying)
>  - what are parents of the merge commit (at least one parent
>would be usually rewritten)
> 
> Maybe the internal (and perhaps also user-visible) representation
> of merge in instruction sheet could use the notation of filter-branch,
> that is 'map()'... it could also imply the mainline.
> 
> That is the instruction in the internal instruction sheet could
> look like this:
> 
>   merge -m 1 map(2fd4e1c67a2d28fced849ee1bb76e7391b93eb12) 
> da39a3ee5e6b4b0d3255bfef95601890afd80709 \t Merge 'foo' into master  
> 
> 
> Note that it has nothing to do with this series!

Right. But I did solve that already. In the Git garden shears [*1*]
(essentially my New And Improved attempt at recreating branch structures
while rebasing), I generate and process scripts like this:

mark onto

# Branch: super-cool-feature
rewind onto
pick 1 feature
pick 2 documentation
mark super-cool-feature

# Branch: typo-fix
rewind onto
pick a fix a tyop

rewind onto
merge -C cafebabe super-cool-feature
merge -C babecafe typo-fix

cleanup super-cool-feature typo-fix

Of course this will change a little, still, once I get around to implement
this on top of the rebase--helper.

For example, I am not so hot about the "merge -C ..." syntax. I'll
probably split that into a "remerge  " and a new "merge
" command (the latter asking interactively for the merge commit
message).

And also: the cleanup stage should not be necessary, as the "mark"
commands can accumulate the known marks into a file in the state
directory.

But you get the idea.

No :1 or some such. That's machine readable. But it's utter nonsense for
user-facing UIs.

Ciao,
Dscho

Footnote *1*:
https://github.com/git-for-windows/build-extra/blob/master/shears.sh

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-09 Thread Jakub Narębski
Hello Johannes,

W dniu 09.09.2016 o 17:12, Johannes Schindelin napisał:
> On Thu, 1 Sep 2016, Junio C Hamano wrote: 
>> Johannes Schindelin  writes:

>> I was sort of expecting that, when you do the preserve-merges mode
>> of "rebase -i", you would need to jump around, doing "we have
>> reconstructed the side branch on a new 'onto', let's give the result
>> this temporary name ':1', and then switch to the trunk (which would
>> call for 'reset ' instruction) and merge that thing (which
>> would be 'merge :1' or perhaps called 'pick :1')", and at that point
>> you no longer validate the object references upfront.
> 
> Except that is not how --preserve-merges works: it *still* uses the SHA-1s
> as identifiers, even when the SHA-1 may have changed in the meantime.
> 
> That is part of why it was a bad design.

When preserving merges, there are (as far as I understand it), two
problems:
 - what it means to preserve changes (which change to pick,
   that is what is the mainline changes rebase is re-applying)
 - what are parents of the merge commit (at least one parent
   would be usually rewritten)

Maybe the internal (and perhaps also user-visible) representation
of merge in instruction sheet could use the notation of filter-branch,
that is 'map()'... it could also imply the mainline.

That is the instruction in the internal instruction sheet could
look like this:

  merge -m 1 map(2fd4e1c67a2d28fced849ee1bb76e7391b93eb12) 
da39a3ee5e6b4b0d3255bfef95601890afd80709 \t Merge 'foo' into master  


Note that it has nothing to do with this series!

Best regards,
-- 
Jakub Narębski



Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-09 Thread Johannes Schindelin
Hi Junio,

On Thu, 1 Sep 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> though).  The "one sequencer to rule them all" may even have to say
> >> "now give name ':1' to the result of the previous operation" in one
> >> step and in another later step have an instruction "merge ':1'".
> >> When that happens, you cannot even pre-populate the commit object
> >> when the sequencer reads the file, as the commit has not yet been
> >> created at that point.
> >
> > These considerations are pretty hypothetical. I would even place a bet
> > that we will *never* have ":1" as names, not if I have anything to say...
> > ;-)
> 
> If you can always work with pre-existing commit, then you can
> validate all object references that appear in the instructions
> upfront.

Or if *some* of the commands work with pre-existing commits, *those*
commands can be validated up-front.

Which is exactly what my code does.

> I was sort of expecting that, when you do the preserve-merges mode
> of "rebase -i", you would need to jump around, doing "we have
> reconstructed the side branch on a new 'onto', let's give the result
> this temporary name ':1', and then switch to the trunk (which would
> call for 'reset ' instruction) and merge that thing (which
> would be 'merge :1' or perhaps called 'pick :1')", and at that point
> you no longer validate the object references upfront.

Except that is not how --preserve-merges works: it *still* uses the SHA-1s
as identifiers, even when the SHA-1 may have changed in the meantime.

That is part of why it was a bad design.

> If you do not have to have such a "mark this point" and a "refer to
> that point we previously marked", then I agree that you should be
> able to pre-validate and keep the result in the structure.

Even then, those markers should *still* be validated. They, too, need to
be created and later used, usage before creation would be an error.

But...

1) this is not yet a problem, so why are we discussing it here? Do we not
   have actual problems with these patches to discuss anymore?

2) the SHA-1s that *can* be validated *should* be validated, so I find the
   objection a little bogus.

Ciao,
Dscho


Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-09 Thread Johannes Schindelin
Hi Junio,

On Thu, 1 Sep 2016, Junio C Hamano wrote:

> Jakub Narębski  writes:
> 
> > I wonder how probable is situation where we save instruction sheet
> > for interactive rebase, with shortened SHA-1, and during rebase
> > shortened SHA-1 stops being unambiguous...
> 
> It is my understanding that the shortened ones are only for end-user
> consumption.  The insn sheet internally uses fully expanded form for
> this exact reason, and then abbreviated back at each step before the
> updated one is presented to the end-user.  Uniqueness guarantee is
> enforced with new objects created during each step taken into
> account by doing it this way.

Indeed, the rebase -i shortens the SHA-1s just before letting the user
edit git-rebase-todo and then expands them back.

Ciao,
Dscho

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Junio C Hamano
Jakub Narębski  writes:

> I wonder how probable is situation where we save instruction sheet
> for interactive rebase, with shortened SHA-1, and during rebase
> shortened SHA-1 stops being unambiguous...

It is my understanding that the shortened ones are only for end-user
consumption.  The insn sheet internally uses fully expanded form for
this exact reason, and then abbreviated back at each step before the
updated one is presented to the end-user.  Uniqueness guarantee is
enforced with new objects created during each step taken into
account by doing it this way.




Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Jakub Narębski
Hello,

W dniu 01.09.2016 o 15:12, Johannes Schindelin pisze:
> On Wed, 31 Aug 2016, Jakub Narębski wrote:
>> W dniu 31.08.2016 o 21:10, Junio C Hamano pisze:

>>> So I am not sure if we want a parsed commit there (I would not
>>> object if we kept the texual object name read from the file,
>>> though).  The "one sequencer to rule them all" may even have to say
>>> "now give name ':1' to the result of the previous operation" in one
>>> step and in another later step have an instruction "merge ':1'".
>>> When that happens, you cannot even pre-populate the commit object
>>> when the sequencer reads the file, as the commit has not yet been
>>> created at that point.
>>
>> True, --preserve-merges rebase is well, different.
> 
> It is mis-designed. And I can be that harsh because it was my design.
> 
> In the meantime I came up with a much better design, and implemented it as
> a shell script on top of rebase -i. Since shell scripts run like slow
> molasses, even more so on Windows, I have a loose plan to implement its
> functionality as a new --recreate-merges option, and to deprecate
> --preserve-merges when that new option works.
> 
> It needs to be a new option (not a --preserve-merges=v2) because it is a
> totally different beast. For starters, it does not need its own code path
> that overrides pick_one, as --preserve-merges does.

Better preserving for merges (with cleanly defined sematics)
would be certainly nice to have.

> But I get way ahead of myself. First we need to get these last few bits
> and pieces in place to accelerate (non --preserve-merges) rebase -i.

But it can wait, right.

-- 
Jakub Narębski




Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Jakub Narębski
W dniu 01.09.2016 o 11:37, Johannes Schindelin pisze:
> On Wed, 31 Aug 2016, Junio C Hamano wrote:
>> Jakub Narębski  writes:
>>
 diff --git a/sequencer.c b/sequencer.c
 index 06759d4..3398774 100644
 --- a/sequencer.c
 +++ b/sequencer.c
 @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
 *opts)
  struct todo_item {
enum todo_command command;
struct commit *commit;
 +  const char *arg;
 +  int arg_len;
>>>
>>> Why 'arg', and not 'oneline', or 'subject'?
>>> I'm not saying it is bad name.
>>
>> I am not sure what the "commit" field of type "struct commit *" is
>> for.  It is not needed until it is the commit's turn to be picked or
>> reverted; if we end up stopping in the middle, parsing the commit
>> object for later steps will end up being wasted effort.
> 
> No, it won't be wasted effort, as we *validate* the todo script this way.
> And since we may very well need the info later (most rebases do not fail
> in the middle), we store it, too.

The question was (I think) whether we should do eager parsing of
commits, or whether we can do lazy parsing by postponing full parsing
"until it is the commit's turn to be picked or reverted", and possibly
when saving todo file.

I wonder how probable is situation where we save instruction sheet
for interactive rebase, with shortened SHA-1, and during rebase
shortened SHA-1 stops being unambiguous...

>[...] after parsing the todo_list, we will
> have to act on the information contained therein. For example we will have
> to cherry-pick some of the indicated commits (requiring a struct commit *
> for use in do_pick_commit()). Another example: we may need to determine
> the oneline for use in fixup!/squash! reordering.
> 
> So: keeping *that* aspect of the previous todo_list parsing, i.e. store a
> pointer to the already-parsed commit, is the right thing to do.

The above probably means that eager eval is better

Best,
-- 
Jakub Narębski



Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Junio C Hamano
Johannes Schindelin  writes:

>> though).  The "one sequencer to rule them all" may even have to say
>> "now give name ':1' to the result of the previous operation" in one
>> step and in another later step have an instruction "merge ':1'".
>> When that happens, you cannot even pre-populate the commit object
>> when the sequencer reads the file, as the commit has not yet been
>> created at that point.
>
> These considerations are pretty hypothetical. I would even place a bet
> that we will *never* have ":1" as names, not if I have anything to say...
> ;-)

If you can always work with pre-existing commit, then you can
validate all object references that appear in the instructions
upfront.

I was sort of expecting that, when you do the preserve-merges mode
of "rebase -i", you would need to jump around, doing "we have
reconstructed the side branch on a new 'onto', let's give the result
this temporary name ':1', and then switch to the trunk (which would
call for 'reset ' instruction) and merge that thing (which
would be 'merge :1' or perhaps called 'pick :1')", and at that point
you no longer validate the object references upfront.

If you do not have to have such a "mark this point" and a "refer to
that point we previously marked", then I agree that you should be
able to pre-validate and keep the result in the structure.


Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Aug 2016, Junio C Hamano wrote:

> Jakub Narębski  writes:
> 
>  @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
>  *opts)
>   struct todo_item {
>   enum todo_command command;
>   struct commit *commit;
>  +const char *arg;
>  +int arg_len;
> >  
> >> I am not sure what the "commit" field of type "struct commit *" is
> >> for.  It is not needed until it is the commit's turn to be picked or
> >> reverted; if we end up stopping in the middle, parsing the commit
> >> object for later steps will end up being wasted effort.
> >
> > From what I understand this was what sequencer did before this
> > series, so it is not a regression (I think; the commit parsing
> > was in different function, but I think at the same place in
> > the callchain).
> 
> Yes, I agree with you and I didn't mean "this is a new bug" at all.
> It just is an indication that further refactoring after this step is
> needed, and it is likely to involve removal or modification of this
> field.

Not so. After you review the sequencer-i patches, you will see that it
makes tons of sense to have that "commit" field: "pick" is by far the most
common case, and it would not make sense at all to validate the todo
script, only to throw away the information we got, only to re-gain it
later in the sequencer.

Read: I strongly disagree with your cursory verdict that the "commit"
field should go.

Ciao,
Dscho

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Johannes Schindelin
Hi Kuba & Junio,

On Wed, 31 Aug 2016, Jakub Narębski wrote:

> W dniu 31.08.2016 o 21:10, Junio C Hamano pisze:
> > Jakub Narębski  writes:
> > 
> >>> diff --git a/sequencer.c b/sequencer.c
> >>> index 06759d4..3398774 100644
> >>> --- a/sequencer.c
> >>> +++ b/sequencer.c
> >>> @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
> >>> *opts)
> >>>  struct todo_item {
> >>>   enum todo_command command;
> >>>   struct commit *commit;
> >>> + const char *arg;
> >>> + int arg_len;
>  
> > I am not sure what the "commit" field of type "struct commit *" is
> > for.  It is not needed until it is the commit's turn to be picked or
> > reverted; if we end up stopping in the middle, parsing the commit
> > object for later steps will end up being wasted effort.
> 
> From what I understand this was what sequencer did before this
> series, so it is not a regression (I think; the commit parsing
> was in different function, but I think at the same place in
> the callchain).

Exactly.

> > Also, when the sequencer becomes one sequencer to rule them all, the
> > command set may contain something that does not even mention any
> > commit at all (think: exec).
> 
> The "exec" line is a bit of exception, all other rebase -i commands
> take commit as parameter.  It could always use NULL.

There is also "noop".

> > So I am not sure if we want a parsed commit there (I would not
> > object if we kept the texual object name read from the file,
> > though).  The "one sequencer to rule them all" may even have to say
> > "now give name ':1' to the result of the previous operation" in one
> > step and in another later step have an instruction "merge ':1'".
> > When that happens, you cannot even pre-populate the commit object
> > when the sequencer reads the file, as the commit has not yet been
> > created at that point.
> 
> True, --preserve-merges rebase is well, different.

It is mis-designed. And I can be that harsh because it was my design.

In the meantime I came up with a much better design, and implemented it as
a shell script on top of rebase -i. Since shell scripts run like slow
molasses, even more so on Windows, I have a loose plan to implement its
functionality as a new --recreate-merges option, and to deprecate
--preserve-merges when that new option works.

It needs to be a new option (not a --preserve-merges=v2) because it is a
totally different beast. For starters, it does not need its own code path
that overrides pick_one, as --preserve-merges does.

But I get way ahead of myself. First we need to get these last few bits
and pieces in place to accelerate (non --preserve-merges) rebase -i.

Ciao,
Dscho

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Aug 2016, Junio C Hamano wrote:

> Jakub Narębski  writes:
> 
> >> diff --git a/sequencer.c b/sequencer.c
> >> index 06759d4..3398774 100644
> >> --- a/sequencer.c
> >> +++ b/sequencer.c
> >> @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
> >> *opts)
> >>  struct todo_item {
> >>enum todo_command command;
> >>struct commit *commit;
> >> +  const char *arg;
> >> +  int arg_len;
> >
> > Why 'arg', and not 'oneline', or 'subject'?
> > I'm not saying it is bad name.
> 
> I am not sure what the "commit" field of type "struct commit *" is
> for.  It is not needed until it is the commit's turn to be picked or
> reverted; if we end up stopping in the middle, parsing the commit
> object for later steps will end up being wasted effort.

No, it won't be wasted effort, as we *validate* the todo script this way.
And since we may very well need the info later (most rebases do not fail
in the middle), we store it, too.

> Also, when the sequencer becomes one sequencer to rule them all, the
> command set may contain something that does not even mention any
> commit at all (think: exec).

Right, in which case the "commit" field will have the value... *drum
roll*... NULL!

> So I am not sure if we want a parsed commit there (I would not
> object if we kept the texual object name read from the file,
> though).  The "one sequencer to rule them all" may even have to say
> "now give name ':1' to the result of the previous operation" in one
> step and in another later step have an instruction "merge ':1'".
> When that happens, you cannot even pre-populate the commit object
> when the sequencer reads the file, as the commit has not yet been
> created at that point.

These considerations are pretty hypothetical. I would even place a bet
that we will *never* have ":1" as names, not if I have anything to say...
;-)

What is not so hypothetical is that after parsing the todo_list, we will
have to act on the information contained therein. For example we will have
to cherry-pick some of the indicated commits (requiring a struct commit *
for use in do_pick_commit()). Another example: we may need to determine
the oneline for use in fixup!/squash! reordering.

So: keeping *that* aspect of the previous todo_list parsing, i.e. store a
pointer to the already-parsed commit, is the right thing to do.

Ciao,
Dscho

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-09-01 Thread Johannes Schindelin
Hi Kuba,

On Wed, 31 Aug 2016, Jakub Narębski wrote:

> W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:
> 
> > The `git-rebase-todo` file contains a list of commands. Most of those
> > commands have the form
> > 
> >   
> > 
> > The  is displayed primarily for the user's convenience, as
> > rebase -i really interprets only the   part. However, there
> > are *some* places in interactive rebase where the  is used to
> > display messages, e.g. for reporting at which commit we stopped.
> > 
> > So let's just remember it when parsing the todo file; we keep a copy of
> > the entire todo file anyway (to write out the new `done` and
> > `git-rebase-todo` file just before processing each command), so all we
> > need to do is remember the begin and end offsets.
> 
> Actually what we remember is pointer and length, or begin offset and length,
> not offset and offset.

Right. Fixed.

> > diff --git a/sequencer.c b/sequencer.c
> > index 06759d4..3398774 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
> > *opts)
> >  struct todo_item {
> > enum todo_command command;
> > struct commit *commit;
> > +   const char *arg;
> > +   int arg_len;
> 
> Why 'arg', and not 'oneline', or 'subject'?
> I'm not saying it is bad name.

Because we will use it for `exec` commands' args, too. Clarified in the
commit message.

> > @@ -760,6 +762,9 @@ static int parse_insn_line(struct todo_item *item, 
> > const char *bol, char *eol)
> > status = get_sha1(bol, commit_sha1);
> > *end_of_object_name = saved;
> >  
> > +   item->arg = end_of_object_name + strspn(end_of_object_name, " \t");
> > +   item->arg_len = (int)(eol - item->arg);
> > +
> 
> Does it work correctly for line without , that is
> 
>
> 
> I think it does, but I not entirely sure.

It does work correctly: in the example, *end_of_object_name would be '\n',
and strspn(end_of_object_name, " \t") would return 0.

Thanks for the review!
Dscho

Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-08-31 Thread Junio C Hamano
Jakub Narębski  writes:

 @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
 *opts)
  struct todo_item {
enum todo_command command;
struct commit *commit;
 +  const char *arg;
 +  int arg_len;
>  
>> I am not sure what the "commit" field of type "struct commit *" is
>> for.  It is not needed until it is the commit's turn to be picked or
>> reverted; if we end up stopping in the middle, parsing the commit
>> object for later steps will end up being wasted effort.
>
> From what I understand this was what sequencer did before this
> series, so it is not a regression (I think; the commit parsing
> was in different function, but I think at the same place in
> the callchain).

Yes, I agree with you and I didn't mean "this is a new bug" at all.
It just is an indication that further refactoring after this step is
needed, and it is likely to involve removal or modification of this
field.



Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-08-31 Thread Jakub Narębski
W dniu 31.08.2016 o 21:10, Junio C Hamano pisze:
> Jakub Narębski  writes:
> 
>>> diff --git a/sequencer.c b/sequencer.c
>>> index 06759d4..3398774 100644
>>> --- a/sequencer.c
>>> +++ b/sequencer.c
>>> @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
>>> *opts)
>>>  struct todo_item {
>>> enum todo_command command;
>>> struct commit *commit;
>>> +   const char *arg;
>>> +   int arg_len;
 
> I am not sure what the "commit" field of type "struct commit *" is
> for.  It is not needed until it is the commit's turn to be picked or
> reverted; if we end up stopping in the middle, parsing the commit
> object for later steps will end up being wasted effort.

>From what I understand this was what sequencer did before this
series, so it is not a regression (I think; the commit parsing
was in different function, but I think at the same place in
the callchain).

> 
> Also, when the sequencer becomes one sequencer to rule them all, the
> command set may contain something that does not even mention any
> commit at all (think: exec).

The "exec" line is a bit of exception, all other rebase -i commands
take commit as parameter.  It could always use NULL.

>
> So I am not sure if we want a parsed commit there (I would not
> object if we kept the texual object name read from the file,
> though).  The "one sequencer to rule them all" may even have to say
> "now give name ':1' to the result of the previous operation" in one
> step and in another later step have an instruction "merge ':1'".
> When that happens, you cannot even pre-populate the commit object
> when the sequencer reads the file, as the commit has not yet been
> created at that point.

True, --preserve-merges rebase is well, different.

Best,
-- 
Jakub Narebski



Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-08-31 Thread Junio C Hamano
Jakub Narębski  writes:

>> diff --git a/sequencer.c b/sequencer.c
>> index 06759d4..3398774 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
>> *opts)
>>  struct todo_item {
>>  enum todo_command command;
>>  struct commit *commit;
>> +const char *arg;
>> +int arg_len;
>
> Why 'arg', and not 'oneline', or 'subject'?
> I'm not saying it is bad name.

I am not sure what the "commit" field of type "struct commit *" is
for.  It is not needed until it is the commit's turn to be picked or
reverted; if we end up stopping in the middle, parsing the commit
object for later steps will end up being wasted effort.

Also, when the sequencer becomes one sequencer to rule them all, the
command set may contain something that does not even mention any
commit at all (think: exec).

So I am not sure if we want a parsed commit there (I would not
object if we kept the texual object name read from the file,
though).  The "one sequencer to rule them all" may even have to say
"now give name ':1' to the result of the previous operation" in one
step and in another later step have an instruction "merge ':1'".
When that happens, you cannot even pre-populate the commit object
when the sequencer reads the file, as the commit has not yet been
created at that point.




Re: [PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-08-31 Thread Jakub Narębski
W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:

> The `git-rebase-todo` file contains a list of commands. Most of those
> commands have the form
> 
> 
> 
> The  is displayed primarily for the user's convenience, as
> rebase -i really interprets only the   part. However, there
> are *some* places in interactive rebase where the  is used to
> display messages, e.g. for reporting at which commit we stopped.
> 
> So let's just remember it when parsing the todo file; we keep a copy of
> the entire todo file anyway (to write out the new `done` and
> `git-rebase-todo` file just before processing each command), so all we
> need to do is remember the begin and end offsets.

Actually what we remember is pointer and length, or begin offset and length,
not offset and offset.

> 
> Signed-off-by: Johannes Schindelin 

Nice, I'll see how it is used later (and in which commit in series).

> ---
>  sequencer.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 06759d4..3398774 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts 
> *opts)
>  struct todo_item {
>   enum todo_command command;
>   struct commit *commit;
> + const char *arg;
> + int arg_len;

Why 'arg', and not 'oneline', or 'subject'?
I'm not saying it is bad name.

>   size_t offset_in_buf;
>  };
>  
> @@ -760,6 +762,9 @@ static int parse_insn_line(struct todo_item *item, const 
> char *bol, char *eol)
>   status = get_sha1(bol, commit_sha1);
>   *end_of_object_name = saved;
>  
> + item->arg = end_of_object_name + strspn(end_of_object_name, " \t");
> + item->arg_len = (int)(eol - item->arg);
> +

Does it work correctly for line without , that is

 

I think it does, but I not entirely sure.

>   if (status < 0)
>   return -1;
>  
> @@ -880,6 +885,8 @@ static int walk_revs_populate_todo(struct todo_list 
> *todo_list,
>  
>   item->command = command;
>   item->commit = commit;
> + item->arg = NULL;
> + item->arg_len = 0;
>   item->offset_in_buf = todo_list->buf.len;
>   subject_len = find_commit_subject(commit_buffer, &subject);
>   strbuf_addf(&todo_list->buf, "%s %s %.*s\n",
> 



[PATCH 13/22] sequencer: remember the onelines when parsing the todo file

2016-08-29 Thread Johannes Schindelin
The `git-rebase-todo` file contains a list of commands. Most of those
commands have the form

  

The  is displayed primarily for the user's convenience, as
rebase -i really interprets only the   part. However, there
are *some* places in interactive rebase where the  is used to
display messages, e.g. for reporting at which commit we stopped.

So let's just remember it when parsing the todo file; we keep a copy of
the entire todo file anyway (to write out the new `done` and
`git-rebase-todo` file just before processing each command), so all we
need to do is remember the begin and end offsets.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 06759d4..3398774 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -709,6 +709,8 @@ static int read_and_refresh_cache(struct replay_opts *opts)
 struct todo_item {
enum todo_command command;
struct commit *commit;
+   const char *arg;
+   int arg_len;
size_t offset_in_buf;
 };
 
@@ -760,6 +762,9 @@ static int parse_insn_line(struct todo_item *item, const 
char *bol, char *eol)
status = get_sha1(bol, commit_sha1);
*end_of_object_name = saved;
 
+   item->arg = end_of_object_name + strspn(end_of_object_name, " \t");
+   item->arg_len = (int)(eol - item->arg);
+
if (status < 0)
return -1;
 
@@ -880,6 +885,8 @@ static int walk_revs_populate_todo(struct todo_list 
*todo_list,
 
item->command = command;
item->commit = commit;
+   item->arg = NULL;
+   item->arg_len = 0;
item->offset_in_buf = todo_list->buf.len;
subject_len = find_commit_subject(commit_buffer, &subject);
strbuf_addf(&todo_list->buf, "%s %s %.*s\n",
-- 
2.10.0.rc1.114.g2bd6b38