Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-09-09 Thread Johannes Schindelin
Hi Kuba,

On Fri, 2 Sep 2016, Jakub Narębski wrote:

> W dniu 01.09.2016 o 09:49, Johannes Schindelin pisze:
> > On Wed, 31 Aug 2016, Jakub Narębski wrote: 
> 
> >> Here todo_list uses growable array implementation of list.  Which is
> >> I guess better on current CPU architecture, with slow memory,
> >> limited-size caches, and adjacency prefetching.
> > 
> > That is not the reason that an array is used here. The array allows us
> > much more flexibility.
> 
> It would be nice if this reasoning (behind the change from linked list
> to growable array) was mentioned in appropriate commit message, and
> perhaps also in the cover letter for the series.  It is IMVHO quite
> important information (that you thought obvious).

Amended.

> >>> +struct todo_item *append_todo(struct todo_list *todo_list)
> >>
> >> Errr... I don't quite understand the name of this function.
> >> What are you appending here to the todo_list?
> > 
> > A new item.
> > 
> >> Compare string_list_append() and string_list_append_nodup(),
> >> where the second parameter is item to append.
> > 
> > Yes, that is correct. In the case of a todo_item, things are a lot more
> > complicated, though. Some of the values have to be determined tediously
> > (such as the offset and length of the oneline after the "pick "
> > command). I just put those values directly into the newly allocated item,
> > is all.
> 
> I would expect sth_append command to take a list (or other collection),
> an element, and return [modified] collection with the new element added.
> Such API would require temporary variable in caller and memcopy in the
> sth_append() function.
> 
> This is not it.  It creates a new element, expanding a list (a collection),
> and then expose this element.  Which spares us memcopy... on non-critical
> path.
> 
> I don't know how to name operation "grow list and return new element".
> But "append" it is not.

I renamed it to append_new_todo().

> >>> - end_of_object_name = bol + strcspn(bol, " \t\n");
> >>> + end_of_object_name = (char *) bol + strcspn(bol, " \t\n");
> >>
> >> Why is this cast needed?
> > 
> > Because bol is a "const char *" and we need to put "NUL" temporarily to
> > *end_of_object_name:
> 
> Would compiler complain without this const'ness-stripping cast?

Yes. I would not have added it otherwise.

Please note that this is only necessary because I changed the parameter
from "char *" to "const char *" (which was The Right Thing To Do).

> >>>   saved = *end_of_object_name;
> >>>   *end_of_object_name = '\0';
> >>>   status = get_sha1(bol, commit_sha1);
> >>>   *end_of_object_name = saved;
> > 
> > Technically, this would have made a fine excuse to teach get_sha1() a
> > mode where it expects a length parameter instead of relying on a
> > NUL-terminated string.
> > 
> > Practically, such fine excuses cost me months in this rebase--helper
> > project already, and I need to protect my time better.
> 
> Put it in TODO list (and perhaps add a TODO comment) ;-).

I am also a realist: I won't be able to do anything about this. If you
care enough, please go right to town.

Thanks again,
Dscho

Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-09-01 Thread Jakub Narębski
Hello Johannes,

W dniu 01.09.2016 o 09:49, Johannes Schindelin pisze:
> On Wed, 31 Aug 2016, Jakub Narębski wrote: 
>> W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:
[...]

>> Hmmm... commit_list is, as defined in commit.h, a linked list.
> 
> That is the most prominent reason why the rest is not a mindless
> conversion from commit_list to todo_list.
> 
> And we need todo_list as an array, because we need to be able to peek, or
> even move, backwards from the current command.
> 
>> Here todo_list uses growable array implementation of list.  Which
>> is I guess better on current CPU architecture, with slow memory,
>> limited-size caches, and adjacency prefetching.
> 
> That is not the reason that an array is used here. The array allows us
> much more flexibility.

It would be nice if this reasoning (behind the change from linked list
to growable array) was mentioned in appropriate commit message, and
perhaps also in the cover letter for the series.  It is IMVHO quite
important information (that you thought obvious).

> 
> One of the major performance improvements will come at the very end, for
> example: the reordering of the fixup!/squash! lines. And that would be a
> *major* pain to do if the todo_list were still a linked list.

Actually deletion from and insertion into single linked list are
not that hard, and O(1) after finding place, O(N) with finding
included.  Moving elements in array is O(N),... and arguably a bit
simpler - but at high level, with appropriate primitives, they are
about the same.

Yes, array is easier for permutation and reordering.

>>> +struct todo_item *append_todo(struct todo_list *todo_list)
>>
>> Errr... I don't quite understand the name of this function.
>> What are you appending here to the todo_list?
> 
> A new item.
> 
>> Compare string_list_append() and string_list_append_nodup(),
>> where the second parameter is item to append.
> 
> Yes, that is correct. In the case of a todo_item, things are a lot more
> complicated, though. Some of the values have to be determined tediously
> (such as the offset and length of the oneline after the "pick "
> command). I just put those values directly into the newly allocated item,
> is all.

I would expect sth_append command to take a list (or other collection),
an element, and return [modified] collection with the new element added.
Such API would require temporary variable in caller and memcopy in the
sth_append() function.

This is not it.  It creates a new element, expanding a list (a collection),
and then expose this element.  Which spares us memcopy... on non-critical
path.

I don't know how to name operation "grow list and return new element".
But "append" it is not.
 
>>> +   ALLOC_GROW(todo_list->items, todo_list->nr + 1, todo_list->alloc);
>>> +   return todo_list->items + todo_list->nr++;
>>>  }
>>>  
>>> -static struct commit *parse_insn_line(char *bol, char *eol, struct 
>>> replay_opts *opts)
>>> +static int parse_insn_line(struct todo_item *item, const char *bol, char 
>>> *eol)
>>
>> Why the change of return type?  
> 
> Because it makes no sense to return a commit here because not all commands
> are about commits (think rebase -i's `exec`). It makes tons of sense to
> return an error condition, though.

All right.

I have not checked / not remember what caller of parse_insn_line() used
it's return value for.  I guess that we save it into todo_item, moving
that operation from caller to callee.

>> Why now struct todo_item is first when struct replay_opts was last?
> 
> Those play very, very different roles.
> 
> The opts parameter used to provide parse_insn_line() with enough
> information to complain loudly when the overall command was not identical
> to the parsed command.
> 
> The item parameter is a receptacle for the parsed data. It will contain
> the pointer to the commit that was previously returned, if any. But it
> will also contain much more information, such as the command, the oneline,
> the offset in the buffer, etc etc
> 
> So "opts" was an "in" parameter while "item" is an "out" one. Apples and
> oranges.

All right.  Good explanation.  And the one that is too low-level
detail to put in the commit message, I think.

>>> -   end_of_object_name = bol + strcspn(bol, " \t\n");
>>> +   end_of_object_name = (char *) bol + strcspn(bol, " \t\n");
>>
>> Why is this cast needed?
> 
> Because bol is a "const char *" and we need to put "NUL" temporarily to
> *end_of_object_name:

Would compiler complain without this const'ness-stripping cast?

> 
>>> saved = *end_of_object_name;
>>> *end_of_object_name = '\0';
>>> status = get_sha1(bol, commit_sha1);
>>> *end_of_object_name = saved;
> 
> Technically, this would have made a fine excuse to teach get_sha1() a mode
> where it expects a length parameter instead of relying on a NUL-terminated
> string.
> 
> Practically, such fine excuses cost me months in this rebase--helper
> project already, and I need to protect my time better.

Put it in TODO list

Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-09-01 Thread Junio C Hamano
Johannes Schindelin  writes:

>> git continue as a shorthand for `git  --continue` sounds great.
>
> Before we get ahead of ourselves:
>
> 1) this has nothing to do with the patch series at hand, and
>
> 2) if we were to introduce `git continue`, we would need to think long and
>hard about the following issues:
>
>   I) are there potentially ambiguous s that the user
>  may want to continue?
>
>   II) what about options? You can say `git rebase --continue
>   --no-ff`, for example, but not `git cherry-pick --continue
>   --no-ff`...
>
>   III) Would it not be confusing to have a subcommand `continue`
>that does *not* serve a *single* purpose? It's kinda flying
>into the face of the Unix philosophy.

The above reasoning applies equally to "git abort".  I do not think
"git continue" would help.

If it were that anything you can do with Git can be --continue'ed
the same way (e.g. all uses one sequencer to rule them all), it
might be achievable, but I do not think it isn't, and will never be.

"git commit" may say "You haven't added anything yet" and refuse to
do anything.  Should "git continue" do "git commit -a" by noticing
that the last thing you tried to do was "git commit" and guess that
it is likely you wanted to commit all changes?  I think not.


Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-09-01 Thread Johannes Schindelin
Hi Kuba,

On Wed, 31 Aug 2016, Jakub Narębski wrote:

> W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:
> 
> > When we came up with the "sequencer" idea, we really wanted to have
> > kind of a plumbing equivalent of the interactive rebase. Hence the
> > choice of words: the "todo" script, a "pick", etc.
> > 
> > However, when it came time to implement the entire shebang, somehow this
> > idea got lost and the sequencer was used as working horse for
> > cherry-pick and revert instead. So as not to interfere with the
> > interactive rebase, it even uses a separate directory to store its
> > state.
> > 
> > Furthermore, it also is stupidly strict about the "todo" script it
> > accepts: while it parses commands in a way that was *designed* to be
> > similar to the interactive rebase, it then goes on to *error out* if the
> > commands disagree with the overall action (cherry-pick or revert).
> 
> Does this mean that after the change you would be able to continue
> "git revert" with "git cherry-pick --continue", and vice versa?  Or that
> it would be possible for git-cherry-pick to do reverts (e.g. with ^)?

I guess that I allow that now. Is it harmful? I dunno.

> > Let's just bite the bullet and rewrite the entire parser; the code now
> > becomes not only more elegant: it allows us to go on and teach the
> > sequencer how to parse *true* "todo" scripts as used by the interactive
> > rebase itself. In a way, the sequencer is about to grow up to do its
> > older brother's job. Better.
> 
> Sidenote: this is not your fault, but Git doesn't do a good job on
> changes which are mostly rewrites, trying to match stray '}' and the
> like in generated diff.  I wonder if existing diff heuristic options
> could help here.

I guess --patience would have helped. Or Michael's upcoming
diff-heuristics.

> > While at it, do not stop at the first problem, but list *all* of the
> > problems. This helps the user by allowing to address all issues in
> > one go rather than going back and forth until the todo list is valid.
> 
> That is also a good change, though I wonder how often users need
> to worry about this outside interactive rebase case.  If it is
> preparation for rebase -i, where instruction list is written by
> prone to errors human, it would be nice to have this information
> in the commit message.

Okay.

> > diff --git a/sequencer.c b/sequencer.c
> > index 982b6e9..cbdce6d 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -473,7 +473,26 @@ static int allow_empty(struct replay_opts *opts, 
> > struct commit *commit)
> > return 1;
> >  }
> >  
> > -static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
> > +enum todo_command {
> > +   TODO_PICK,
> > +   TODO_REVERT
> > +};
> 
> Do we have a naming convention for enums elements?  Or are we explicitly
> making enums and #defines interchangeable?  I wonder...
> 
> ...uh, I see we don't have naming convention, but all caps snake-case
> names dominate:
> 
>   $ git grep -A2 'enum .* {'
>   [...]
>   diff.h:enum color_diff {
>   diff.h- DIFF_RESET = 0,
>   diff.h- DIFF_CONTEXT = 1,
>   --
>   dir.c:enum path_treatment {
>   dir.c-  path_none = 0,
>   dir.c-  path_recurse,
>   --
> 
> Shouldn't we say 'TODO_PICK = 0' explicitly, though?

Sure.

> > +static const char *todo_command_strings[] = {
> > +   "pick",
> > +   "revert"
> > +};
> 
> It's a bit pity that we cannot use designated inits, and hanging comma,
> (from ISO C99 standard).  That is:
> 
>   +static const char *todo_command_strings[] = {
>   +   [TODO_PICK]   = "pick",
>   +   [TODO_REVERT] = "revert",
>   +};

I agree, it is a pity. I could do something like I did in fsck.c:

#define FOREACH_TODO_COMMAND(FUNC) \
FUNC(PICK, "pick") \
FUNC(REVERT, "revert")

#define COMMAND_ID(id, string) TODO_##id,
enum todo_command {
FOREACH_TODO_COMMAND(COMMAND_ID)
TODO_END
};
#undef COMMAND_ID

#define COMMAND_ID(id, string) string,
static const char *todo_command_string[] = {
FOREACH_TODO_COMMAND(COMMAND_ID)
NULL
};
#undef COMMAND_ID

However, this is not even readable, let alone any other type of an
improvement. So I won't.

> > @@ -548,7 +568,7 @@ static int do_pick_commit(struct commit *commit, struct 
> > replay_opts *opts)
> 
> From here on changes are about
> 
>   s/opts->action == REPLAY_\(PICK\|REVERT\)/command == TODO_\1/
> 
> Do we still need opts->action, or it is just needed less,
> and it is 'todo' instruction that decides about command
> (as it should)?

We need opts->action. For example, the state directory changes depending
on it: REPLAY_INTERACTIVE_REBASE stores its stuff in
git_path("rebase-merge").

There is lots more behavior that also changes depending on opts->action.

> > [...]
> > if (res) {
> > -   error(opts->action == REPLAY_REVERT
> > +   error(command == TODO_REVERT
> 

Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-08-31 Thread Johannes Schindelin
Hi Kuba and Stefan,

On Wed, 31 Aug 2016, Stefan Beller wrote:

> On Wed, Aug 31, 2016 at 10:29 AM, Jakub Narębski  wrote:
> 
> >
> > BTW. perhaps we would be able to continue with 'git continue', regardless
> > of what we have started with, I wonder...
> >
> 
> git continue as a shorthand for `git  --continue` sounds great.

Before we get ahead of ourselves:

1) this has nothing to do with the patch series at hand, and

2) if we were to introduce `git continue`, we would need to think long and
   hard about the following issues:

I) are there potentially ambiguous s that the user
   may want to continue?

II) what about options? You can say `git rebase --continue
--no-ff`, for example, but not `git cherry-pick --continue
--no-ff`...

III) Would it not be confusing to have a subcommand `continue`
 that does *not* serve a *single* purpose? It's kinda flying
 into the face of the Unix philosophy.

Ciao,
Dscho

Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-08-31 Thread Stefan Beller
On Wed, Aug 31, 2016 at 10:29 AM, Jakub Narębski  wrote:

>
> BTW. perhaps we would be able to continue with 'git continue', regardless
> of what we have started with, I wonder...
>

git continue as a shorthand for `git  --continue` sounds great.

If we were to introduce that, I think we would need to unify the rest
as well then, e.g.
Both revert as well as cherry-pick have --quit as well as --abort,
but rebase doesn't have --quit documented, but instead an additional
--skip  and --edit-todo

Would we pull these all up as a top level command? (That sounds not so
great to me)


Re: [PATCH 09/22] sequencer: completely revamp the "todo" script parsing

2016-08-31 Thread Jakub Narębski
W dniu 29.08.2016 o 10:05, Johannes Schindelin pisze:

> When we came up with the "sequencer" idea, we really wanted to have
> kind of a plumbing equivalent of the interactive rebase. Hence the
> choice of words: the "todo" script, a "pick", etc.
> 
> However, when it came time to implement the entire shebang, somehow this
> idea got lost and the sequencer was used as working horse for
> cherry-pick and revert instead. So as not to interfere with the
> interactive rebase, it even uses a separate directory to store its
> state.
> 
> Furthermore, it also is stupidly strict about the "todo" script it
> accepts: while it parses commands in a way that was *designed* to be
> similar to the interactive rebase, it then goes on to *error out* if the
> commands disagree with the overall action (cherry-pick or revert).

Does this mean that after the change you would be able to continue
"git revert" with "git cherry-pick --continue", and vice versa?  Or that
it would be possible for git-cherry-pick to do reverts (e.g. with ^)?

That's what we need to decide before becoming more lenient.
 
BTW. perhaps we would be able to continue with 'git continue', regardless
of what we have started with, I wonder...

>
> Finally, the sequencer code chose to deviate from the interactive rebase
> code insofar that it *reformats* the "todo" script instead of just
> writing the part of the parsed script that were not yet processed. This
> is not only unnecessary churn, but might well lose information that is
> valuable to the user (i.e. comments after the commands).

That's a very good change.

>
> Let's just bite the bullet and rewrite the entire parser; the code now
> becomes not only more elegant: it allows us to go on and teach the
> sequencer how to parse *true* "todo" scripts as used by the interactive
> rebase itself. In a way, the sequencer is about to grow up to do its
> older brother's job. Better.

Sidenote: this is not your fault, but Git doesn't do a good job on
changes which are mostly rewrites, trying to match stray '}' and the
like in generated diff.  I wonder if existing diff heuristic options
could help here.

> 
> While at it, do not stop at the first problem, but list *all* of the
> problems. This helps the user by allowing to address all issues in
> one go rather than going back and forth until the todo list is valid.

That is also a good change, though I wonder how often users need
to worry about this outside interactive rebase case.  If it is
preparation for rebase -i, where instruction list is written by
prone to errors human, it would be nice to have this information
in the commit message.

> 
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 241 
> +---
>  1 file changed, 134 insertions(+), 107 deletions(-)

Note: I have moved some lines of diff so that the change is more
readable to humans (but it results often in --++-++ chunk).

> 
> diff --git a/sequencer.c b/sequencer.c
> index 982b6e9..cbdce6d 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -473,7 +473,26 @@ static int allow_empty(struct replay_opts *opts, struct 
> commit *commit)
>   return 1;
>  }
>  
> -static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
> +enum todo_command {
> + TODO_PICK,
> + TODO_REVERT
> +};

Do we have a naming convention for enums elements?  Or are we explicitly
making enums and #defines interchangeable?  I wonder...

...uh, I see we don't have naming convention, but all caps snake-case
names dominate:

  $ git grep -A2 'enum .* {'
  [...]
  diff.h:enum color_diff {
  diff.h- DIFF_RESET = 0,
  diff.h- DIFF_CONTEXT = 1,
  --
  dir.c:enum path_treatment {
  dir.c-  path_none = 0,
  dir.c-  path_recurse,
  --

Shouldn't we say 'TODO_PICK = 0' explicitly, though?

> +
> +static const char *todo_command_strings[] = {
> + "pick",
> + "revert"
> +};

It's a bit pity that we cannot use designated inits, and hanging comma,
(from ISO C99 standard).  That is:

  +static const char *todo_command_strings[] = {
  + [TODO_PICK]   = "pick",
  + [TODO_REVERT] = "revert",
  +};


> +
> +static const char *command_to_string(const enum todo_command command)
> +{
> + if (command < ARRAY_SIZE(todo_command_strings))
> + return todo_command_strings[command];
> + die("Unknown command: %d", command);
> +}
> +
> +
> +static int do_pick_commit(enum todo_command command, struct commit *commit,
> + struct replay_opts *opts)
>  {
>   unsigned char head[20];
>   struct commit *base, *next, *parent;
> @@ -535,7 +554,8 @@ static int do_pick_commit(struct commit *commit, struct 
> replay_opts *opts)
>   /* TRANSLATORS: The first %s will be "revert" or
>  "cherry-pick", the second %s a SHA1 */
>   return error(_("%s: cannot parse parent commit %s"),

I wonder if we should not change also the error message: it is no
longer about command, but about operation