Re: [PATCH 2/6] sequencer: learn about the special "fake root commit" handling

2018-04-29 Thread Stefan Beller
Hi Johannes,

thanks for taking your time to explain things. It shows I am not
familiar with the rebase code, yet.

>
> Having said that, *this* time round, what we need to do is actually very
> similar to what builtin/am.c's read_author_script() does (even if we
> cannot use it as-is: it populates part of a `struct am_state`). I'll have
> to look into this more closely.

Heh, so my rambling was worth it. Thanks for looking into that.


>> This part could be prefixed with
>> /* un-escape text: turn \\ into ' and remove single quotes. */
>
> If could be prefixed that way, but it would be incorrect. We never turn \\
> into '. What we do here (and I do not want to repeat in a comment what the
> code does): we dequote what we previously quoted using single quotes. So
> we use the fact that we know that the value is of the form 'abc', or if it
> contains single quotes: 'this has '\''single'\'' quotes'.

Is there a helper 'dequote' somewhere?

>> > +/* Does this command create a (non-merge) commit? */
>> > +static int is_pick_or_similar(enum todo_command command)
>> > +{
>> > +   return command <= TODO_SQUASH;
>> > +}
>>
>> This code looks scary.
[...]

> The code already does things like that, by testing `command <
> TODO_COMMENT`.

ok great. So that is a widely used pattern for enum todo_command,
such that rearranging their order would break a lot. (Hence people will
find out quickly when doing so).

> But I guess your concerns would go away if I made this a big honking
> switch() statement that lists *explicitly* what should be considered "pick
> or similar"?

I did not spell that out as producing lots of lines of code is not the
primary goal, but understandability is.

And it looked like it would just pick the first section, so I thought about
some different approaches, either by making the enum todo_command
a lot more complex than an enum (an array of structs?) or adding
a new parallel array, that contains additional information for each
value at the given index, e.g.

static int is_pick_or_similar(enum todo_command command)
{
return todo_command_flags[value] & TODO_CMDS_IS_PICKING;
}

but all approaches I had were more complicated, such that the additional
verbosity would not be enough of a trade off.

I think this function was only scary as I was not familiar with the rebase
code as that employs similar patterns already.

>> > +   if (is_fixup(command))
>> > +   return error(_("cannot fixup root 
>> > commit"));
>>
>> I would expect you also cannot squash into root commit?
>
> In sequencer.c, `is_fixup()` really means "is it a fixup or a squash?". So
> yes, you are correct that we also cannot squash into a root commit.
>
> However, a squash is the same as a fixup with the only difference being
> that the squash lets you edit the final commit message (and does not
> comment out the squash commit's message, in contrast to fixup).
>
> Is it really worth adding an ugly line break in the middle of the error()
> statement just to say "cannot fixup/squash into root commit"? I'd rather
> not.

Makes sense,

Thanks for your patience,
Stefan


Re: [PATCH 2/6] sequencer: learn about the special "fake root commit" handling

2018-04-29 Thread Johannes Schindelin
Hi Stefan,

On Sat, 28 Apr 2018, Stefan Beller wrote:

> On Fri, Apr 27, 2018 at 3:31 PM, Johannes Schindelin
>  wrote:
> > When an interactive rebase wants to recreate a root commit, it
> > - first creates a new, empty root commit,
> > - checks it out,
> > - converts the next `pick` command so that it amends the empty root
> >   commit
> >
> > Introduce support in the sequencer to handle such an empty root commit,
> > by looking for the file /rebase-merge/squash-onto; if it exists
> > and contains a commit name, the sequencer will compare the HEAD to said
> > root commit, and if identical, a new root commit will be created.
> >
> > While converting scripted code into proper, portable C, we also do away
> > with the old "amend with an empty commit message, then cherry-pick
> > without committing, then amend again" dance and replace it with code
> > that uses the internal API properly to do exactly what we want: create a
> > new root commit.
> >
> > To keep the implementation simple, we always spawn `git commit` to create
> > new root commits.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  sequencer.c | 104 ++--
> >  sequencer.h |   4 ++
> >  2 files changed, 105 insertions(+), 3 deletions(-)
> >
> > diff --git a/sequencer.c b/sequencer.c
> > index 90c8218aa9a..fc124596b53 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -125,6 +125,12 @@ static GIT_PATH_FUNC(rebase_path_rewritten_list, 
> > "rebase-merge/rewritten-list")
> >  static GIT_PATH_FUNC(rebase_path_rewritten_pending,
> > "rebase-merge/rewritten-pending")
> >
> > +/*
> > + * The path of the file containig the OID of the "squash onto" commit, i.e.
> > + * the dummy commit used for `reset [new root]`.
> > + */
> > +static GIT_PATH_FUNC(rebase_path_squash_onto, "rebase-merge/squash-onto")
> > +
> >  /*
> >   * The path of the file listing refs that need to be deleted after the 
> > rebase
> >   * finishes. This is used by the `label` command to record the need for 
> > cleanup.
> > @@ -470,7 +476,8 @@ static int fast_forward_to(const struct object_id *to, 
> > const struct object_id *f
> > transaction = ref_transaction_begin(&err);
> > if (!transaction ||
> > ref_transaction_update(transaction, "HEAD",
> > -  to, unborn ? &null_oid : from,
> > +  to, unborn && !is_rebase_i(opts) ?
> > +  &null_oid : from,
> >0, sb.buf, &err) ||
> > ref_transaction_commit(transaction, &err)) {
> > ref_transaction_free(transaction);
> > @@ -692,6 +699,42 @@ static char *get_author(const char *message)
> > return NULL;
> >  }
> >
> > +static const char *read_author_ident(struct strbuf *buf)
> 
> This seems to be the counter part of write_author_script(*msg),
> would it make sense to either rename this to read_author_script
> or rename the counter part to write_author_ident ?

It is not really *the* counterpart of write_author_script(). There are
many conceivable counterparts, one of them already exists and is called
read_env_script(). They serve different purposes, even if both read the
author-script file, and they parse things in a fundamentally different
way. I had already pointed out something along those lines in the review
of the patch introducing the read_env_script() and Junio did not believe
me. To make sure, it was my fault that I failed to make a compelling
enough argument. I am glad that I now have proof that I was right: just
because you read the same file, for a similar purpose, does not
necessarily imply that you can share code for those purposes. All you can
share is the name of the input file.

Having said that, *this* time round, what we need to do is actually very
similar to what builtin/am.c's read_author_script() does (even if we
cannot use it as-is: it populates part of a `struct am_state`). I'll have
to look into this more closely.

> > +{
> > +   char *p, *p2;
> > +
> > +   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> 
> The 256 is a hint for read_file how to size the buffer initially.
> If not given it defaults to 8k, which presumably is too much for
> an author identity.

That is a correct reading of the code's intent.

> > +   for (p = buf->buf; *p; p++)
> > +   if (skip_prefix(p, "'''", (const char **)&p2))
> > +   strbuf_splice(buf, p - buf->buf, p2 - p, "'", 1);
> > +   else if (*p == '\'')
> > +   strbuf_splice(buf, p-- - buf->buf, 1, "", 0);
> 
> This part could be prefixed with
> /* un-escape text: turn \\ into ' and remove single quotes. */

If could be prefixed that way, but it would be incorrect. We never turn \\
into '. What we do here (and I do not want to repeat in a comment what the
code does): we dequote what we previously quoted using single quotes. So
we use

Re: [PATCH 2/6] sequencer: learn about the special "fake root commit" handling

2018-04-28 Thread Stefan Beller
On Fri, Apr 27, 2018 at 3:31 PM, Johannes Schindelin
 wrote:
> When an interactive rebase wants to recreate a root commit, it
> - first creates a new, empty root commit,
> - checks it out,
> - converts the next `pick` command so that it amends the empty root
>   commit
>
> Introduce support in the sequencer to handle such an empty root commit,
> by looking for the file /rebase-merge/squash-onto; if it exists
> and contains a commit name, the sequencer will compare the HEAD to said
> root commit, and if identical, a new root commit will be created.
>
> While converting scripted code into proper, portable C, we also do away
> with the old "amend with an empty commit message, then cherry-pick
> without committing, then amend again" dance and replace it with code
> that uses the internal API properly to do exactly what we want: create a
> new root commit.
>
> To keep the implementation simple, we always spawn `git commit` to create
> new root commits.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 104 ++--
>  sequencer.h |   4 ++
>  2 files changed, 105 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 90c8218aa9a..fc124596b53 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -125,6 +125,12 @@ static GIT_PATH_FUNC(rebase_path_rewritten_list, 
> "rebase-merge/rewritten-list")
>  static GIT_PATH_FUNC(rebase_path_rewritten_pending,
> "rebase-merge/rewritten-pending")
>
> +/*
> + * The path of the file containig the OID of the "squash onto" commit, i.e.
> + * the dummy commit used for `reset [new root]`.
> + */
> +static GIT_PATH_FUNC(rebase_path_squash_onto, "rebase-merge/squash-onto")
> +
>  /*
>   * The path of the file listing refs that need to be deleted after the rebase
>   * finishes. This is used by the `label` command to record the need for 
> cleanup.
> @@ -470,7 +476,8 @@ static int fast_forward_to(const struct object_id *to, 
> const struct object_id *f
> transaction = ref_transaction_begin(&err);
> if (!transaction ||
> ref_transaction_update(transaction, "HEAD",
> -  to, unborn ? &null_oid : from,
> +  to, unborn && !is_rebase_i(opts) ?
> +  &null_oid : from,
>0, sb.buf, &err) ||
> ref_transaction_commit(transaction, &err)) {
> ref_transaction_free(transaction);
> @@ -692,6 +699,42 @@ static char *get_author(const char *message)
> return NULL;
>  }
>
> +static const char *read_author_ident(struct strbuf *buf)

This seems to be the counter part of write_author_script(*msg),
would it make sense to either rename this to read_author_script
or rename the counter part to write_author_ident ?

> +{
> +   char *p, *p2;
> +
> +   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)

The 256 is a hint for read_file how to size the buffer initially.
If not given it defaults to 8k, which presumably is too much for
an author identity.



> +   for (p = buf->buf; *p; p++)
> +   if (skip_prefix(p, "'''", (const char **)&p2))
> +   strbuf_splice(buf, p - buf->buf, p2 - p, "'", 1);
> +   else if (*p == '\'')
> +   strbuf_splice(buf, p-- - buf->buf, 1, "", 0);

This part could be prefixed with
/* un-escape text: turn \\ into ' and remove single quotes. */

> +   if (skip_prefix(buf->buf, "GIT_AUTHOR_NAME=", (const char **)&p)) {
> +   strbuf_splice(buf, 0, p - buf->buf, "", 0);
> +   p = strchr(buf->buf, '\n');
> +   if (skip_prefix(p, "\nGIT_AUTHOR_EMAIL=", (const char 
> **)&p2)) {
> +   strbuf_splice(buf, p - buf->buf, p2 - p, " <", 2);
> +   p = strchr(p, '\n');
> +   if (skip_prefix(p, "\nGIT_AUTHOR_DATE=@",
> +   (const char **)&p2)) {
> +   strbuf_splice(buf, p - buf->buf, p2 - p,
> + "> ", 2);
> +   p = strchr(p, '\n');
> +   if (p) {
> +   strbuf_setlen(buf, p - buf->buf);
> +   return buf->buf;

So here we have read GIT_AUTHOR_NAME, _EMAIL
and _DATE in that order and converted it to its form
"name  date" in a single line.

It would be better to invert the conditions and keep
the indentation level lower by:

if (!skip_prefix(...))
goto warning_and_return;
strbuf_splice(...);
...

I wondered if we want to factor out the conversion of
"author string in commit form" to "author information
in script form" into their own functions, and keep the reading
writing out of them. But then again we only need them in
these use cases for now, and such a refactoring can happen
later if

[PATCH 2/6] sequencer: learn about the special "fake root commit" handling

2018-04-27 Thread Johannes Schindelin
When an interactive rebase wants to recreate a root commit, it
- first creates a new, empty root commit,
- checks it out,
- converts the next `pick` command so that it amends the empty root
  commit

Introduce support in the sequencer to handle such an empty root commit,
by looking for the file /rebase-merge/squash-onto; if it exists
and contains a commit name, the sequencer will compare the HEAD to said
root commit, and if identical, a new root commit will be created.

While converting scripted code into proper, portable C, we also do away
with the old "amend with an empty commit message, then cherry-pick
without committing, then amend again" dance and replace it with code
that uses the internal API properly to do exactly what we want: create a
new root commit.

To keep the implementation simple, we always spawn `git commit` to create
new root commits.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 104 ++--
 sequencer.h |   4 ++
 2 files changed, 105 insertions(+), 3 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 90c8218aa9a..fc124596b53 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -125,6 +125,12 @@ static GIT_PATH_FUNC(rebase_path_rewritten_list, 
"rebase-merge/rewritten-list")
 static GIT_PATH_FUNC(rebase_path_rewritten_pending,
"rebase-merge/rewritten-pending")
 
+/*
+ * The path of the file containig the OID of the "squash onto" commit, i.e.
+ * the dummy commit used for `reset [new root]`.
+ */
+static GIT_PATH_FUNC(rebase_path_squash_onto, "rebase-merge/squash-onto")
+
 /*
  * The path of the file listing refs that need to be deleted after the rebase
  * finishes. This is used by the `label` command to record the need for 
cleanup.
@@ -470,7 +476,8 @@ static int fast_forward_to(const struct object_id *to, 
const struct object_id *f
transaction = ref_transaction_begin(&err);
if (!transaction ||
ref_transaction_update(transaction, "HEAD",
-  to, unborn ? &null_oid : from,
+  to, unborn && !is_rebase_i(opts) ?
+  &null_oid : from,
   0, sb.buf, &err) ||
ref_transaction_commit(transaction, &err)) {
ref_transaction_free(transaction);
@@ -692,6 +699,42 @@ static char *get_author(const char *message)
return NULL;
 }
 
+static const char *read_author_ident(struct strbuf *buf)
+{
+   char *p, *p2;
+
+   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+   return NULL;
+
+   for (p = buf->buf; *p; p++)
+   if (skip_prefix(p, "'''", (const char **)&p2))
+   strbuf_splice(buf, p - buf->buf, p2 - p, "'", 1);
+   else if (*p == '\'')
+   strbuf_splice(buf, p-- - buf->buf, 1, "", 0);
+
+   if (skip_prefix(buf->buf, "GIT_AUTHOR_NAME=", (const char **)&p)) {
+   strbuf_splice(buf, 0, p - buf->buf, "", 0);
+   p = strchr(buf->buf, '\n');
+   if (skip_prefix(p, "\nGIT_AUTHOR_EMAIL=", (const char **)&p2)) {
+   strbuf_splice(buf, p - buf->buf, p2 - p, " <", 2);
+   p = strchr(p, '\n');
+   if (skip_prefix(p, "\nGIT_AUTHOR_DATE=@",
+   (const char **)&p2)) {
+   strbuf_splice(buf, p - buf->buf, p2 - p,
+ "> ", 2);
+   p = strchr(p, '\n');
+   if (p) {
+   strbuf_setlen(buf, p - buf->buf);
+   return buf->buf;
+   }
+   }
+   }
+   }
+
+   warning(_("could not parse '%s'"), rebase_path_author_script());
+   return NULL;
+}
+
 static const char staged_changes_advice[] =
 N_("you have staged changes in your working tree\n"
 "If these changes are meant to be squashed into the previous commit, run:\n"
@@ -711,6 +754,7 @@ N_("you have staged changes in your working tree\n"
 #define AMEND_MSG   (1<<2)
 #define CLEANUP_MSG (1<<3)
 #define VERIFY_MSG  (1<<4)
+#define CREATE_ROOT_COMMIT (1<<5)
 
 /*
  * If we are cherry-pick, and if the merge did not result in
@@ -730,6 +774,40 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
struct child_process cmd = CHILD_PROCESS_INIT;
const char *value;
 
+   if (flags & CREATE_ROOT_COMMIT) {
+   struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
+   const char *author = is_rebase_i(opts) ?
+   read_author_ident(&script) : NULL;
+   struct object_id root_commit, *cache_tree_oid;
+   int res = 0;
+
+   if (!defmsg)
+   BUG("root commit without message");
+
+   i