Re: [RFC PATCH] sequencer: fix quoting in write_author_script
On 27/07/18 13:37, Johannes Schindelin wrote: > Hi Phillip, Junio and Akinori, > > I just noticed that t3404 is broken without my patches (but with Junio's > fixup), on Windows, macOS and Linux. (See log at the end.) > > On Fri, 27 Jul 2018, Phillip Wood wrote: > >> On 26/07/18 13:33, Johannes Schindelin wrote: >>> >>> On Wed, 18 Jul 2018, Phillip Wood wrote: >>> Single quotes should be escaped as \' not \\'. Note that this only affects authors that contain a single quote and then only external scripts that read the author script and users whose git is upgraded from the shell version of rebase -i while rebase was stopped. This is because the parsing in read_env_script() expected the broken version and for some reason sq_dequote() called by read_author_ident() seems to handle the broken quoting correctly. Ideally write_author_script() would be rewritten to use split_ident_line() and sq_quote_buf() but this commit just fixes the immediate bug. >>> This is untested, unfortuantely I don't have really have time to write a test or follow this up at the moment, if someone else want to run with it then please do. >>> >>> I modified the test that was added by Akinori. As it was added very early, >>> and as there is still a test case *after* Akinori's that compares a >>> hard-coded SHA-1, I refrained from using `test_commit` (which would change >>> that SHA-1). See below. >> >> Thanks for adding a test, that sounds like sensible approach, however >> having thought about it I wonder if we should just be writing a plain >> text file (e.g rebase-merge/author-data) and fixing the reader to read >> that if it exists and only then fall back to reading the legacy >> rebase-merge/author-script with a fix to correctly handle the script >> written by the shell version - what do you think? The author-script >> really should be just an implementation detail. If anyone really wants >> to read it they can still do 'read -r l' and split the lines with >> ${l%%=*} and ${l#*=} > > In contrast to `git am`, there *is* a use case where power users might > have come to rely on the presence of the .git/rebase-merge/author-script > file *and* its nature as a shell script snippet: we purposefully allow > scripting `rebase -i`. > > So I don't think that we can declare the file and its format as > implementation detail, even if the idea is very, very tempting. > diff --git a/sequencer.c b/sequencer.c index 5354d4d51e..0b78d1f100 100644 --- a/sequencer.c +++ b/sequencer.c @@ -638,21 +638,21 @@ static int write_author_script(const char *message) else if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); while (*message && *message != '\n' && *message != '\r') if (skip_prefix(message, "> ", )) break; else if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); while (*message && *message != '\n' && *message != '\r') if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); >>> >>> I resolved the merge conflict with Akinori's patch. FWIW I pushed all of >>> this, including the fixup to Junio's fixup to the >>> `fix-t3404-author-script-test` branch at https://github.com/dscho/git. >>> strbuf_release(); return res; @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) { struct strbuf script = STRBUF_INIT; int i, count = 0; - char *p, *p2; + const char *p2; + char *p; if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) return -1; for (p = script.buf; *p; p++) - if (skip_prefix(p, "'''", (const char **))) + /* + * write_author_script() used to escape "'" incorrectly as + * "'''" rather than "'\\''" so we check for the correct + * version the incorrect version in case git was upgraded while + * rebase was stopped. + */ + if (skip_prefix(p, "'\\''", ) || + skip_prefix(p, "'''", )) >>> >>> I think in this form, it is possibly unsafe because it assumes that the >>> new code cannot
Re: [RFC PATCH] sequencer: fix quoting in write_author_script
Hi Phillip, Junio and Akinori, I just noticed that t3404 is broken without my patches (but with Junio's fixup), on Windows, macOS and Linux. (See log at the end.) On Fri, 27 Jul 2018, Phillip Wood wrote: > On 26/07/18 13:33, Johannes Schindelin wrote: > > > > On Wed, 18 Jul 2018, Phillip Wood wrote: > > > >> Single quotes should be escaped as \' not \\'. Note that this only > >> affects authors that contain a single quote and then only external > >> scripts that read the author script and users whose git is upgraded from > >> the shell version of rebase -i while rebase was stopped. This is because > >> the parsing in read_env_script() expected the broken version and for > >> some reason sq_dequote() called by read_author_ident() seems to handle > >> the broken quoting correctly. > >> > >> Ideally write_author_script() would be rewritten to use > >> split_ident_line() and sq_quote_buf() but this commit just fixes the > >> immediate bug. > > > >> This is untested, unfortuantely I don't have really have time to write a > >> test or > >> follow this up at the moment, if someone else want to run with it then > >> please > >> do. > > > > I modified the test that was added by Akinori. As it was added very early, > > and as there is still a test case *after* Akinori's that compares a > > hard-coded SHA-1, I refrained from using `test_commit` (which would change > > that SHA-1). See below. > > Thanks for adding a test, that sounds like sensible approach, however > having thought about it I wonder if we should just be writing a plain > text file (e.g rebase-merge/author-data) and fixing the reader to read > that if it exists and only then fall back to reading the legacy > rebase-merge/author-script with a fix to correctly handle the script > written by the shell version - what do you think? The author-script > really should be just an implementation detail. If anyone really wants > to read it they can still do 'read -r l' and split the lines with > ${l%%=*} and ${l#*=} In contrast to `git am`, there *is* a use case where power users might have come to rely on the presence of the .git/rebase-merge/author-script file *and* its nature as a shell script snippet: we purposefully allow scripting `rebase -i`. So I don't think that we can declare the file and its format as implementation detail, even if the idea is very, very tempting. > >> diff --git a/sequencer.c b/sequencer.c > >> index 5354d4d51e..0b78d1f100 100644 > >> --- a/sequencer.c > >> +++ b/sequencer.c > >> @@ -638,21 +638,21 @@ static int write_author_script(const char *message) > >>else if (*message != '\'') > >>strbuf_addch(, *(message++)); > >>else > >> - strbuf_addf(, "'%c'", *(message++)); > >> + strbuf_addf(, "'\\%c'", *(message++)); > >>strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); > >>while (*message && *message != '\n' && *message != '\r') > >>if (skip_prefix(message, "> ", )) > >>break; > >>else if (*message != '\'') > >>strbuf_addch(, *(message++)); > >>else > >> - strbuf_addf(, "'%c'", *(message++)); > >> + strbuf_addf(, "'\\%c'", *(message++)); > >>strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); > >>while (*message && *message != '\n' && *message != '\r') > >>if (*message != '\'') > >>strbuf_addch(, *(message++)); > >>else > >> - strbuf_addf(, "'%c'", *(message++)); > >> + strbuf_addf(, "'\\%c'", *(message++)); > >>res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); > > > > I resolved the merge conflict with Akinori's patch. FWIW I pushed all of > > this, including the fixup to Junio's fixup to the > > `fix-t3404-author-script-test` branch at https://github.com/dscho/git. > > > >>strbuf_release(); > >>return res; > >> @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) > >> { > >>struct strbuf script = STRBUF_INIT; > >>int i, count = 0; > >> - char *p, *p2; > >> + const char *p2; > >> + char *p; > >> > >>if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) > >>return -1; > >> > >>for (p = script.buf; *p; p++) > >> - if (skip_prefix(p, "'''", (const char **))) > >> + /* > >> + * write_author_script() used to escape "'" incorrectly as > >> + * "'''" rather than "'\\''" so we check for the correct > >> + * version the incorrect version in case git was upgraded while > >> + * rebase was stopped. > >> + */ > >> + if (skip_prefix(p, "'\\''", ) || > >> + skip_prefix(p, "'''", )) > > > > I think in this form, it is possibly unsafe because it assumes that the > > new code cannot generate output that would trigger that same code path. > > Although I have to admit
Re: [RFC PATCH] sequencer: fix quoting in write_author_script
Hi Johannes On 26/07/18 13:33, Johannes Schindelin wrote: > Hi Phillip, > > On Wed, 18 Jul 2018, Phillip Wood wrote: > >> From: Phillip Wood >> >> Single quotes should be escaped as \' not \\'. Note that this only >> affects authors that contain a single quote and then only external >> scripts that read the author script and users whose git is upgraded from >> the shell version of rebase -i while rebase was stopped. This is because >> the parsing in read_env_script() expected the broken version and for >> some reason sq_dequote() called by read_author_ident() seems to handle >> the broken quoting correctly. >> >> Ideally write_author_script() would be rewritten to use >> split_ident_line() and sq_quote_buf() but this commit just fixes the >> immediate bug. >> >> Signed-off-by: Phillip Wood >> --- > > Good catch. > >> This is untested, unfortuantely I don't have really have time to write a >> test or >> follow this up at the moment, if someone else want to run with it then please >> do. > > I modified the test that was added by Akinori. As it was added very early, > and as there is still a test case *after* Akinori's that compares a > hard-coded SHA-1, I refrained from using `test_commit` (which would change > that SHA-1). See below. Thanks for adding a test, that sounds like sensible approach, however having thought about it I wonder if we should just be writing a plain text file (e.g rebase-merge/author-data) and fixing the reader to read that if it exists and only then fall back to reading the legacy rebase-merge/author-script with a fix to correctly handle the script written by the shell version - what do you think? The author-script really should be just an implementation detail. If anyone really wants to read it they can still do 'read -r l' and split the lines with ${l%%=*} and ${l#*=} >> diff --git a/sequencer.c b/sequencer.c >> index 5354d4d51e..0b78d1f100 100644 >> --- a/sequencer.c >> +++ b/sequencer.c >> @@ -638,21 +638,21 @@ static int write_author_script(const char *message) >> else if (*message != '\'') >> strbuf_addch(, *(message++)); >> else >> -strbuf_addf(, "'%c'", *(message++)); >> +strbuf_addf(, "'\\%c'", *(message++)); >> strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); >> while (*message && *message != '\n' && *message != '\r') >> if (skip_prefix(message, "> ", )) >> break; >> else if (*message != '\'') >> strbuf_addch(, *(message++)); >> else >> -strbuf_addf(, "'%c'", *(message++)); >> +strbuf_addf(, "'\\%c'", *(message++)); >> strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); >> while (*message && *message != '\n' && *message != '\r') >> if (*message != '\'') >> strbuf_addch(, *(message++)); >> else >> -strbuf_addf(, "'%c'", *(message++)); >> +strbuf_addf(, "'\\%c'", *(message++)); >> res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); > > I resolved the merge conflict with Akinori's patch. FWIW I pushed all of > this, including the fixup to Junio's fixup to the > `fix-t3404-author-script-test` branch at https://github.com/dscho/git. > >> strbuf_release(); >> return res; >> @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) >> { >> struct strbuf script = STRBUF_INIT; >> int i, count = 0; >> -char *p, *p2; >> +const char *p2; >> +char *p; >> >> if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) >> return -1; >> >> for (p = script.buf; *p; p++) >> -if (skip_prefix(p, "'''", (const char **))) >> +/* >> + * write_author_script() used to escape "'" incorrectly as >> + * "'''" rather than "'\\''" so we check for the correct >> + * version the incorrect version in case git was upgraded while >> + * rebase was stopped. >> + */ >> +if (skip_prefix(p, "'\\''", ) || >> +skip_prefix(p, "'''", )) > > I think in this form, it is possibly unsafe because it assumes that the > new code cannot generate output that would trigger that same code path. > Although I have to admit that I did not give this a great deal of thought. Hm, I not sure that it can. If the Author begins \\' then this will be written as the C string "''\\''...". If \\' comes at the end then I think this will be written as "'\\'''", in the middle of the name it will be "'\\''..." > In any case, if you have to think long and hard about some fix, it might > be better to go with something that is easier to reason about. So how > about this: we already know that the code is buggy, Akinori fixed the bug, > where the author-script missed its trailing single-quote. We
Re: [RFC PATCH] sequencer: fix quoting in write_author_script
Hi Phillip, On Wed, 18 Jul 2018, Phillip Wood wrote: > From: Phillip Wood > > Single quotes should be escaped as \' not \\'. Note that this only > affects authors that contain a single quote and then only external > scripts that read the author script and users whose git is upgraded from > the shell version of rebase -i while rebase was stopped. This is because > the parsing in read_env_script() expected the broken version and for > some reason sq_dequote() called by read_author_ident() seems to handle > the broken quoting correctly. > > Ideally write_author_script() would be rewritten to use > split_ident_line() and sq_quote_buf() but this commit just fixes the > immediate bug. > > Signed-off-by: Phillip Wood > --- Good catch. > This is untested, unfortuantely I don't have really have time to write a test > or > follow this up at the moment, if someone else want to run with it then please > do. I modified the test that was added by Akinori. As it was added very early, and as there is still a test case *after* Akinori's that compares a hard-coded SHA-1, I refrained from using `test_commit` (which would change that SHA-1). See below. > diff --git a/sequencer.c b/sequencer.c > index 5354d4d51e..0b78d1f100 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -638,21 +638,21 @@ static int write_author_script(const char *message) > else if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); > while (*message && *message != '\n' && *message != '\r') > if (skip_prefix(message, "> ", )) > break; > else if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); > while (*message && *message != '\n' && *message != '\r') > if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); I resolved the merge conflict with Akinori's patch. FWIW I pushed all of this, including the fixup to Junio's fixup to the `fix-t3404-author-script-test` branch at https://github.com/dscho/git. > strbuf_release(); > return res; > @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) > { > struct strbuf script = STRBUF_INIT; > int i, count = 0; > - char *p, *p2; > + const char *p2; > + char *p; > > if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) > return -1; > > for (p = script.buf; *p; p++) > - if (skip_prefix(p, "'''", (const char **))) > + /* > + * write_author_script() used to escape "'" incorrectly as > + * "'''" rather than "'\\''" so we check for the correct > + * version the incorrect version in case git was upgraded while > + * rebase was stopped. > + */ > + if (skip_prefix(p, "'\\''", ) || > + skip_prefix(p, "'''", )) I think in this form, it is possibly unsafe because it assumes that the new code cannot generate output that would trigger that same code path. Although I have to admit that I did not give this a great deal of thought. In any case, if you have to think long and hard about some fix, it might be better to go with something that is easier to reason about. So how about this: we already know that the code is buggy, Akinori fixed the bug, where the author-script missed its trailing single-quote. We can use this as a tell-tale for *this* bug. Assuming that Junio will advance both your and Akinori's fix in close proximity. Again, this is pushed to the `fix-t3404-author-script-test` branch at https://github.com/dscho/git; My fixup on top of your patch looks like this (feel free to drop the sq_bug part and only keep the test part): -- snipsnap -- diff --git a/sequencer.c b/sequencer.c index 46c0b3e720f..7abe78dc78e 100644 --- a/sequencer.c +++ b/sequencer.c @@ -573,13 +573,14 @@ static int write_author_script(const char *message) static int read_env_script(struct argv_array *env) { struct strbuf script = STRBUF_INIT; - int i, count = 0; + int i, count = 0, sq_bug; const char *p2; char *p; if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) return -1; + sq_bug = script.len && script.buf[script.len - 1] != '\''; for (p = script.buf; *p; p++)
Re: [RFC PATCH] sequencer: fix quoting in write_author_script
Phillip Wood writes: > From: Phillip Wood > > Single quotes should be escaped as \' not \\'. Note that this only > affects authors that contain a single quote and then only external > scripts that read the author script and users whose git is upgraded from > the shell version of rebase -i while rebase was stopped. This is because > the parsing in read_env_script() expected the broken version and for > some reason sq_dequote() called by read_author_ident() seems to handle > the broken quoting correctly. > > Ideally write_author_script() would be rewritten to use > split_ident_line() and sq_quote_buf() but this commit just fixes the > immediate bug. > > Signed-off-by: Phillip Wood > --- > > This is untested, unfortuantely I don't have really have time to write a test > or > follow this up at the moment, if someone else want to run with it then please > do. Any follow-up on this? Dscho, I do agree with Phillip that the author-script this code produces does not quote single quotes correctly (if we consider that the author-script is meant to be compatible with shell script version and what "git am" uses, that is), and Phillip's fix looks like a reasonable first step going forward (the next step would remove the transitional band-aid from the reading code, also added by this patch). Thanks. > > sequencer.c | 18 +- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/sequencer.c b/sequencer.c > index 5354d4d51e..0b78d1f100 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -638,21 +638,21 @@ static int write_author_script(const char *message) > else if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); > while (*message && *message != '\n' && *message != '\r') > if (skip_prefix(message, "> ", )) > break; > else if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); > while (*message && *message != '\n' && *message != '\r') > if (*message != '\'') > strbuf_addch(, *(message++)); > else > - strbuf_addf(, "'%c'", *(message++)); > + strbuf_addf(, "'\\%c'", *(message++)); > res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); > strbuf_release(); > return res; > @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) > { > struct strbuf script = STRBUF_INIT; > int i, count = 0; > - char *p, *p2; > + const char *p2; > + char *p; > > if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) > return -1; > > for (p = script.buf; *p; p++) > - if (skip_prefix(p, "'''", (const char **))) > + /* > + * write_author_script() used to escape "'" incorrectly as > + * "'''" rather than "'\\''" so we check for the correct > + * version the incorrect version in case git was upgraded while > + * rebase was stopped. > + */ > + if (skip_prefix(p, "'\\''", ) || > + skip_prefix(p, "'''", )) > strbuf_splice(, p - script.buf, p2 - p, "'", 1); > else if (*p == '\'') > strbuf_splice(, p-- - script.buf, 1, "", 0);
[RFC PATCH] sequencer: fix quoting in write_author_script
From: Phillip Wood Single quotes should be escaped as \' not \\'. Note that this only affects authors that contain a single quote and then only external scripts that read the author script and users whose git is upgraded from the shell version of rebase -i while rebase was stopped. This is because the parsing in read_env_script() expected the broken version and for some reason sq_dequote() called by read_author_ident() seems to handle the broken quoting correctly. Ideally write_author_script() would be rewritten to use split_ident_line() and sq_quote_buf() but this commit just fixes the immediate bug. Signed-off-by: Phillip Wood --- This is untested, unfortuantely I don't have really have time to write a test or follow this up at the moment, if someone else want to run with it then please do. sequencer.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/sequencer.c b/sequencer.c index 5354d4d51e..0b78d1f100 100644 --- a/sequencer.c +++ b/sequencer.c @@ -638,21 +638,21 @@ static int write_author_script(const char *message) else if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); strbuf_addstr(, "'\nGIT_AUTHOR_EMAIL='"); while (*message && *message != '\n' && *message != '\r') if (skip_prefix(message, "> ", )) break; else if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); strbuf_addstr(, "'\nGIT_AUTHOR_DATE='@"); while (*message && *message != '\n' && *message != '\r') if (*message != '\'') strbuf_addch(, *(message++)); else - strbuf_addf(, "'%c'", *(message++)); + strbuf_addf(, "'\\%c'", *(message++)); res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); strbuf_release(); return res; @@ -666,13 +666,21 @@ static int read_env_script(struct argv_array *env) { struct strbuf script = STRBUF_INIT; int i, count = 0; - char *p, *p2; + const char *p2; + char *p; if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0) return -1; for (p = script.buf; *p; p++) - if (skip_prefix(p, "'''", (const char **))) + /* +* write_author_script() used to escape "'" incorrectly as +* "'''" rather than "'\\''" so we check for the correct +* version the incorrect version in case git was upgraded while +* rebase was stopped. +*/ + if (skip_prefix(p, "'\\''", ) || + skip_prefix(p, "'''", )) strbuf_splice(, p - script.buf, p2 - p, "'", 1); else if (*p == '\'') strbuf_splice(, p-- - script.buf, 1, "", 0); -- 2.18.0