Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script

2018-08-01 Thread Eric Sunshine
On Wed, Aug 1, 2018 at 11:50 AM Phillip Wood  wrote:
> On 31/07/18 22:39, Eric Sunshine wrote:
> > On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood  wrote:
> >> +   /*
> >> +    * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
> >> +    * line with a "'" and also escaped "'" incorrectly as "'''" rather
> >> +    * than "'\\''". We check for the terminating "'" on the last line to
> >> +    * see how "'" has been escaped in case git was upgraded while rebase
> >> +    * was stopped.
> >> +    */
> >> +   sq_bug = script.len && script.buf[script.len - 2] != '\'';
> >
> > This is a very "delicate" check, assuming that a hand-edited file
> > won't end with, say, an extra newline. I wonder if this level of
> > backward-compatibility is overkill for such an unlikely case.
>
> I think I'll get rid of the check and instead use a version number
> written to .git/rebase-merge/interactive to indicate if we need to fix
> the quoting (if there's no number then it needs fixing). We can
> increment the version number in the future if we ever need to implement
> other fallbacks to handle the case where git got upgraded while rebase
> was stopped. I'll send a patch tomorrow.

Hmm, that approach is pretty heavyweight and would add a fair bit of
new code and complexity which itself could harbor bugs. When I
commented that the check was "delicate", I was (especially) referring
to the rigid "script[len-2]", not necessarily to the basic idea of the
check. So, you could keep the check (in spirit) but make it more
robust. Like this, for instance:

/* big comment explaining old buggy stuff */
static int broken_quoting(const char *s, size_t n)
{
	const char *t = s + n;
	while (t > s && *--t == '\n')
		/* empty */;
	/* t now points at the last non-newline character */
	if (t > s && *t != '\'')
		return 1;
	return 0;
}

static int read_env_script(...)
{
	...
	sq_bug = broken_quoting(script.buf, script.len);
	...
}

I would feel much more comfortable with a simple solution like this
than with one introducing new complexity associated with adding a
version number.
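Here is a self-contained variant of the same idea. It uses an index instead of pointer arithmetic, which makes the edge cases easier to see. The name `script_has_old_quoting` and the sample script contents are illustrative assumptions and do not come from the patch series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: decide whether an author-script was written by the old,
 * buggy writer. Skip any trailing newlines, then check whether the
 * last remaining character is the closing "'" that the fixed writer
 * always emits. Hypothetical name; not git's code.
 */
static int script_has_old_quoting(const char *s, size_t n)
{
	assert(s != NULL || n == 0);
	while (n > 0 && s[n - 1] == '\n')
		n--;                          /* tolerate extra blank lines */
	return n > 0 && s[n - 1] != '\'';
}
```

The check only looks at the last non-newline character. A file that someone has edited by hand and padded with extra blank lines is therefore still classified correctly.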


Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script

2018-08-01 Thread Phillip Wood

On 31/07/18 22:39, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood  wrote:
>> Single quotes should be escaped as \' not \\'. Note that this only
>> affects authors that contain a single quote and then only external
>> scripts that read the author script and users whose git is upgraded from
>> the shell version of rebase -i while rebase was stopped. This is because
>> the parsing in read_env_script() expected the broken version and for
>> some reason sq_dequote() called by read_author_ident() seems to handle
>> the broken quoting correctly.
>
> Is the:
>
>     ...for some reason sq_dequote() called by read_author_ident()
>     seems to handle the broken quoting correctly.
>
> bit outdated? We know now from patch 2/4 of my series[1] that
> read_author_ident() wasn't handling it correctly at all. It was merely
> ignoring the return value from sq_dequote() and using whatever broken
> value came back from it.
>
> [1]: https://public-inbox.org/git/20180731073331.40007-3-sunsh...@sunshineco.com/


>> Helped-by: Johannes Schindelin
>> Signed-off-by: Phillip Wood
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>>  static int read_env_script(struct argv_array *env)
>>  {
>>  if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
>>  return -1;
>
> This is not a problem introduced by this patch, but since
> strbuf_read_file() doesn't guarantee that memory hasn't been allocated
> when it returns an error, this is leaking.
>
>> +   /*
>> +    * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
>> +    * line with a "'" and also escaped "'" incorrectly as "'''" rather
>> +    * than "'\\''". We check for the terminating "'" on the last line to
>> +    * see how "'" has been escaped in case git was upgraded while rebase
>> +    * was stopped.
>> +    */
>> +   sq_bug = script.len && script.buf[script.len - 2] != '\'';
>
> I think you need to be checking 'script.len > 1', not just
> 'script.len', otherwise you might access memory outside the allocated
> buffer.
>
> This is a very "delicate" check, assuming that a hand-edited file
> won't end with, say, an extra newline. I wonder if this level of
> backward-compatibility is overkill for such an unlikely case.


I think I'll get rid of the check and instead use a version number
written to .git/rebase-merge/interactive to indicate if we need to fix
the quoting (if there's no number then it needs fixing). We can
increment the version number in the future if we ever need to implement
other fallbacks to handle the case where git got upgraded while rebase
was stopped. I'll send a patch tomorrow.


Best Wishes

Phillip




>>  for (p = script.buf; *p; p++)
>> -   if (skip_prefix(p, "'''", (const char **)))
>> +   if (sq_bug && skip_prefix(p, "'''", ))
>> +   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
>> +   else if (skip_prefix(p, "'\\''", ))
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
>> +test_expect_success 'rebase -i writes correct author-script' '
>> +   test_when_finished "test_might_fail git rebase --abort" &&
>> +   git checkout -b author-with-sq master &&
>> +   GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
>> +   set_fake_editor &&
>> +   FAKE_LINES="edit 1" git rebase -ki HEAD^ &&
>
> Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?





Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script

2018-08-01 Thread Junio C Hamano
Phillip Wood  writes:

>> Is the:
>>
>>  ...for some reason sq_dequote() called by read_author_ident()
>>  seems to handle the broken quoting correctly.
>>
>> bit outdated? We know now from patch 2/4 of my series[1] that
>> read_author_ident() wasn't handling it correctly at all. It was merely
>> ignoring the return value from sq_dequote() and using whatever broken
>> value came back from it.
>
> Yes you're right, when I tested it...
>
> Thanks for your comments, I'll do a reroll

Thanks, both.  Sounds like we are quickly converging to the
resolution ;-)


Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script

2018-08-01 Thread Phillip Wood

Hi Eric

On 31/07/18 22:39, Eric Sunshine wrote:
> On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood  wrote:
>> Single quotes should be escaped as \' not \\'. Note that this only
>> affects authors that contain a single quote and then only external
>> scripts that read the author script and users whose git is upgraded from
>> the shell version of rebase -i while rebase was stopped. This is because
>> the parsing in read_env_script() expected the broken version and for
>> some reason sq_dequote() called by read_author_ident() seems to handle
>> the broken quoting correctly.
>
> Is the:
>
>     ...for some reason sq_dequote() called by read_author_ident()
>     seems to handle the broken quoting correctly.
>
> bit outdated? We know now from patch 2/4 of my series[1] that
> read_author_ident() wasn't handling it correctly at all. It was merely
> ignoring the return value from sq_dequote() and using whatever broken
> value came back from it.


Yes you're right, when I tested it before I must have had GIT_AUTHOR_NAME
set to the name with the "'" in it when I ran the rebase, because it
appeared to work, but actually sq_dequote() was returning NULL and so
commit_tree() just picked up the default author. I've just changed the
test you added to


test_expect_success 'valid author header after --root swap' '
	rebase_setup_and_clean author-header no-conflict-branch &&
	set_fake_editor &&
	git commit --amend --author="Au ${SQ}thor " --no-edit &&
	git cat-file commit HEAD | grep ^author >expected &&
	FAKE_LINES="5 1" git rebase -i --root &&
	git cat-file commit HEAD^ | grep ^author >actual &&
	test_cmp expected actual
'

and it fails without the fixes to write_author_script().
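The NULL return can be illustrated with a minimal strict dequoter written in the spirit of `sq_dequote()`. It accepts `'\''` between quoted runs and returns NULL for anything else, including the old `'\\''` form. `sq_dequote_sketch` is hypothetical. Git's real `sq_dequote()` also accepts `'\!'` and differs in other details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch: dequote a single shell word such as 'abc' or 'a'\''b' in
 * place. Returns arg on success, or NULL if the input is not quoted
 * exactly that way. Hypothetical helper, not git's sq_dequote().
 */
static char *sq_dequote_sketch(char *arg)
{
	char *src = arg, *dst = arg;

	if (*src++ != '\'')
		return NULL;                  /* must start with a quote */
	for (;;) {
		char c = *src++;
		if (!c)
			return NULL;          /* unterminated quote */
		if (c != '\'') {
			*dst++ = c;           /* ordinary quoted character */
			continue;
		}
		/* a closing quote: either end of word or the '\'' escape */
		if (!*src) {
			*dst = '\0';
			return arg;
		}
		if (src[0] == '\\' && src[1] == '\'' && src[2] == '\'') {
			*dst++ = '\'';
			src += 3;             /* skip \'' and stay quoted */
			continue;
		}
		return NULL;                  /* e.g. the old '\\'' form */
	}
}
```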



> [1]: https://public-inbox.org/git/20180731073331.40007-3-sunsh...@sunshineco.com/
>
>> Helped-by: Johannes Schindelin
>> Signed-off-by: Phillip Wood
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>>  static int read_env_script(struct argv_array *env)
>>  {
>>  if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
>>  return -1;
>
> This is not a problem introduced by this patch, but since
> strbuf_read_file() doesn't guarantee that memory hasn't been allocated
> when it returns an error, this is leaking.


I can fix that
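In git itself, the natural fix for this leak would be to call `strbuf_release(&script)` before `return -1`. A standalone example cannot use `strbuf`, so the sketch below substitutes a hypothetical `struct buf` and a hypothetical `read_file()`. Like `strbuf_read_file()` as described in the review, `read_file()` may allocate memory before it fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's struct strbuf. */
struct buf {
	char *data;
	size_t len;
};

/*
 * Hypothetical reader: may leave memory allocated even when it reports
 * failure. Reads at most 255 bytes, which is enough for a sketch.
 */
static long read_file(struct buf *b, const char *path)
{
	FILE *f;
	size_t n;

	b->len = 0;
	b->data = malloc(256);          /* allocated before the open */
	if (!b->data)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;              /* failure, yet b->data is live */
	n = fread(b->data, 1, 255, f);
	fclose(f);
	b->data[n] = '\0';
	b->len = n;
	return (long)n;
}

static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
}

/* Hypothetical caller: release the buffer on the error path too. */
static int read_script(const char *path)
{
	struct buf script;

	if (read_file(&script, path) <= 0) {
		buf_release(&script);   /* without this, the error path leaks */
		return -1;
	}
	/* ... parse script.data here ... */
	buf_release(&script);
	return 0;
}
```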


>> +   /*
>> +    * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
>> +    * line with a "'" and also escaped "'" incorrectly as "'''" rather
>> +    * than "'\\''". We check for the terminating "'" on the last line to
>> +    * see how "'" has been escaped in case git was upgraded while rebase
>> +    * was stopped.
>> +    */
>> +   sq_bug = script.len && script.buf[script.len - 2] != '\'';
>
> I think you need to be checking 'script.len > 1', not just
> 'script.len', otherwise you might access memory outside the allocated
> buffer.


Good catch, Johannes's original was checking script.buf[script.len - 1]
which I corrected but forgot to adjust the previous check.



> This is a very "delicate" check, assuming that a hand-edited file
> won't end with, say, an extra newline. I wonder if this level of
> backward-compatibility is overkill for such an unlikely case.


Yes, it is a bit fragile. Originally the patch just unquoted the correct 
and incorrect quoting but Johannes was worried that might lead to errors 
and suggested this check. The check is aimed at people whose git gets 
upgraded while rebase is stopped for a conflict resolution or edit and 
so have the bad quoting in the author-script from the old version of git 
which started the rebase. Authors with "'" in the name are uncommon but 
not unheard of, I think when I checked there were about half a dozen in 
git's history. I'm not sure what to do for the best.
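The compatibility idea can be sketched as a dequoting pass. By default it accepts the new `'\''` escape. It accepts the old `'\\''` escape only when the caller has already decided that the script came from an older git. This is a standalone approximation of the loop in the patch, not the patch itself; `unescape_quotes` and its `sq_bug` parameter are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch: collapse the quote escape to a literal "'" and drop the
 * remaining opening/closing quotes, in place. With sq_bug set, the
 * old doubled-backslash form '\\'' is recognised instead of '\''.
 * Hypothetical helper, loosely modelled on read_env_script().
 */
static void unescape_quotes(char *s, int sq_bug)
{
	const char *pat = sq_bug ? "'\\\\''" : "'\\''";
	size_t plen = strlen(pat);
	char *dst = s;

	while (*s) {
		if (!strncmp(s, pat, plen)) {
			*dst++ = '\'';      /* escaped quote -> literal quote */
			s += plen;
		} else if (*s == '\'') {
			s++;                /* drop an opening/closing quote */
		} else {
			*dst++ = *s++;
		}
	}
	*dst = '\0';
}
```

If old-format input goes through the default path, stray backslashes are left in the name. Detecting which writer produced the file exists to prevent that kind of silent corruption.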



>>  for (p = script.buf; *p; p++)
>> -   if (skip_prefix(p, "'''", (const char **)))
>> +   if (sq_bug && skip_prefix(p, "'''", ))
>> +   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
>> +   else if (skip_prefix(p, "'\\''", ))
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
>> +test_expect_success 'rebase -i writes correct author-script' '
>> +   test_when_finished "test_might_fail git rebase --abort" &&
>> +   git checkout -b author-with-sq master &&
>> +   GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
>> +   set_fake_editor &&
>> +   FAKE_LINES="edit 1" git rebase -ki HEAD^ &&
>
> Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?


-k is short for --keep-empty, which is needed because the test creates an
empty commit to check the author (I think that is to avoid changing the
tree; Johannes wrote that bit).


Thanks for your comments, I'll do a reroll

Phillip


Re: [PATCH v2 2/2] sequencer: fix quoting in write_author_script

2018-07-31 Thread Eric Sunshine
On Tue, Jul 31, 2018 at 7:15 AM Phillip Wood  wrote:
> Single quotes should be escaped as \' not \\'. Note that this only
> affects authors that contain a single quote and then only external
> scripts that read the author script and users whose git is upgraded from
> the shell version of rebase -i while rebase was stopped. This is because
> the parsing in read_env_script() expected the broken version and for
> some reason sq_dequote() called by read_author_ident() seems to handle
> the broken quoting correctly.
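For reference, the correct form is the usual POSIX-shell idiom. You close the single-quoted string, emit a backslash-escaped quote, and reopen the string, so `O'R` is written as `'O'\''R'`. Below is a minimal sketch in C. It assumes a plain NUL-terminated input. `shell_quote` is a hypothetical helper, not git's own quoting code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: wrap 'value' in single quotes for a POSIX shell, turning
 * each embedded "'" into the four characters '\'' (close quote,
 * escaped quote, reopen quote). Returns a malloc'd string the caller
 * must free, or NULL if allocation fails. Hypothetical helper.
 */
static char *shell_quote(const char *value)
{
	size_t quotes = 0, len = strlen(value);
	for (const char *p = value; *p; p++)
		if (*p == '\'')
			quotes++;

	/* 2 outer quotes + 3 extra chars per embedded quote + NUL */
	char *out = malloc(len + 3 * quotes + 3);
	if (!out)
		return NULL;

	char *o = out;
	*o++ = '\'';
	for (const char *p = value; *p; p++) {
		if (*p == '\'') {
			memcpy(o, "'\\''", 4);  /* the characters ' \ ' ' */
			o += 4;
		} else {
			*o++ = *p;
		}
	}
	*o++ = '\'';
	*o = '\0';
	return out;
}
```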

Is the:

...for some reason sq_dequote() called by read_author_ident()
seems to handle the broken quoting correctly.

bit outdated? We know now from patch 2/4 of my series[1] that
read_author_ident() wasn't handling it correctly at all. It was merely
ignoring the return value from sq_dequote() and using whatever broken
value came back from it.

[1]: https://public-inbox.org/git/20180731073331.40007-3-sunsh...@sunshineco.com/

> Helped-by: Johannes Schindelin 
> Signed-off-by: Phillip Wood 
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -664,14 +664,25 @@ static int write_author_script(const char *message)
>  static int read_env_script(struct argv_array *env)
>  {
> if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
> return -1;

This is not a problem introduced by this patch, but since
strbuf_read_file() doesn't guarantee that memory hasn't been allocated
when it returns an error, this is leaking.

> +   /*
> +    * write_author_script() used to fail to terminate the GIT_AUTHOR_DATE
> +    * line with a "'" and also escaped "'" incorrectly as "'''" rather
> +    * than "'\\''". We check for the terminating "'" on the last line to
> +    * see how "'" has been escaped in case git was upgraded while rebase
> +    * was stopped.
> +    */
> +   sq_bug = script.len && script.buf[script.len - 2] != '\'';

I think you need to be checking 'script.len > 1', not just
'script.len', otherwise you might access memory outside the allocated
buffer.
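The bounds problem arises because `script.len` is a `size_t`. If only `script.len` is tested, a one-byte file makes `script.len - 2` wrap around, and the read lands outside the buffer. Below is a minimal sketch of the guarded check; `sq_bug_from_tail` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: the v2 test with the length guard Eric suggests. Requiring
 * len > 1 ensures buf[len - 2] is inside the buffer.
 * Hypothetical helper, not code from the patch.
 */
static int sq_bug_from_tail(const char *buf, size_t len)
{
	return len > 1 && buf[len - 2] != '\'';
}
```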

This is a very "delicate" check, assuming that a hand-edited file
won't end with, say, an extra newline. I wonder if this level of
backward-compatibility is overkill for such an unlikely case.

> for (p = script.buf; *p; p++)
> -   if (skip_prefix(p, "'''", (const char **)))
> +   if (sq_bug && skip_prefix(p, "'''", ))
> +   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
> +   else if (skip_prefix(p, "'\\''", ))
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> @@ -75,6 +75,22 @@ test_expect_success 'rebase --keep-empty' '
> +test_expect_success 'rebase -i writes correct author-script' '
> +   test_when_finished "test_might_fail git rebase --abort" &&
> +   git checkout -b author-with-sq master &&
> +   GIT_AUTHOR_NAME="Auth O$SQ R" git commit --allow-empty -m with-sq &&
> +   set_fake_editor &&
> +   FAKE_LINES="edit 1" git rebase -ki HEAD^ &&

Hmph, -k doesn't seem to be documented in git-rebase.txt. Is it needed here?