Re: pdksh wrong strips quotes in shell substs

2013-03-03 Thread Philip Guenther
The fix for this is (finally) committed for post-5.3.  Sorry about the delay.


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-15 Thread Alexander Polakov
Now with Philip's fix applied.

* Philip Guenther  [120815 15:40]:
> On Sun, Aug 12, 2012 at 12:43 PM, Alexander Polakov  wrote:
> > Ok, second attempt
> 
> Nice.  One fix;
> 
> 
> > @@ -342,7 +346,10 @@
> > PUSH_STATE(STBRACE);
> > } else {
> > ungetsc(c);
> > -   PUSH_STATE(SBRACE);
> > +   if (state == SDQUOTE)
> > +   PUSH_STATE(SBRACEQ);
> > +   else
> > +   PUSH_STATE(SBRACE);
> 
> That should be
>if (state == SDQUOTE ||
>state == SBRACEQ)
> ...
> 
> so that doubly nested forms like
> foo=1; echo "${foo:+${foo:+'blah  $foo'}}"
> give
> 'blah  1'
> 
> ...and yes, I now have a still bigger diff to unclass2.t that includes
> that test. :-)
> 
> 
> I think the lex.h diff is nicer if the new SBRACEQ state is stuck at
> the end of the existing list instead of renumbering additional states,
> but I can be argued into renumbering them...

I don't really have a strong opinion on this, but it looks better to me
when SBRACE & SBRACEQ are together, because they're similar.

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   15 Aug 2012 22:40:26 -
@@ -113,7 +113,7 @@
 
   Again:
states[0].ls_state = -1;
-   states[0].ls_info.base = (Lex_state *) 0;
+   states[0].ls_info.base = (Lex_state *) NULL;
statep = &states[1];
state_info.base = states;
state_info.end = &states[STATE_BSIZE];
@@ -270,6 +270,10 @@
*wp++ = QCHAR, *wp++ = c;
break;
case '\'':
+   if ((cf & HEREDOC) || state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = OQUOTE;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
@@ -342,7 +346,11 @@
PUSH_STATE(STBRACE);
} else {
ungetsc(c);
-   PUSH_STATE(SBRACE);
+   if (state == SDQUOTE ||
+   state == SBRACEQ)
+   PUSH_STATE(SBRACEQ);
+   else
+   PUSH_STATE(SBRACE);
}
} else if (ctype(c, C_ALPHA)) {
*wp++ = OSUBST;
@@ -419,6 +427,10 @@
case SSQUOTE:
if (c == '\'') {
POP_STATE();
+   if (state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = CQUOTE;
ignore_backslash_newline--;
} else
@@ -525,6 +537,7 @@
*wp++ = c;
break;
 
+   case SBRACEQ:
case SBRACE:
/*{*/
if (c == '}') {
Index: lex.h
===
RCS file: /cvs/src/bin/ksh/lex.h,v
retrieving revision 1.11
diff -u -r1.11 lex.h
--- lex.h   29 May 2006 18:22:24 -  1.11
+++ lex.h   15 Aug 2012 22:40:26 -
@@ -51,19 +51,20 @@
 /*
  * states while lexing word
  */
-#defineSBASE   0   /* outside any lexical constructs */
-#defineSWORD   1   /* implicit quoting for substitute() */
-#defineSLETPAREN 2 /* inside (( )), implicit quoting */
-#defineSSQUOTE 3   /* inside '' */
-#defineSDQUOTE 4   /* inside "" */
-#defineSBRACE  5   /* inside ${} */
-#defineSCSPAREN 6  /* inside $() */
-#defineSBQUOTE 7   /* inside `` */
-#defineSASPAREN 8  /* inside $(( )) */
-#define SHEREDELIM 9   /* parsing <<,<<- delimiter */
-#define SHEREDQUOTE 10 /* parsing " in <<,<<- delimiter */
-#define SPATTERN 11/* parsing *(...|..

Re: pdksh wrong strips quotes in shell substs

2012-08-14 Thread Philip Guenther
On Sun, Aug 12, 2012 at 12:43 PM, Alexander Polakov  wrote:
> Ok, second attempt

Nice.  One fix;


> @@ -342,7 +346,10 @@
> PUSH_STATE(STBRACE);
> } else {
> ungetsc(c);
> -   PUSH_STATE(SBRACE);
> +   if (state == SDQUOTE)
> +   PUSH_STATE(SBRACEQ);
> +   else
> +   PUSH_STATE(SBRACE);

That should be
   if (state == SDQUOTE ||
   state == SBRACEQ)
...

so that doubly nested forms like
foo=1; echo "${foo:+${foo:+'blah  $foo'}}"
give
'blah  1'

...and yes, I now have a still bigger diff to unclass2.t that includes
that test. :-)


I think the lex.h diff is nicer if the new SBRACEQ state is stuck at
the end of the existing list instead of renumbering additional states,
but I can be argued into renumbering them...


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-12 Thread Alexander Polakov
* Philip Guenther  [120812 18:50]:
> On Sun, 12 Aug 2012, Alexander Polakov wrote:
> Below is an expansion of your diff for tests/unclass2.t to cover the cases 
> discussed in this thread plus a couple others.
> 
> 
> As for how to get the fix to cover all those cases: the heredoc case can 
> be handled by changing the added test to
>   if ((cf & HEREDOC) || state == SBRACE)
> 
> ...but it needs to be something like
>   if ((cf & HEREDOC) || (state == SBRACE && is_in_double_quotes))
> 
> with "is_in_double_quotes" some magic to differentiate between the 
> "${foo+'bar'}" and ${foo+'bar'} cases.  That presumably involves walking 
> up the statep list, but that has to stop when it hits a wrapping 
> command-expansion.  For example, the third new test in the diff below:
>   echo "$( echo ${foo:+'blah  $foo'})"
> that should output
>   blah  $foo

Ok, second attempt

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   12 Aug 2012 19:13:12 -
@@ -113,7 +113,7 @@
 
   Again:
states[0].ls_state = -1;
-   states[0].ls_info.base = (Lex_state *) 0;
+   states[0].ls_info.base = (Lex_state *) NULL;
statep = &states[1];
state_info.base = states;
state_info.end = &states[STATE_BSIZE];
@@ -270,6 +270,10 @@
*wp++ = QCHAR, *wp++ = c;
break;
case '\'':
+   if ((cf & HEREDOC) || state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = OQUOTE;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
@@ -342,7 +346,10 @@
PUSH_STATE(STBRACE);
} else {
ungetsc(c);
-   PUSH_STATE(SBRACE);
+   if (state == SDQUOTE)
+   PUSH_STATE(SBRACEQ);
+   else
+   PUSH_STATE(SBRACE);
}
} else if (ctype(c, C_ALPHA)) {
*wp++ = OSUBST;
@@ -419,6 +426,10 @@
case SSQUOTE:
if (c == '\'') {
POP_STATE();
+   if (state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = CQUOTE;
ignore_backslash_newline--;
} else
@@ -525,6 +536,7 @@
*wp++ = c;
break;
 
+   case SBRACEQ:
case SBRACE:
/*{*/
if (c == '}') {
Index: lex.h
===
RCS file: /cvs/src/bin/ksh/lex.h,v
retrieving revision 1.11
diff -u -r1.11 lex.h
--- lex.h   29 May 2006 18:22:24 -  1.11
+++ lex.h   12 Aug 2012 19:13:12 -
@@ -51,19 +51,20 @@
 /*
  * states while lexing word
  */
-#defineSBASE   0   /* outside any lexical constructs */
-#defineSWORD   1   /* implicit quoting for substitute() */
-#defineSLETPAREN 2 /* inside (( )), implicit quoting */
-#defineSSQUOTE 3   /* inside '' */
-#defineSDQUOTE 4   /* inside "" */
-#defineSBRACE  5   /* inside ${} */
-#defineSCSPAREN 6  /* inside $() */
-#defineSBQUOTE 7   /* inside `` */
-#defineSASPAREN 8  /* inside $(( )) */
-#define SHEREDELIM 9   /* parsing <<,<<- delimiter */
-#define SHEREDQUOTE 10 /* parsing " in <<,<<- delimiter */
-#define SPATTERN 11/* parsing *(...|...) pattern (*+?@!) */
-#define STBRACE 12 /* parsing ${..[#%]..} */
+#define SBASE  0   /* outside any lexical constructs */
+#define SWORD  1   /* implicit quoting for substitute() */
+#define SLETPAREN  2   /* inside (( )), implicit quoting */
+#define SSQUOTE3   /* inside '' */
+#define SDQUOTE4   /* inside "" */
+#define SBRACE 5   /* inside ${} */
+#define SBRACEQ6   /* inside "${}" */
+#define SCSPAREN   7   /* inside $() */
+#de

Re: pdksh wrong strips quotes in shell substs

2012-08-11 Thread Philip Guenther
On Sun, 12 Aug 2012, Alexander Polakov wrote:
> Nobody cares, so here's my attempt. I haven't run it though a 
> world/ports build because of slow hardware on my hands right now.
> 
> Index: lex.c
> ===
> RCS file: /cvs/src/bin/ksh/lex.c,v
> retrieving revision 1.45
> diff -u -r1.45 lex.c
> --- lex.c 9 Mar 2011 09:30:39 -   1.45
> +++ lex.c 11 Aug 2012 20:18:15 -
> @@ -271,6 +271,8 @@
>   break;
>   case '\'':
>   *wp++ = OQUOTE;
> + if (state == SBRACE)
> + *wp++ = CHAR, *wp++ = c;
>   ignore_backslash_newline++;
>   PUSH_STATE(SSQUOTE);
>   break;
> @@ -420,6 +422,8 @@
>   if (c == '\'') {
>   POP_STATE();
>   *wp++ = CQUOTE;
> + if (state == SBRACE)
> + *wp++ = CHAR, *wp++ = c;
>   ignore_backslash_newline--;
>   } else
>   *wp++ = QCHAR, *wp++ = c;


Unfortunately, that diff breaks the current, correct handling of 
single-quotes inside ${} that is *not* inside double-quotes.  This:
foo=1; echo ${foo:+'blah  $foo'}
should output
blah  $foo

(Note the two spaces between 'blah' and '$foo' in both input and output)


Below is an expansion of your diff for tests/unclass2.t to cover the cases 
discussed in this thread plus a couple others.


As for how to get the fix to cover all those cases: the heredoc case can 
be handled by changing the added test to
if ((cf & HEREDOC) || state == SBRACE)

...but it needs to be something like
if ((cf & HEREDOC) || (state == SBRACE && is_in_double_quotes))

with "is_in_double_quotes" some magic to differentiate between the 
"${foo+'bar'}" and ${foo+'bar'} cases.  That presumably involves walking 
up the statep list, but that has to stop when it hits a wrapping 
command-expansion.  For example, the third new test in the diff below:
echo "$( echo ${foo:+'blah  $foo'})"
that should output
blah  $foo


Philip


Index: tests/unclass2.t
===
RCS file: /cvs/src/bin/ksh/tests/unclass2.t,v
retrieving revision 1.2
diff -u -p -r1.2 unclass2.t
--- tests/unclass2.t25 Jun 1998 19:02:43 -  1.2
+++ tests/unclass2.t12 Aug 2012 01:58:01 -
@@ -161,3 +161,55 @@ expected-stderr: !
XX
 ---
 
+name: single-quotes-in-braces
+description:
+   Check that single quotes inside unquoted {} are treated as quotes
+stdin:
+   foo=1
+   echo ${foo:+'blah  $foo'}
+expected-stdout:
+   blah  $foo
+---
+
+name: single-quotes-in-quoted-braces
+description:
+   Check that single quotes inside quoted {} are treated as normal char
+stdin:
+   foo=1
+   echo "${foo:+'blah  $foo'}"
+expected-stdout:
+   'blah  1'
+---
+
+name: single-quotes-in-braces-nested
+description:
+   Check that single quotes inside unquoted {} are treated as quotes,
+   even if that's inside a double-quoted command expansion
+stdin:
+   foo=1
+   echo "$( echo ${foo:+'blah  $foo'})"
+expected-stdout:
+   blah  $foo
+---
+
+name: single-quotes-in-brace-pattern
+description:
+   Check that single quotes inside {} pattern are treated as quotes
+stdin:
+   foo=1234
+   echo ${foo%'2'*} "${foo%'2'*}" ${foo%2'*'} "${foo%2'*'}"
+expected-stdout:
+   1 1 1234 1234
+---
+
+name: single-quotes-in-heredoc-braces
+description:
+   Check that single quotes inside {} in heredoc are treated as normal char
+stdin:
+   foo=1
+   cat <

Re: pdksh wrong strips quotes in shell substs

2012-08-11 Thread Alexander Polakov
* Philip Guenther  [120810 00:45]:
> On Thu, Aug 9, 2012 at 5:31 AM, Mark Kettenis  wrote:
> >> Date: Thu, 9 Aug 2012 11:30:13 +0200
> >> From: Landry Breuil 
> ...
> >> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
> >> i've stumbled upon the following issue:
> >> (our pdksh)
> >>
> > $FOO=1
> >> $cat < >> > echo ${FOO:+'blah'aa}
> >> > EOF
> >> echo blahaa
> >>
> >> bash-4.2# FOO=1
> >> bash-4.2# cat < >> >  echo ${FOO:+'blah'aa}
> >> > EOF
> >>  echo 'blah'aa
> >>
> >> Apparently the ksh from solaris, hpux and debian don't strip the quotes
> >> in that usecase, and none of the other shells do (bash, dash, zsh...)
> >>
> >> So maybe it can be considered as a bug in our pdksh..
> >
> > I think it is.  POSIX says in 2.6.2 Parameter Expansion that:
> >
> >   "...word shall be subjected to tilde expansion, parameter expansion,
> >   command substitution, and arithmetic expansion."
> >
> > which suggests that quote removal isn't supposed to happen.
> 
> I agree.  The bug applies to the simple double-quoted case too:
> 
> $ foo=1
> $ echo "${foo:+'blah'}"
> blah
> $
> 
> The correct output there should include the single-quotes, ala:
> 'blah'

Hi.

Nobody cares, so here's my attempt. I haven't run it though a
world/ports build because of slow hardware on my hands right now.

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   11 Aug 2012 20:18:15 -
@@ -271,6 +271,8 @@
break;
case '\'':
*wp++ = OQUOTE;
+   if (state == SBRACE)
+   *wp++ = CHAR, *wp++ = c;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
break;
@@ -420,6 +422,8 @@
if (c == '\'') {
POP_STATE();
*wp++ = CQUOTE;
+   if (state == SBRACE)
+   *wp++ = CHAR, *wp++ = c;
ignore_backslash_newline--;
} else
*wp++ = QCHAR, *wp++ = c;
Index: tests/unclass2.t
===
RCS file: /cvs/src/bin/ksh/tests/unclass2.t,v
retrieving revision 1.2
diff -u -r1.2 unclass2.t
--- tests/unclass2.t25 Jun 1998 19:02:43 -  1.2
+++ tests/unclass2.t11 Aug 2012 20:18:19 -
@@ -161,3 +161,12 @@
XX
 ---
 
+name: single quotes in braces
+description:
+   Check that single quotes inside {} are not lost
+stdin:
+   foo=1
+   echo "${foo:+'blah'}"
+expected-stdout:
+   'blah'
+---



Re: pdksh wrong strips quotes in shell substs

2012-08-09 Thread Philip Guenther
On Thu, Aug 9, 2012 at 5:31 AM, Mark Kettenis  wrote:
>> Date: Thu, 9 Aug 2012 11:30:13 +0200
>> From: Landry Breuil 
...
>> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
>> i've stumbled upon the following issue:
>> (our pdksh)
>>
> $FOO=1
>> $cat <> > echo ${FOO:+'blah'aa}
>> > EOF
>> echo blahaa
>>
>> bash-4.2# FOO=1
>> bash-4.2# cat <> >  echo ${FOO:+'blah'aa}
>> > EOF
>>  echo 'blah'aa
>>
>> Apparently the ksh from solaris, hpux and debian don't strip the quotes
>> in that usecase, and none of the other shells do (bash, dash, zsh...)
>>
>> So maybe it can be considered as a bug in our pdksh..
>
> I think it is.  POSIX says in 2.6.2 Parameter Expansion that:
>
>   "...word shall be subjected to tilde expansion, parameter expansion,
>   command substitution, and arithmetic expansion."
>
> which suggests that quote removal isn't supposed to happen.

I agree.  The bug applies to the simple double-quoted case too:

$ foo=1
$ echo "${foo:+'blah'}"
blah
$

The correct output there should include the single-quotes, ala:
'blah'


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-09 Thread Mark Kettenis
> Date: Thu, 9 Aug 2012 11:30:13 +0200
> From: Landry Breuil 
> List-Owner: 
> X-Loop: tech@openbsd.org
> Sender: owner-t...@openbsd.org
> X-XS4ALL-DNSBL-Checked: mxdrop222.xs4all.nl checked 192.43.244.163 against 
> DNS blacklists
> X-CNFS-Analysis: v=2.0 cv=JstGWrEC c=1 sm=0 a=A3duGc4wJ8K8BtNzzvyz4A==:17
>   a=wom5GMh1gUkA:10 a=Tyz8kzrSaUAA:10 a=A-czM1wNRfMA:10
>   a=kj9zAlcOel0A:10 a=HuvqD92K:8 a=pQs5aej7:8
>   a=d6fq7G5S5sWEmd_PXfoA:9 a=CjuIK1q_8ugA:10
>   a=A3duGc4wJ8K8BtNzzvyz4A==:117
> X-Virus-Scanned: by XS4ALL Virus Scanner
> X-XS4ALL-Spam-Score: 0.0 () none
> X-XS4ALL-Spam: NO
> Envelope-To: mark.kette...@xs4all.nl
> 
> Hi,
> 
> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
> i've stumbled upon the following issue:
> (our pdksh)
> 
$FOO=1
> $cat < > echo ${FOO:+'blah'aa}
> > EOF
> echo blahaa
> 
> bash-4.2# FOO=1
> bash-4.2# cat < >  echo ${FOO:+'blah'aa}
> > EOF
>  echo 'blah'aa
> 
> Apparently the ksh from solaris, hpux and debian don't strip the quotes
> in that usecase, and none of the other shells do (bash, dash, zsh...)
> 
> So maybe it can be considered as a bug in our pdksh..

I think it is.  POSIX says in 2.6.2 Parameter Expansion that:

  "...word shall be subjected to tilde expansion, parameter expansion,
  command substitution, and arithmetic expansion."

which suggests that quote removal isn't supposed to happen.