Re: pdksh wrong strips quotes in shell substs

2013-03-03 Thread Philip Guenther
The fix for this is (finally) committed for post-5.3.  Sorry about the delay.


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-15 Thread Alexander Polakov
Now with Philip's fix applied.

* Philip Guenther  [120815 15:40]:
> On Sun, Aug 12, 2012 at 12:43 PM, Alexander Polakov  wrote:
> > Ok, second attempt
> 
> Nice.  One fix;
> 
> 
> > @@ -342,7 +346,10 @@
> > PUSH_STATE(STBRACE);
> > } else {
> > ungetsc(c);
> > -   PUSH_STATE(SBRACE);
> > +   if (state == SDQUOTE)
> > +   PUSH_STATE(SBRACEQ);
> > +   else
> > +   PUSH_STATE(SBRACE);
> 
> That should be
>if (state == SDQUOTE ||
>state == SBRACEQ)
> ...
> 
> so that doubly nested forms like
> foo=1; echo "${foo:+${foo:+'blah  $foo'}}"
> give
> 'blah  1'
> 
> ...and yes, I now have a still bigger diff to unclass2.t that includes
> that test. :-)
> 
> 
> I think the lex.h diff is nicer if the new SBRACEQ state is stuck at
> the end of the existing list instead of renumbering additional states,
> but I can be argued into renumbering them...

I don't really have a strong opinion on this, but it looks better to me
when SBRACE & SBRACEQ are together, because they're similar.

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   15 Aug 2012 22:40:26 -
@@ -113,7 +113,7 @@
 
   Again:
states[0].ls_state = -1;
-   states[0].ls_info.base = (Lex_state *) 0;
+   states[0].ls_info.base = (Lex_state *) NULL;
statep = &states[1];
state_info.base = states;
state_info.end = &states[STATE_BSIZE];
@@ -270,6 +270,10 @@
*wp++ = QCHAR, *wp++ = c;
break;
case '\'':
+   if ((cf & HEREDOC) || state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = OQUOTE;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
@@ -342,7 +346,11 @@
PUSH_STATE(STBRACE);
} else {
ungetsc(c);
-   PUSH_STATE(SBRACE);
+   if (state == SDQUOTE ||
+   state == SBRACEQ)
+   PUSH_STATE(SBRACEQ);
+   else
+   PUSH_STATE(SBRACE);
}
} else if (ctype(c, C_ALPHA)) {
*wp++ = OSUBST;
@@ -419,6 +427,10 @@
case SSQUOTE:
if (c == '\'') {
POP_STATE();
+   if (state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = CQUOTE;
ignore_backslash_newline--;
} else
@@ -525,6 +537,7 @@
*wp++ = c;
break;
 
+   case SBRACEQ:
case SBRACE:
/*{*/
if (c == '}') {
Index: lex.h
===
RCS file: /cvs/src/bin/ksh/lex.h,v
retrieving revision 1.11
diff -u -r1.11 lex.h
--- lex.h   29 May 2006 18:22:24 -  1.11
+++ lex.h   15 Aug 2012 22:40:26 -
@@ -51,19 +51,20 @@
 /*
  * states while lexing word
  */
-#defineSBASE   0   /* outside any lexical constructs */
-#defineSWORD   1   /* implicit quoting for substitute() */
-#defineSLETPAREN 2 /* inside (( )), implicit quoting */
-#defineSSQUOTE 3   /* inside '' */
-#defineSDQUOTE 4   /* inside "" */
-#defineSBRACE  5   /* inside ${} */
-#defineSCSPAREN 6  /* inside $() */
-#defineSBQUOTE 7   /* inside `` */
-#defineSASPAREN 8  /* inside $(( )) */
-#define SHEREDELIM 9   /* parsing <<,<<- delimiter */
-#define SHEREDQUOTE 10 /* parsing " in <<,<<- delimiter */
-#define SPATTERN 11/* parsing *(...|..

Re: pdksh wrong strips quotes in shell substs

2012-08-14 Thread Philip Guenther
On Sun, Aug 12, 2012 at 12:43 PM, Alexander Polakov  wrote:
> Ok, second attempt

Nice.  One fix;


> @@ -342,7 +346,10 @@
> PUSH_STATE(STBRACE);
> } else {
> ungetsc(c);
> -   PUSH_STATE(SBRACE);
> +   if (state == SDQUOTE)
> +   PUSH_STATE(SBRACEQ);
> +   else
> +   PUSH_STATE(SBRACE);

That should be
   if (state == SDQUOTE ||
   state == SBRACEQ)
...

so that doubly nested forms like
foo=1; echo "${foo:+${foo:+'blah  $foo'}}"
give
'blah  1'

...and yes, I now have a still bigger diff to unclass2.t that includes
that test. :-)


I think the lex.h diff is nicer if the new SBRACEQ state is stuck at
the end of the existing list instead of renumbering additional states,
but I can be argued into renumbering them...


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-12 Thread Alexander Polakov
* Philip Guenther  [120812 18:50]:
> On Sun, 12 Aug 2012, Alexander Polakov wrote:
> Below is an expansion of your diff for tests/unclass2.t to cover the cases 
> discussed in this thread plus a couple others.
> 
> 
> As for how to get the fix to cover all those cases: the heredoc case can 
> be handled by changing the added test to
>   if ((cf & HEREDOC) || state == SBRACE)
> 
> ...but it needs to be something like
>   if ((cf & HEREDOC) || (state == SBRACE && is_in_double_quotes))
> 
> with "is_in_double_quotes" some magic to differentiate between the 
> "${foo+'bar'}" and ${foo+'bar'} cases.  That presumably involves walking 
> up the statep list, but that has to stop when it hits a wrapping 
> command-expansion.  For example, the third new test in the diff below:
>   echo "$( echo ${foo:+'blah  $foo'})"
> that should output
>   blah  $foo

Ok, second attempt

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   12 Aug 2012 19:13:12 -
@@ -113,7 +113,7 @@
 
   Again:
states[0].ls_state = -1;
-   states[0].ls_info.base = (Lex_state *) 0;
+   states[0].ls_info.base = (Lex_state *) NULL;
statep = &states[1];
state_info.base = states;
state_info.end = &states[STATE_BSIZE];
@@ -270,6 +270,10 @@
*wp++ = QCHAR, *wp++ = c;
break;
case '\'':
+   if ((cf & HEREDOC) || state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = OQUOTE;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
@@ -342,7 +346,10 @@
PUSH_STATE(STBRACE);
} else {
ungetsc(c);
-   PUSH_STATE(SBRACE);
+   if (state == SDQUOTE)
+   PUSH_STATE(SBRACEQ);
+   else
+   PUSH_STATE(SBRACE);
}
} else if (ctype(c, C_ALPHA)) {
*wp++ = OSUBST;
@@ -419,6 +426,10 @@
case SSQUOTE:
if (c == '\'') {
POP_STATE();
+   if (state == SBRACEQ) {
+   *wp++ = CHAR, *wp++ = c;
+   break;
+   }
*wp++ = CQUOTE;
ignore_backslash_newline--;
} else
@@ -525,6 +536,7 @@
*wp++ = c;
break;
 
+   case SBRACEQ:
case SBRACE:
/*{*/
if (c == '}') {
Index: lex.h
===
RCS file: /cvs/src/bin/ksh/lex.h,v
retrieving revision 1.11
diff -u -r1.11 lex.h
--- lex.h   29 May 2006 18:22:24 -  1.11
+++ lex.h   12 Aug 2012 19:13:12 -
@@ -51,19 +51,20 @@
 /*
  * states while lexing word
  */
-#defineSBASE   0   /* outside any lexical constructs */
-#defineSWORD   1   /* implicit quoting for substitute() */
-#defineSLETPAREN 2 /* inside (( )), implicit quoting */
-#defineSSQUOTE 3   /* inside '' */
-#defineSDQUOTE 4   /* inside "" */
-#defineSBRACE  5   /* inside ${} */
-#defineSCSPAREN 6  /* inside $() */
-#defineSBQUOTE 7   /* inside `` */
-#defineSASPAREN 8  /* inside $(( )) */
-#define SHEREDELIM 9   /* parsing <<,<<- delimiter */
-#define SHEREDQUOTE 10 /* parsing " in <<,<<- delimiter */
-#define SPATTERN 11/* parsing *(...|...) pattern (*+?@!) */
-#define STBRACE 12 /* parsing ${..[#%]..} */
+#define SBASE  0   /* outside any lexical constructs */
+#define SWORD  1   /* implicit quoting for substitute() */
+#define SLETPAREN  2   /* inside (( )), implicit quoting */
+#define SSQUOTE3   /* inside '' */
+#define SDQUOTE4   /* inside "" */
+#define SBRACE 5   /* inside ${} */
+#define SBRACEQ6   /* inside "${}" */
+#define SCSPAREN   7   /* inside $() */
+#de

Re: pdksh wrong strips quotes in shell substs

2012-08-11 Thread Philip Guenther
On Sun, 12 Aug 2012, Alexander Polakov wrote:
> Nobody cares, so here's my attempt. I haven't run it though a 
> world/ports build because of slow hardware on my hands right now.
> 
> Index: lex.c
> ===
> RCS file: /cvs/src/bin/ksh/lex.c,v
> retrieving revision 1.45
> diff -u -r1.45 lex.c
> --- lex.c 9 Mar 2011 09:30:39 -   1.45
> +++ lex.c 11 Aug 2012 20:18:15 -
> @@ -271,6 +271,8 @@
>   break;
>   case '\'':
>   *wp++ = OQUOTE;
> + if (state == SBRACE)
> + *wp++ = CHAR, *wp++ = c;
>   ignore_backslash_newline++;
>   PUSH_STATE(SSQUOTE);
>   break;
> @@ -420,6 +422,8 @@
>   if (c == '\'') {
>   POP_STATE();
>   *wp++ = CQUOTE;
> + if (state == SBRACE)
> + *wp++ = CHAR, *wp++ = c;
>   ignore_backslash_newline--;
>   } else
>   *wp++ = QCHAR, *wp++ = c;


Unfortunately, that diff breaks the current, correct handling of 
single-quotes inside ${} that is *not* inside double-quotes.  This:
foo=1; echo ${foo:+'blah  $foo'}
should output
blah  $foo

(Note the two spaces between 'blah' and '$foo' in both input and output)


Below is an expansion of your diff for tests/unclass2.t to cover the cases 
discussed in this thread plus a couple others.


As for how to get the fix to cover all those cases: the heredoc case can 
be handled by changing the added test to
if ((cf & HEREDOC) || state == SBRACE)

...but it needs to be something like
if ((cf & HEREDOC) || (state == SBRACE && is_in_double_quotes))

with "is_in_double_quotes" some magic to differentiate between the 
"${foo+'bar'}" and ${foo+'bar'} cases.  That presumably involves walking 
up the statep list, but that has to stop when it hits a wrapping 
command-expansion.  For example, the third new test in the diff below:
echo "$( echo ${foo:+'blah  $foo'})"
that should output
blah  $foo


Philip


Index: tests/unclass2.t
===
RCS file: /cvs/src/bin/ksh/tests/unclass2.t,v
retrieving revision 1.2
diff -u -p -r1.2 unclass2.t
--- tests/unclass2.t25 Jun 1998 19:02:43 -  1.2
+++ tests/unclass2.t12 Aug 2012 01:58:01 -
@@ -161,3 +161,55 @@ expected-stderr: !
XX
 ---
 
+name: single-quotes-in-braces
+description:
+   Check that single quotes inside unquoted {} are treated as quotes
+stdin:
+   foo=1
+   echo ${foo:+'blah  $foo'}
+expected-stdout:
+   blah  $foo
+---
+
+name: single-quotes-in-quoted-braces
+description:
+   Check that single quotes inside quoted {} are treated as normal char
+stdin:
+   foo=1
+   echo "${foo:+'blah  $foo'}"
+expected-stdout:
+   'blah  1'
+---
+
+name: single-quotes-in-braces-nested
+description:
+   Check that single quotes inside unquoted {} are treated as quotes,
+   even if that's inside a double-quoted command expansion
+stdin:
+   foo=1
+   echo "$( echo ${foo:+'blah  $foo'})"
+expected-stdout:
+   blah  $foo
+---
+
+name: single-quotes-in-brace-pattern
+description:
+   Check that single quotes inside {} pattern are treated as quotes
+stdin:
+   foo=1234
+   echo ${foo%'2'*} "${foo%'2'*}" ${foo%2'*'} "${foo%2'*'}"
+expected-stdout:
+   1 1 1234 1234
+---
+
+name: single-quotes-in-heredoc-braces
+description:
+   Check that single quotes inside {} in heredoc are treated as normal char
+stdin:
+   foo=1
+   cat <

Re: pdksh wrong strips quotes in shell substs

2012-08-11 Thread Alexander Polakov
* Philip Guenther  [120810 00:45]:
> On Thu, Aug 9, 2012 at 5:31 AM, Mark Kettenis  wrote:
> >> Date: Thu, 9 Aug 2012 11:30:13 +0200
> >> From: Landry Breuil 
> ...
> >> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
> >> i've stumbled upon the following issue:
> >> (our pdksh)
> >>
> > $FOO=1
> >> $cat < >> > echo ${FOO:+'blah'aa}
> >> > EOF
> >> echo blahaa
> >>
> >> bash-4.2# FOO=1
> >> bash-4.2# cat < >> >  echo ${FOO:+'blah'aa}
> >> > EOF
> >>  echo 'blah'aa
> >>
> >> Apparently the ksh from solaris, hpux and debian don't strip the quotes
> >> in that usecase, and none of the other shells do (bash, dash, zsh...)
> >>
> >> So maybe it can be considered as a bug in our pdksh..
> >
> > I think it is.  POSIX says in 2.6.2 Parameter Expansion that:
> >
> >   "...word shall be subjected to tilde expansion, parameter expansion,
> >   command substitution, and arithmetic expansion."
> >
> > which suggests that quote removal isn't supposed to happen.
> 
> I agree.  The bug applies to the simple double-quoted case too:
> 
> $ foo=1
> $ echo "${foo:+'blah'}"
> blah
> $
> 
> The correct output there should include the single-quotes, ala:
> 'blah'

Hi.

Nobody cares, so here's my attempt. I haven't run it though a
world/ports build because of slow hardware on my hands right now.

Index: lex.c
===
RCS file: /cvs/src/bin/ksh/lex.c,v
retrieving revision 1.45
diff -u -r1.45 lex.c
--- lex.c   9 Mar 2011 09:30:39 -   1.45
+++ lex.c   11 Aug 2012 20:18:15 -
@@ -271,6 +271,8 @@
break;
case '\'':
*wp++ = OQUOTE;
+   if (state == SBRACE)
+   *wp++ = CHAR, *wp++ = c;
ignore_backslash_newline++;
PUSH_STATE(SSQUOTE);
break;
@@ -420,6 +422,8 @@
if (c == '\'') {
POP_STATE();
*wp++ = CQUOTE;
+   if (state == SBRACE)
+   *wp++ = CHAR, *wp++ = c;
ignore_backslash_newline--;
} else
*wp++ = QCHAR, *wp++ = c;
Index: tests/unclass2.t
===
RCS file: /cvs/src/bin/ksh/tests/unclass2.t,v
retrieving revision 1.2
diff -u -r1.2 unclass2.t
--- tests/unclass2.t25 Jun 1998 19:02:43 -  1.2
+++ tests/unclass2.t11 Aug 2012 20:18:19 -
@@ -161,3 +161,12 @@
XX
 ---
 
+name: single quotes in braces
+description:
+   Check that single quotes inside {} are not lost
+stdin:
+   foo=1
+   echo "${foo:+'blah'}"
+expected-stdout:
+   'blah'
+---



Re: pdksh wrong strips quotes in shell substs

2012-08-09 Thread Philip Guenther
On Thu, Aug 9, 2012 at 5:31 AM, Mark Kettenis  wrote:
>> Date: Thu, 9 Aug 2012 11:30:13 +0200
>> From: Landry Breuil 
...
>> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
>> i've stumbled upon the following issue:
>> (our pdksh)
>>
> $FOO=1
>> $cat <> > echo ${FOO:+'blah'aa}
>> > EOF
>> echo blahaa
>>
>> bash-4.2# FOO=1
>> bash-4.2# cat <> >  echo ${FOO:+'blah'aa}
>> > EOF
>>  echo 'blah'aa
>>
>> Apparently the ksh from solaris, hpux and debian don't strip the quotes
>> in that usecase, and none of the other shells do (bash, dash, zsh...)
>>
>> So maybe it can be considered as a bug in our pdksh..
>
> I think it is.  POSIX says in 2.6.2 Parameter Expansion that:
>
>   "...word shall be subjected to tilde expansion, parameter expansion,
>   command substitution, and arithmetic expansion."
>
> which suggests that quote removal isn't supposed to happen.

I agree.  The bug applies to the simple double-quoted case too:

$ foo=1
$ echo "${foo:+'blah'}"
blah
$

The correct output there should include the single-quotes, ala:
'blah'


Philip Guenther



Re: pdksh wrong strips quotes in shell substs

2012-08-09 Thread Mark Kettenis
> Date: Thu, 9 Aug 2012 11:30:13 +0200
> From: Landry Breuil 
> List-Owner: 
> X-Loop: tech@openbsd.org
> Sender: owner-t...@openbsd.org
> X-XS4ALL-DNSBL-Checked: mxdrop222.xs4all.nl checked 192.43.244.163 against 
> DNS blacklists
> X-CNFS-Analysis: v=2.0 cv=JstGWrEC c=1 sm=0 a=A3duGc4wJ8K8BtNzzvyz4A==:17
>   a=wom5GMh1gUkA:10 a=Tyz8kzrSaUAA:10 a=A-czM1wNRfMA:10
>   a=kj9zAlcOel0A:10 a=HuvqD92K:8 a=pQs5aej7:8
>   a=d6fq7G5S5sWEmd_PXfoA:9 a=CjuIK1q_8ugA:10
>   a=A3duGc4wJ8K8BtNzzvyz4A==:117
> X-Virus-Scanned: by XS4ALL Virus Scanner
> X-XS4ALL-Spam-Score: 0.0 () none
> X-XS4ALL-Spam: NO
> Envelope-To: mark.kette...@xs4all.nl
> 
> Hi,
> 
> in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
> i've stumbled upon the following issue:
> (our pdksh)
> 
$FOO=1
> $cat < > echo ${FOO:+'blah'aa}
> > EOF
> echo blahaa
> 
> bash-4.2# FOO=1
> bash-4.2# cat < >  echo ${FOO:+'blah'aa}
> > EOF
>  echo 'blah'aa
> 
> Apparently the ksh from solaris, hpux and debian don't strip the quotes
> in that usecase, and none of the other shells do (bash, dash, zsh...)
> 
> So maybe it can be considered as a bug in our pdksh..

I think it is.  POSIX says in 2.6.2 Parameter Expansion that:

  "...word shall be subjected to tilde expansion, parameter expansion,
  command substitution, and arithmetic expansion."

which suggests that quote removal isn't supposed to happen.



pdksh wrong strips quotes in shell substs

2012-08-09 Thread Landry Breuil
Hi,

in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=781461
i've stumbled upon the following issue:
(our pdksh)

$cat < echo ${FOO:+'blah'aa}
> EOF
echo blahaa

bash-4.2# FOO=1
bash-4.2# cat <  echo ${FOO:+'blah'aa}
> EOF
 echo 'blah'aa

Apparently the ksh from solaris, hpux and debian don't strip the quotes
in that usecase, and none of the other shells do (bash, dash, zsh...)

So maybe it can be considered as a bug in our pdksh..

Landry