[Partial patch] IFS and read builtin

2010-08-22 Thread Harald van Dijk
Hi, as has been reported already dash currently has a bug where the read 
builtin ignores the read environment's IFS setting. As a result,

  echo a:b | { IFS=: read a b; echo $a; }

will write out a:b. I tried to see what changed between 0.5.5.1 and 0.5.6, and 
found that the old code used bltinlookup(IFS). So, I made the new code also 
use that. The above example works properly with the attached patch.

However, testing further, I found that there is bad interaction (even without 
my patch, where the IFS assignment below should just be ignored) between the 
code that handles variable assignments, and read. Try this:

   $ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; echo 
|$a|$b|$c|; }'
   |a||b|
   $ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; echo 
|$a|$b|$c|; }'
   |a|b||

This happens because expbackq is correctly called without EXP_QUOTED, which 
makes it call recordregion, which isn't cleared by the time read calls 
ifsbreakup. I'm not sure how that should be solved, and if there are more cases 
where it would fail. The simplest way to solve this for read is to call 
removerecordregions(0) before recordregion(0, len - 1, 0). I have tested that 
locally, and it works. I am not familiar enough with the code to judge whether 
the same situation can also happen in other cases that would also need fixing, 
which is why I have not included that part in the attached patch.

Cheers,
Harald

(Re-sent because of local mail problems.)
diff --git a/src/expand.c b/src/expand.c
index f2f964c..3ba1a38 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -205,7 +205,7 @@ expandarg(union node *arg, struct arglist *arglist, int 
flag)
 * TODO - EXP_REDIR
 */
if (flag  EXP_FULL) {
-   ifsbreakup(p, exparg);
+   ifsbreakup(p, exparg, 0);
*exparg.lastp = NULL;
exparg.lastp = exparg.list;
expandmeta(exparg.list, flag);
@@ -1022,9 +1022,11 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If bltin is set, use bltinlookup to search for IFS in the
+ * environment of the currently executing built-in command.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, struct arglist *arglist, int bltin)
 {
struct ifsregion *ifsp;
struct strlist *sp;
@@ -1040,7 +1042,13 @@ ifsbreakup(char *string, struct arglist *arglist)
if (ifslastp != NULL) {
ifsspc = 0;
nulonly = 0;
-   realifs = ifsset() ? ifsval() : defifs;
+   if (!bltin)
+   realifs = ifsset() ? ifsval() : defifs;
+   else {
+   realifs = bltinlookup(IFS);
+   if (realifs == NULL)
+   realifs = defifs;
+   }
ifsp = ifsfirst;
do {
p = string + ifsp-begoff;
diff --git a/src/expand.h b/src/expand.h
index 405af0b..8eb5f07 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, struct arglist *, int bltin);
 
 /* From arith.y */
 intmax_t arith(const char *);
diff --git a/src/miscbltin.c b/src/miscbltin.c
index 5ab1648..898ab09 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -87,7 +87,7 @@ readcmd_handle_line(char *line, char **ap, size_t len)
arglist.lastp = arglist.list;
recordregion(0, len - 1, 0);

-   ifsbreakup(s, arglist);
+   ifsbreakup(s, arglist, 1);
*arglist.lastp = NULL;
removerecordregions(0);
 



Re: [Partial patch] IFS and read builtin

2010-08-22 Thread Harald van Dijk
On 23/08/10 01:00, Jilles Tjoelker wrote:
 On Mon, Aug 23, 2010 at 12:20:12AM +0200, Harald van Dijk wrote:
[...]
   echo a:b | { IFS=: read a b; echo $a; }
[...]
 
 This has already been fixed in a totally different way in master. See
 git commits near 95a60b2936e8835963bfb08eadc0edf9dddf0498.

Interesting, thank you for the reference. That commit (from May) does a lot 
more than fix the read problem, and does not specifically address the read 
problem at all, which is probably why it has not made it into 0.5.6.1 (from 
June), and why I had not found it.

The IFS problem was breaking the build of glibc, and reportedly also causing an 
unbootable system for someone because of a bad build of libusb, so it was 
important to fix it somehow for Gentoo. Would you prefer I continue to use a 
version of the patch I posted, or use the more invasive commits from master?

$ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; echo 
 |$a|$b|$c|; }'
|a||b|
$ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; 
 echo |$a|$b|$c|; }'
|a|b||
[...]
 Ick. Your change will probably work, but it remains sneaky action at a
 distance. To reduce complexity, it would be good to implement read's
 splitting without using expand.c.

read used to have its own splitting code, but it was changed to use expand.c 
some time between 0.5.5 and 0.5.6.1. The old splitting code had bugs that 
expand.c did not have.

 I estimate the extra lines of code
 from importing FreeBSD's code at less than 50. It will also fix an edge
 case with the splitting. The last two lines of FreeBSD's
 builtins/read1.0 test are:
 
 echo  1 ,2 3, | { IFS=', ' read a b c; echo x${a}x${b}x${c}x; }
 echo  1 ,2 3,,| { IFS=', ' read a b c; echo x${a}x${b}x${c}x; }
 
 These should result in:
 
 x1x2x3x
 x1x2x3,,x
 
 bash and ksh93 agree on this. However, dash master prints:
 
 x1x2x3,x
 x1x2x3,,x

This is also what zsh prints. Could you explain why bash and ksh are correct? 
By my reading, when IFS=', ', then splitting  1 ,2 3, results in 1, 2, 
3, and . Leading and trailing IFS white space gets ignored, but the 
trailing , is not white space, so makes the total field count 4. Since there 
are four fields, a is assigned 1, b is assigned 2, c is assigned 3, which 
is 3, the separator ,, and the final field . But I'll be the first to 
admit I'm far from an expert, and I trust bash to get these things right, I'm 
just curious why bash is right.

That said, zsh is consistent here, and dash is not. zsh also makes
  x= 1 ,2 3,
  IFS=, 
  set $x
  echo $#
print out 4, while dash prints 3 in this case. So either way, something is 
wrong.
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-23 Thread Harald van Dijk
On 23/08/10 21:35, Jilles Tjoelker wrote:
 I think you should do what you think is best for the stability of your
 product. Because dash releases are not extensively tested, I'd recommend
 a trial build of at least a minimal base system with the new version you
 choose. A particular feature to be wary of is LINENO support, as it will
 cause most configure scripts to accept dash as a usable shell.

Thanks, I'm aware of that. I already locally exported CONFIG_SHELL, so
that even without LINENO support the configure scripts were already run
from dash.

That reminds me: the LINENO support is useful, but the tracking of line
numbers has some issues:

$ src/dash -c 'f() { echo $LINENO; }
f
f
'
2
3

But this is not new, and not limited to LINENO:

$ cat foo.sh
if :; then
foo
:
:
:
:
:
fi
$ src/dash foo.sh
foo.sh: 8: foo: not found
$ bash foo.sh
foo.sh: line 2: foo: command not found

I have a patch that improves this by storing the line numbers in the
command nodes, if you're interested, but it needs polishing before I
plan on sending it here or anywhere, and there are probably some corner
cases that it mishandles.

 Right, but it seems better to fix the bugs rather than entangle things
 with expand.c and still leave bugs.

Agreed, unless there's an easy way to ensure the command-line processing
has finished and cleaned up completely by the time read starts.

[IFS=, ]
 I think the important thing is that IFS characters are supposed to be
 field terminators (see POSIX XCU 2.6.5 Field Splitting).

 Therefore, in the example  1 ,2 3, there are three fields, each
 containing one digit, and each variable is assigned one of them.

The more I read it, the more I'm actually becoming convinced that zsh is
doing the right thing, and dash is almost doing the right thing.

2.6.5 uses the term delimiter, not terminator. They don't mean the
same thing. A delimiter can mark the start of a field as well as the
end. And if you compare susv2 with susv3, you may see susv2 is a lot
clearer than v3 on one point, because it ends the Field Splitting
section with a note.

The last rule can be summarised as a pseudo-ERE:

(s*ns*|s+)

 where s is an white-space character and n is a character in the
 that is not white space. Any string matching that ERE delimits a
 field, except that the s+ form does not delimit fields at the
 beginning or the end of a line. (followed by an example)

This says the s+ form does not delimit fields at the end of a line,
which strongly implies that the s*ns* form does. The wording is wrong no
matter how you look at it (splitting a   results in one field a, not
one field a  ), and the note has been removed in susv3. Still, it
manages to somewhat clarify the rest of text.

 In the example  1 ,2 3,, there are four fields, the last being empty.
 Then c is assigned the third field plus the delimiter character and the
 remaining fields and their delimiters except trailing whitespace that is
 in IFS. Hence, both commas end up in c.

The read command description states:

If there are fewer var operands specified than there are fields, the
 leftover fields and their intervening separators shall be assigned to
 the last var.

If  1 ,2 3,, forms four fields, where the fourth field is the empty
string between the two trailing commas, then the final comma is not an
intervening separator, so it should be excluded from c.

Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-24 Thread Harald van Dijk
On 25/08/10 00:51, Jilles Tjoelker wrote:
 Yes, I think that's the proper way to implement LINENO.
 
 FreeBSD sh avoids extending the nodes by detecting expansions of LINENO
 at parse time and storing the line number at that time. However, this is
 only possible because it does not print a line number when there is an
 error in a builtin.

I'm curious to try to see how FreeBSD sh handles eval 'echo $LINENO',
then. This will be a nice excuse to give it another try.

 POSIX.1-2008 (aka SUSv4) says, at one point, that the shell shall use
 the delimiters as field terminators.
[...]

Ah! So something _was_ wrong -- the standard I was reading. That
explains it all, thank you very much.

When I have a more polished and complete LINENO implementation, I'll
post it here.

Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Improved LINENO support

2010-11-08 Thread Harald van Dijk

Hi,

This patch improves LINENO support by storing line numbers in the parse 
tree, for commands as well as for function definitions. It makes LINENO 
behaves properly when calling functions, and has the added benefit of 
improved line numbers in error messages when the last-parsed command is 
not the last-executed one. It removes the earlier LINENO support, and 
instead sets LINENO from evaltree when a command is executed. I have 
used it for a while, and have seen correct line numbers in basic testing 
and in error messages in fairly large and complicated autoconf-generated 
shell scripts. Does this look good enough to add to dash? If there are 
any problems with it, please let me know, I will be happy to look at 
this further.


Cheers,
Harald
diff --git a/src/error.c b/src/error.c
index e304d3d..e60e430 100644
--- a/src/error.c
+++ b/src/error.c
@@ -62,6 +62,7 @@ struct jmploc *handler;
 int exception;
 int suppressint;
 volatile sig_atomic_t intpending;
+int errlinno;
 
 
 static void exverror(int, const char *, va_list)
@@ -116,13 +117,12 @@ exvwarning2(const char *msg, va_list ap)
const char *fmt;
 
errs = out2;
-   name = arg0 ?: sh;
-   fmt = %s: ;
-   if (commandname) {
-   name = commandname;
+   name = arg0 ? arg0 : sh;
+   if (!commandname)
fmt = %s: %d: ;
-   }
-   outfmt(errs, fmt, name, startlinno);
+   else
+   fmt = %s: %d: %s: ;
+   outfmt(errs, fmt, name, errlinno, commandname);
doformat(errs, msg, ap);
 #if FLUSHERR
outc('\n', errs);
diff --git a/src/error.h b/src/error.h
index 3162e15..08f96e9 100644
--- a/src/error.h
+++ b/src/error.h
@@ -122,6 +122,7 @@ void onint(void) __attribute__((__noreturn__));
 #else
 void onint(void);
 #endif
+extern int errlinno;
 void sh_error(const char *, ...) __attribute__((__noreturn__));
 void exerror(int, const char *, ...) __attribute__((__noreturn__));
 const char *errmsg(int, int);
diff --git a/src/eval.c b/src/eval.c
index b966749..8c5f82c 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -32,6 +32,7 @@
  * SUCH DAMAGE.
  */
 
+#include stdio.h
 #include stdlib.h
 #include signal.h
 #include unistd.h
@@ -73,7 +74,7 @@
 int evalskip;  /* set if we are skipping commands */
 STATIC int skipcount;  /* number of levels to skip */
 MKINIT int loopnest;   /* current loop nesting level */
-static int funcnest;   /* depth of function calls */
+STATIC int funcline;   /* starting line number of current function, or 
0 if not in a function */
 
 
 char *commandname;
@@ -191,6 +192,9 @@ evalstring(char *s, int flags)
 void
 evaltree(union node *n, int flags)
 {
+   static char linenoeq[7+(sizeof(int)*CHAR_BIT/3)+2] = LINENO=;
+   char *lineno = linenoeq+7;
+
int checkexit = 0;
void (*evalfn)(union node *, int);
unsigned isor;
@@ -204,6 +208,12 @@ evaltree(union node *n, int flags)
 #endif
TRACE((pid %d, evaltree(%p: %d, %d) called\n,
getpid(), n, n-type, flags));
+   /* In a function, LINENO must refer to the line number from the start
+* of the function. However, errlinno contains the source line number,
+* used in error messages. */
+   errlinno = n-ncmd.linno;
+   sprintf(lineno, %d, n-ncmd.linno - (funcline ? funcline-1 : 0));
+   setvareq(linenoeq, VTEXTFIXED);
switch (n-type) {
default:
 #ifdef DEBUG
@@ -296,7 +306,7 @@ calleval:
}
goto success;
case NDEFUN:
-   defun(n-narg.text, n-narg.next);
+   defun(n-ndefun.text, n-ndefun.body);
 success:
status = 0;
 setstatus:
@@ -730,7 +740,7 @@ evalcommand(union node *cmd, int flags)
*nargv = NULL;
 
lastarg = NULL;
-   if (iflag  funcnest == 0  argc  0)
+   if (iflag  funcline == 0  argc  0)
lastarg = nargv[-1];
 
preverrout.fd = 2;
@@ -928,8 +938,10 @@ evalfun(struct funcnode *func, int argc, char **argv, int 
flags)
struct jmploc *volatile savehandler;
struct jmploc jmploc;
int e;
+   int savefuncline;
 
saveparam = shellparam;
+   savefuncline = funcline;
if ((e = setjmp(jmploc.loc))) {
goto funcdone;
}
@@ -938,7 +950,7 @@ evalfun(struct funcnode *func, int argc, char **argv, int 
flags)
handler = jmploc;
shellparam.malloc = 0;
func-count++;
-   funcnest++;
+   funcline = func-n.ndefun.linno;
INTON;
shellparam.nparam = argc - 1;
shellparam.p = argv + 1;
@@ -949,7 +961,7 @@ evalfun(struct funcnode *func, int argc, char **argv, int 
flags)
poplocalvars(0);
 funcdone:
INTOFF;
-   funcnest--;
+   funcline = savefuncline;
freefunc(func);
freeparam(shellparam);
shellparam = saveparam;
@@ -1039,7 +1051,7 @@ returncmd(int argc, char **argv)
 * If called outside 

Re: [PATCH] Improved LINENO support

2010-11-11 Thread Harald van Dijk

On 10/11/10 22:59, Jilles Tjoelker wrote:

What I was thinking of was adding a lineno field to narg instead of to
all command types. The lineno variable would then be set by expand.c. I
think that leads to a smaller patch and it should still give a sensible
value for almost all errors.  Downside is a few more int-to-ascii
conversions. A few divisions is not that bad relative to other things
that are done but printf in modern libcs is bloated and slow and might
slow down things measurably.


I made an attempt at this, but managed to keep the printing in one place 
by making LINENO a special variable, which is stored in an int and set 
in expandarg. When $LINENO is evaluated, the code checks to see if the 
variable is LINENO, and if so, converts the value to a string. This is 
one simple pointer comparison in the common case in lookupvar:

  if (v == vlineno  v-text == linenovar)
and avoids sprintf in scripts that don't use LINENO.


FreeBSD's code subtracts the function's starting line number in the
parser rather than at run time. I wonder which is best. (FreeBSD
neglects to save and restore the value, so nested function definitions
may lead to incorrect line numbers, but that could be fixed.)


I used this approach in my updated patch, by resetting plinno to 0 in 
simplecmd, and restoring it after calling command().



I think this patch will increase dash's code size at least a little,
which is always a consideration here.


The code size varies little: the current limited LINENO support gives an 
executable size of 67680 bytes (default configure settings, no libedit, 
gcc 4.5.1, CFLAGS=-Os, strip after compilation), my previous patch gives 
67688 bytes, my latest patch gives 67704 bytes. I do not think 24 bytes 
should be a problem.



I can also look at getting this
into FreeBSD sh.


That would be great.

I will send the updated patch when I can properly test it. Basic testing 
shows it works, but more extensive testing fails. As it turns out, this 
is unrelated to my changes: current git, unmodified, has a problem:


$ cat test.sh
echo
cat _ACEOF
$0 $@
_ACEOF
echo
$ src/dash test.sh wtf

test.sh wtf
�
$

I had not noticed it with my earlier patch, because I actually did my 
testing with modified 0.5.6.1 sources.


Thanks for the comments.

Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improved LINENO support

2010-11-12 Thread Harald van Dijk

On 12/11/10 19:35, Eric Blake wrote:

On 11/12/2010 11:17 AM, Harald van Dijk wrote:

   (
 :
 :
 :
 :
 :
   )test-$LINENO

should write to file test-1, not test-7, even though the only word is on
line 7. bash gets this wrong, pdksh gets this almost right.


Umm - there was more than one word in that example (each : in the
subshell is a word).  But I agree that the subshell command itself
started on line 1, but the word containing $LINENO is on line 7.


That was because it was a poor example. I meant to use comments, but 
forgot to change this before posting. I haven't checked if they count as 
words in SUS, but they don't in the dash source code.



There are _so many_ variations on what this example would do that it's
hard to say that any one version is definitively correct.  In general,
the standard tends to favor the ksh93 interpretation of things; but on
your example, it touches the file test-6 (not test-7 nor test-1).



All I can see in the standard is:

Set by the shell to a decimal number representing the current
sequential line number (numbered starting with 1) within a script or
function before it executes each command.

with no hint as to whether LINENO auto-increments over the number of
newlines encountered while parsing that command.


You may be right. SUS doesn't really specify whether current refers to 
the first or the last line number of a command. test-1 and test-7 are 
equally defensible, so I was too quick to say that bash gets this wrong. 
I hope you agree, though, that test-6 is iffy, and for my other example:


  echo $LINENO \
$LINENO

two identical numbers must be printed. SUS only requires and only allows 
LINENO to be set before a command is executed, so once it is set, it is 
set for both expansions. Given that,


  (
#
#
#
#
#
  )test-$LINENO \
  test2-$LINENO

must not write to test-7 and test2-8. It may write to test-1 and 
test2-1, or it may write to test-8 and test2-8. Perhaps even to test-7 
and test2-7. Again, though, LINENO must be the same in both expansions. 
So for dash, the problem doesn't change: storing the line number of each 
word, rather than each command, makes it very hard to get LINENO right.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improved LINENO support

2010-11-12 Thread Harald van Dijk

On 12/11/10 20:08, Harald van Dijk wrote:

On 12/11/10 19:35, Eric Blake wrote:

On 11/12/2010 11:17 AM, Harald van Dijk wrote:

(
:
:
:
:
:
)test-$LINENO


Umm - there was more than one word in that example (each : in the
subshell is a word)


That was because it was a poor example. I meant to use comments,


...which don't work, because a command is required between ( and ). 
Never mind that part.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: static vs. dynamic scoping

2010-11-13 Thread Harald van Dijk

On 09/11/10 23:02, Eric Blake wrote:

2. User aspect:
   Is anyone aware of a script that intentionally uses the full power of
dynamic scoping available through 'local' which would break if scoping
switched to static?


FWIW, some scripts use local IFS to reset IFS to a sane value, and 
have the shell take care of restoring it afterwards. This works with 
ksh's typeset, but because of the static scoping only so long as no 
other shell functions get called, or those other shell functions also 
take the effort to handle IFS correctly. In the scripts I have been able 
to find, other shell functions do get called, but they themselves set 
IFS too. Still, it may be worth keeping in mind, if only to serve as one 
of the few examples of when dynamic scoping would have been slightly 
more useful.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: static vs. dynamic scoping

2010-11-13 Thread Harald van Dijk

On 14/11/10 00:54, ольга крыжановская wrote:

Now function a uses typeset IFS=X to set the field separator to X
via a local variable IFS. If function b now uses read foo1 foo2 foo3
to read a line of a database the concept of dynamic scoping *BITES*.


The way I had seen local IFS used is to sanitise IFS -- to reset IFS 
to the default value, when IFS was set to, for example, : before the 
function was called, and the function wants to split on whitespace. In 
that case, you don't want b to get IFS=:. Both forms are used, and in 
your form, I agree, static scoping is more useful.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quoted closing brace in variable default expansion

2010-11-20 Thread Harald van Dijk

On 19/11/10 14:53, Herbert Xu wrote:

Harald van Dijkhar...@gigawatt.nl  wrote:

   sh -c 'echo ${x:-\}}'


If you need to print anything involving a backspace you should
use printf and not echo.


It's true that backslashes should not be passed to echo, but assuming 
unset or empty x, ${x:-\}} should expand to }, and that is all echo 
should see.


Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [PARSER] Remove backslash before } in double-quotes in variable

2010-11-23 Thread Harald van Dijk

On 21/11/10 14:41, Jilles Tjoelker wrote:

The backslash prevents the closing brace from terminating the
substitution, therefore it should be removed.


Thanks, that works great.


Example:
   printf %s\n ${$+\}} ${$+\}} ${$+\}}
should print } three times, without backslashes.


Some tests with different a few shells:

${$+'\}'}
 ${$+\}}
  ${$+\}}
   ${$+'\}'}
${$+\}}
 ${$+\}}

bash   | }  \} \} \} }  '\}'
pdksh  | }  \} \} }  \} \}
ksh| }  }  \} }  }  '}'
dash (current) | }  \} \} \} \} '\}'
dash (patched) | }  }  \} }  }  '}'

The one thing that I was a bit confused about was the difference between 
${$+'\}'} and ${$+\}}, but I do think that is correct: the backslash 
is used to see that the immediately following } is not a terminator, but 
the special rule allowing backslash before } to be removed in 
double-quoted strings in parameter expansions does not apply (there is 
no double-quoted string), so the backslash appears in the output as 
well. And since the last form uses a double-quoted string, the backslash 
gets removed there.


Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improved LINENO support

2010-11-27 Thread Harald van Dijk

On 12/11/10 22:29, Jilles Tjoelker wrote:

So some more ideas:

A per-command LINENO does not require adding the number to all of the
node types. Only node types that are commands that perform expansions
need it: NCMD, NREDIR, NBACKGND, NSUBSHELL, NFOR, NCASE.


This makes sense, and I've tried this with good results. I added NDEFUN 
to the list, to get the source to function line number mapping correct.


I noticed when wrapping half-parsed commands in an NREDIR, I got the 
line number wrong (not necessarily nonconforming, but different from 
what I had intended). I changed it in this patch by saving the line 
number in a variable before starting the parsing of the command.



NFOR and NCASE may be wrapped in an NREDIR node to hold the line number.
This simplifies the code a bit at the cost of some extra memory usage.


That's possible. I have not used that idea in this patch, but it should 
be an easy change.


Your other suggestions would work, but I am not able to say whether they 
form an improvement, so I did not use those for now.


Again, comments are welcome.
diff --git a/src/error.c b/src/error.c
index e304d3d..e60e430 100644
--- a/src/error.c
+++ b/src/error.c
@@ -62,6 +62,7 @@ struct jmploc *handler;
 int exception;
 int suppressint;
 volatile sig_atomic_t intpending;
+int errlinno;
 
 
 static void exverror(int, const char *, va_list)
@@ -116,13 +117,12 @@ exvwarning2(const char *msg, va_list ap)
const char *fmt;
 
errs = out2;
-   name = arg0 ?: sh;
-   fmt = %s: ;
-   if (commandname) {
-   name = commandname;
+   name = arg0 ? arg0 : sh;
+   if (!commandname)
fmt = %s: %d: ;
-   }
-   outfmt(errs, fmt, name, startlinno);
+   else
+   fmt = %s: %d: %s: ;
+   outfmt(errs, fmt, name, errlinno, commandname);
doformat(errs, msg, ap);
 #if FLUSHERR
outc('\n', errs);
diff --git a/src/error.h b/src/error.h
index 3162e15..08f96e9 100644
--- a/src/error.h
+++ b/src/error.h
@@ -122,6 +122,7 @@ void onint(void) __attribute__((__noreturn__));
 #else
 void onint(void);
 #endif
+extern int errlinno;
 void sh_error(const char *, ...) __attribute__((__noreturn__));
 void exerror(int, const char *, ...) __attribute__((__noreturn__));
 const char *errmsg(int, int);
diff --git a/src/eval.c b/src/eval.c
index b966749..d85f66f 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -73,7 +73,7 @@
 int evalskip;  /* set if we are skipping commands */
 STATIC int skipcount;  /* number of levels to skip */
 MKINIT int loopnest;   /* current loop nesting level */
-static int funcnest;   /* depth of function calls */
+STATIC int funcline;   /* starting line number of current function, or 
0 if not in a function */
 
 
 char *commandname;
@@ -218,6 +218,9 @@ evaltree(union node *n, int flags)
status = !exitstatus;
goto setstatus;
case NREDIR:
+   errlinno = lineno = n-nredir.linno;
+   if (funcline)
+   lineno -= funcline - 1;
expredir(n-nredir.redirect);
pushredir(n-nredir.redirect);
status = redirectsafe(n-nredir.redirect, REDIR_PUSH);
@@ -296,7 +299,7 @@ calleval:
}
goto success;
case NDEFUN:
-   defun(n-narg.text, n-narg.next);
+   defun(n);
 success:
status = 0;
 setstatus:
@@ -370,6 +373,10 @@ evalfor(union node *n, int flags)
struct strlist *sp;
struct stackmark smark;
 
+   errlinno = lineno = n-nfor.linno;
+   if (funcline)
+   lineno -= funcline - 1;
+
setstackmark(smark);
arglist.lastp = arglist.list;
for (argp = n-nfor.args ; argp ; argp = argp-narg.next) {
@@ -411,6 +418,10 @@ evalcase(union node *n, int flags)
struct arglist arglist;
struct stackmark smark;
 
+   errlinno = lineno = n-ncase.linno;
+   if (funcline)
+   lineno -= funcline - 1;
+
setstackmark(smark);
arglist.lastp = arglist.list;
expandarg(n-ncase.expr, arglist, EXP_TILDE);
@@ -442,6 +453,10 @@ evalsubshell(union node *n, int flags)
int backgnd = (n-type == NBACKGND);
int status;
 
+   errlinno = lineno = n-nredir.linno;
+   if (funcline)
+   lineno -= funcline - 1;
+
expredir(n-nredir.redirect);
if (!backgnd  flags  EV_EXIT  !have_traps())
goto nofork;
@@ -697,6 +712,10 @@ evalcommand(union node *cmd, int flags)
int status;
char **nargv;
 
+   errlinno = lineno = cmd-ncmd.linno;
+   if (funcline)
+   lineno -= funcline - 1;
+
/* First expand the arguments. */
TRACE((evalcommand(0x%lx, %d) called\n, (long)cmd, flags));
setstackmark(smark);
@@ -730,7 +749,7 @@ evalcommand(union node *cmd, int flags)
*nargv = NULL;
 
lastarg = 

Re: [PATCH 1/2] [SHELL] Allow building without LINEO support.

2011-08-17 Thread Harald van Dijk
On Tue, 2011-08-16 at 20:12 -0500, Jonathan Nieder wrote:
 David Miller wrote:
 
  [Subject: [SHELL] Allow building without LINEO support.]
 
 Thanks!  Debian has been using something like this (but unconditional)
 to convince autoconf not to use dash as CONFIG_SHELL, to work around
 bugs in various configure scripts[1].  I imagine other users might
 want the same thing, so a patch like this seems like a good idea.

If you don't mind me asking, if you want configure scripts to run from
bash, why not simply run configure scripts from bash, instead of running
them from sh and trusting that sh will call bash if it is really some
other shell? Don't get me wrong, there are other reasons why making
LINENO support optional is a good idea (smaller executable at the low
cost of a rarely used feature), so don't take this as an argument
against this patch, I'm just curious about this one point.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crash on valid input

2013-04-09 Thread Harald van Dijk

On 09/04/13 05:27, Eric Blake wrote:

On 04/08/2013 09:12 PM, Dan Kegel wrote:

Yes, my script was crap, I've fixed it.

Here's the reproducer.  Called with foo unset.  I think it doesn't
crash without -x.

#!/bin/dash
set -x
test ! $foo

The 'set -x' was indeed the key to reproducing the problem.  In fact,
this is the shortest I could make it:

dash -cx 'test !'

It is not limited to 'set -x'. dash continues reading after the NULL 
value in argv, and usually that will be followed by another NULL if 'set 
-x' is not used, but not necessarily.


$ dash -c 'test ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !'
$ dash -c 'test ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !'

Segmentation fault
$ dash -c 'test ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !'

$

When [ ! ] is used, the ! is necessarily followed by two NULLs (one 
after the ], and one because the ] is replaced by NULL), so the problem 
is hidden.


dash should check whether ! is followed by an argument, like bash does, 
which would give an error message without a segmentation fault for all 
three forms above.


This seems to be easily possible by manually inlining primary() into 
nexpr(), and treating UNOT similarly to STREZ e.a.:


diff --git a/src/bltin/test.c b/src/bltin/test.c
index 90135e1..5cf4021 100644
--- a/src/bltin/test.c
+++ b/src/bltin/test.c
@@ -268,14 +268,6 @@ aexpr(enum token n)
 static int
 nexpr(enum token n)
 {
-   if (n == UNOT)
-   return !nexpr(t_lex(++t_wp));
-   return primary(n);
-}
-
-static int
-primary(enum token n)
-{
enum token nn;
int res;

@@ -289,11 +281,13 @@ primary(enum token n)
syntax(NULL, closing paren expected);
return res;
}
-   if (t_wp_op  t_wp_op-op_type == UNOP) {
+   if (t_wp_op  (t_wp_op-op_type == UNOP || t_wp_op-op_type == 
BUNOP)) {

/* unary expression */
if (*++t_wp == NULL)
syntax(t_wp_op-op_text, argument expected);
switch (n) {
+   case UNOT:
+   return !nexpr(t_lex(t_wp));
case STREZ:
return strlen(*t_wp) == 0;
case STRNZ:

Unfortunately, this exposes the fact that POSIX test requires special 
behaviour when called with fewer than five arguments. This change would 
cause test ! to start returning an error. Something like


diff --git a/src/bltin/test.c b/src/bltin/test.c
index 5cf4021..1e84423 100644
--- a/src/bltin/test.c
+++ b/src/bltin/test.c
@@ -177,6 +177,7 @@ testcmd(int argc, char **argv)
 {
const struct t_op *op;
enum token n;
+   int not;
int res;

if (*argv[0] == '[') {
@@ -191,25 +192,44 @@ testcmd(int argc, char **argv)
if (argc  1)
return 1;

+   not = 0;
+
/*
 * POSIX prescriptions: he who wrote this deserves the Nobel
 * peace prize.
 */
-   switch (argc) {
-   case 3:
-   op = getop(argv[1]);
-   if (op  op-op_type == BINOP) {
-   n = OPERAND;
-   goto eval;
+   for (;;) {
+   switch (argc) {
+   case 1:
+   res = strlen(argv[0]) == 0;
+   goto exit;
+
+   case 3:
+   op = getop(argv[1]);
+   if (op  op-op_type == BINOP) {
+   n = OPERAND;
+   goto eval;
+   }
+   /* fall through */
+
+   case 2:
+   case 4:
+   if (!strcmp(argv[0], !)) {
+   not = !not;
+   argv++;
+   argc--;
+   continue;
+   }
+
+   if (!strcmp(argv[0], ()  !strcmp(argv[argc - 
1], ))) {

+   argv[--argc] = NULL;
+   argv++;
+   argc--;
+   continue;
+   }
}
-   /* fall through */

-   case 4:
-   if (!strcmp(argv[0], ()  !strcmp(argv[argc - 1], ))) {
-   argv[--argc] = NULL;
-   argv++;
-   argc--;
-   }
+   break;
}

n = t_lex(argv);
@@ -222,6 +242,10 @@ eval:
if (argv[0] != NULL  argv[1] != NULL)
syntax(argv[0], unexpected operator);

+exit:
+   if (not)
+   res = !res;
+
return res;
 }


Although this is a bit ugly, it gets the right 

Re: command -p does not correctly limit search to a safe PATH

2013-07-14 Thread Harald van Dijk
On 10/07/13 20:18, Craig Loomis wrote:
   Dash (0.5.7 and git master) does not implement 'command -p'
 according to the standard, and opens an intriguing security hole to
 anyone trying this scheme.
 
   When using 'command -v' to simply print the path to an executable,
 '-p' has no effect:

You're right. dash has never supported combining -p with -v, but back in
2005 this was seemingly accidentally changed from reporting a syntax
error to silently ignoring the -p option, only about a month after dash
moved to git.

Making sure that -p is respected even when -v is used is easy enough,
see attached patch. Tested even with explicit PATH overrides:
  PATH=/path/to/some/other/dash command -pv dash
correctly outputs /bin/dash on my system.

 the path that 'command -p cmd' uses is a compiled-in constant
 from dash's src/var.c:defpathvar, which starts with
 /usr/local/sbin:/usr/local/bin. To me, that is both completely
 unexpected and pretty scary -- /usr/local/bin is (very) often less
 well secured or checked than, say, /bin:

Agreed. However, IMO, it does make sense for defpathvar to start with
/usr/local/*: it has two separate functions, it also serves as the
default path (hence the name) when dash is started with no PATH set at
all. I think fixing this should be done in a way so that command -p does
not use defpathvar, not by changing defpathvar. bash uses the same
confstr function for this that getconf uses, and it shouldn't be too
much work to make dash use that too. If no one else comes up with a
working patch or a better approach, I'll try to get that working.

Cheers,
Harald
commit 475e328589fd2e843c138d49fb96699a2a66151d
Author: Harald van Dijk har...@gigawatt.nl
Date:   Sun Jul 14 21:23:01 2013 +0200

command: allow combining -p with -v

diff --git a/src/exec.c b/src/exec.c
index 79e2007..e56e3f6 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -96,7 +96,7 @@ STATIC void clearcmdentry(int);
 STATIC struct tblentry *cmdlookup(const char *, int);
 STATIC void delete_cmd_entry(void);
 STATIC void addcmdentry(char *, struct cmdentry *);
-STATIC int describe_command(struct output *, char *, int);
+STATIC int describe_command(struct output *, char *, const char *, int);
 
 
 /*
@@ -727,21 +727,21 @@ typecmd(int argc, char **argv)
 	int err = 0;
 
 	for (i = 1; i  argc; i++) {
-		err |= describe_command(out1, argv[i], 1);
+		err |= describe_command(out1, argv[i], pathval(), 1);
 	}
 	return err;
 }
 
 STATIC int
-describe_command(out, command, verbose)
+describe_command(out, command, path, verbose)
 	struct output *out;
 	char *command;
+	const char *path;
 	int verbose;
 {
 	struct cmdentry entry;
 	struct tblentry *cmdp;
 	const struct alias *ap;
-	const char *path = pathval();
 
 	if (verbose) {
 		outstr(command, out);
@@ -840,20 +840,23 @@ commandcmd(argc, argv)
 		VERIFY_BRIEF = 1,
 		VERIFY_VERBOSE = 2,
 	} verify = 0;
+	const char *path = pathval();
 
 	while ((c = nextopt(pvV)) != '\0')
 		if (c == 'V')
 			verify |= VERIFY_VERBOSE;
 		else if (c == 'v')
 			verify |= VERIFY_BRIEF;
+		else if (c == 'p')
+			path = defpath;
 #ifdef DEBUG
-		else if (c != 'p')
+		else
 			abort();
 #endif
 
 	cmd = *argptr;
 	if (verify  cmd)
-		return describe_command(out1, cmd, verify - VERIFY_BRIEF);
+		return describe_command(out1, cmd, path, verify - VERIFY_BRIEF);
 
 	return 0;
 }


Re: command -p does not correctly limit search to a safe PATH

2013-07-19 Thread Harald van Dijk
On 14/07/13 21:54, Harald van Dijk wrote:
 On 10/07/13 20:18, Craig Loomis wrote:
   Dash (0.5.7 and git master) does not implement 'command -p'
 according to the standard, and opens an intriguing security hole to
 anyone trying this scheme.
[...]
 the path that 'command -p cmd' uses is a compiled-in constant
 from dash's src/var.c:defpathvar, which starts with
 /usr/local/sbin:/usr/local/bin.

So, how about this, to be applied on top of my previous patch? It
defaults to using confstr() if available and reporting a hard error at
run time if that fails, but it can be configured to not use confstr(),
and/or fall back to a path specified at configuration time:

Use confstr() only, fail to configure if confstr is unavailable, fail at
runtime if confstr() does not return any path:
  configure --enable-confstr

Use confstr(), fail to configure if confstr is unavailable, fall back to
/usr/bin:/bin if confstr() does not return any path:
  configure --enable-confstr \
--with-fallback-standard-path='/usr/bin:/bin'

Use confstr() if available, use /usr/bin:/bin if confstr() is
unavailable or does not return any path:
  configure \
--with-fallback-standard-path='/usr/bin:/bin'

Do no use confstr(), always use /usr/bin:/bin:
  configure --disable-confstr \
--with-fallback-standard-path='/usr/bin:/bin'

Feedback welcome, there are several parts of this that I am not sure
about. Also, defpathvar is no longer used outside of var.c, so I made it
static and removed the references in var.h.

Cheers,
Harald
commit 008118857bdb34ea885d76cbddbb56290706dcd2
Author: Harald van Dijk har...@gigawatt.nl
Date:   Fri Jul 19 23:29:55 2013 +0200

command: use confstr() for -p to get a safe path

diff --git a/configure.ac b/configure.ac
index c6fb401..e595e99 100644
--- a/configure.ac
+++ b/configure.ac
@@ -40,6 +40,36 @@ AC_ARG_ENABLE(fnmatch, AS_HELP_STRING(--enable-fnmatch, \
   [Use fnmatch(3) from libc]))
 AC_ARG_ENABLE(glob, AS_HELP_STRING(--enable-glob, [Use glob(3) from libc]))
 
+AC_ARG_ENABLE([confstr],
+[AS_HELP_STRING([--enable-confstr], [Use confstr(3) from libc (default=auto)])],,
+[enable_confstr=auto])
+
+if test x$enable_confstr != xno; then
+	have_confstr=no
+	AC_CHECK_FUNCS([confstr],[have_confstr=yes])
+	have_decl_confstr=no
+	AC_CHECK_DECL([confstr],[have_decl_confstr=yes
+	AC_DEFINE([HAVE_DECL_CONFSTR], [1], [Define as 1 if you have a declaration of confstr()])])
+	have_decl_cs_path=no
+	AC_CHECK_DECL([_CS_PATH],[have_decl_cs_path=yes
+	AC_DEFINE([HAVE_DECL_CS_PATH], [1], [Define as 1 if you have a declaration of _CS_PATH])])
+
+	if test $have_confstr:$have_decl_confstr:$have_decl_cs_path != yes:yes:yes; then
+		if test x$enable_confstr = xyes; then
+			AC_MSG_ERROR([cannot use confstr])
+		fi
+	fi
+fi
+
+AC_ARG_WITH([fallback-standard-path],
+[AS_HELP_STRING([--with-fallback-standard-path=\ARG\],
+[use ARG for command -p path lookups if confstr does not return anything (default: none)])],,
+[with_fallback_standard_path=no])
+
+if test x$with_fallback_standard_path != xno; then
+	AC_DEFINE_UNQUOTED([FALLBACK_STANDARD_PATH], [$with_fallback_standard_path], [Define to the path to use for command -p path lookups (ignored if confstr() is used)])
+fi
+
 dnl Checks for libraries.
 
 dnl Checks for header files.
diff --git a/src/eval.c b/src/eval.c
index c7358a6..1dba165 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -644,7 +644,7 @@ parse_command_args(char **argv, const char **path)
 		do {
 			switch (c) {
 			case 'p':
-*path = defpath;
+*path = stdpath();
 break;
 			default:
 /* run 'typecmd' for other options */
diff --git a/src/exec.c b/src/exec.c
index e56e3f6..e32d48e 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -848,7 +848,7 @@ commandcmd(argc, argv)
 		else if (c == 'v')
 			verify |= VERIFY_BRIEF;
 		else if (c == 'p')
-			path = defpath;
+			path = stdpath();
 #ifdef DEBUG
 		else
 			abort();
@@ -860,3 +860,32 @@ commandcmd(argc, argv)
 
 	return 0;
 }
+
+const char *
+stdpath(void)
+{
+	static const char *path = NULL;
+
+#if HAVE_CONFSTR  HAVE_DECL_CONFSTR  HAVE_DECL_CS_PATH
+	if (path == NULL) {
+		size_t size = confstr(_CS_PATH, NULL, 0);
+		if (size != 0) {
+			char *newpath = ckmalloc(size);
+			confstr(_CS_PATH, newpath, size);
+			path = newpath;
+		}
+	}
+#endif
+
+#ifdef FALLBACK_STANDARD_PATH
+	if (path == NULL) {
+		path = FALLBACK_STANDARD_PATH;
+	}
+#endif
+
+	if (path == NULL) {
+		exerror(EXEXIT, %s, no path for standard utilities);
+	}
+
+	return path;
+}
diff --git a/src/exec.h b/src/exec.h
index 9ccb305..a3f9314 100644
--- a/src/exec.h
+++ b/src/exec.h
@@ -75,3 +75,4 @@ void defun(union node *);
 void unsetfunc(const char *);
 int typecmd(int, char **);
 int commandcmd(int, char **);
+const char *stdpath(void);
diff --git a/src/var.c b/src/var.c
index dc90249..653ba5b 100644
--- a/src/var.c
+++ b/src/var.c
@@ -73,7 +73,7 @@ struct localvar_list {
 
 MKINIT struct localvar_list *localvar_stack;
 
-const char defpathvar[] =
+static const char defpathvar

Re: [PATCH] implement privmode support in dash

2013-08-22 Thread Harald van Dijk
On 22/08/13 19:59, Tavis Ormandy wrote:
 Hello, this is a patch to add privmode support to dash. privmode attempts to
 drop privileges by default if the effective uid does not match the uid. This
 can be disabled with -p, or -o nopriv.

Hi Tavis,

Your approach definitely has my support (FWTW), but there are two
aspects that surprised me, and are different from bash and FreeBSD's sh:

You named the option nopriv, while bash and FBSD use the name
privileged. I think it is likely to confuse people if bash -o
privileged and dash -o nopriv do the same thing, and that it would be
better to match bash and give the option a positive name, such as
priv, or perhaps even match them exactly and use privileged.

In bash and FBSD, after starting with -p, set +p can be used to drop
privileges. With your patch, dash accepts set +p, but silently ignores it.

How does something like the attached, to be applied on top of your
patch, look?

Cheers,
Harald
diff --git a/src/Makefile.am b/src/Makefile.am
index 2a37381..82d00fb 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -21,14 +21,14 @@ bin_PROGRAMS = dash
 dash_CFILES = \
 	alias.c arith_yacc.c arith_yylex.c cd.c error.c eval.c exec.c expand.c \
 	histedit.c input.c jobs.c mail.c main.c memalloc.c miscbltin.c \
-	mystring.c options.c parser.c redir.c show.c trap.c output.c \
+	mystring.c options.c parser.c priv.c redir.c show.c trap.c output.c \
 	bltin/printf.c system.c bltin/test.c bltin/times.c var.c
 dash_SOURCES = \
 	$(dash_CFILES) \
 	alias.h arith_yacc.h bltin/bltin.h cd.h error.h eval.h exec.h \
 	expand.h hetio.h \
 	init.h input.h jobs.h machdep.h mail.h main.h memalloc.h miscbltin.h \
-	myhistedit.h mystring.h options.h output.h parser.h redir.h shell.h \
+	myhistedit.h mystring.h options.h output.h parser.h priv.h redir.h shell.h \
 	show.h system.h trap.h var.h
 dash_LDADD = builtins.o init.o nodes.o signames.o syntax.o
 
diff --git a/src/dash.1 b/src/dash.1
index 94fc642..0f35748 100644
--- a/src/dash.1
+++ b/src/dash.1
@@ -257,7 +257,7 @@ if it has been set).
 .It Fl b Em notify
 Enable asynchronous notification of background job completion.
 (UNIMPLEMENTED for 4.4alpha)
-.It Fl p Em nopriv
+.It Fl p Em priv
 Do not attempt to reset effective uid if it does not match uid. This is not set
 by default to help avoid incorrect usage by setuid root programs via system(3) or
 popen(3).
diff --git a/src/main.c b/src/main.c
index e106848..69e5e07 100644
--- a/src/main.c
+++ b/src/main.c
@@ -50,6 +50,7 @@
 #include eval.h
 #include jobs.h
 #include input.h
+#include priv.h
 #include trap.h
 #include var.h
 #include show.h
@@ -97,8 +98,6 @@ main(int argc, char **argv)
 	struct jmploc jmploc;
 	struct stackmark smark;
 	int login;
-	uid_t uid;
-	gid_t gid;
 
 #ifdef __GLIBC__
 	dash_errno = __errno_location();
@@ -154,17 +153,6 @@ main(int argc, char **argv)
 	setstackmark(smark);
 	login = procargs(argc, argv);
 
-	/*
-	 * To limit bogus system(3) or popen(3) calls in setuid binaries, require
-	 * -p flag to work in this situation.
-	 */
-	if (!pflag  (uid != geteuid() || gid != getegid())) {
-		setuid(uid);
-		setgid(gid);
-		/* PS1 might need to be changed accordingly. */
-		choose_ps1();
-	}
-
 	if (login) {
 		state = 1;
 		read_profile(/etc/profile);
diff --git a/src/options.c b/src/options.c
index 99ac55b..c640a41 100644
--- a/src/options.c
+++ b/src/options.c
@@ -45,6 +45,7 @@
 #include jobs.h
 #include input.h
 #include output.h
+#include priv.h
 #include trap.h
 #include var.h
 #include memalloc.h
@@ -79,7 +80,7 @@ static const char *const optnames[NOPTS] = {
 	allexport,
 	notify,
 	nounset,
-	nopriv,
+	priv,
 	nolog,
 	debug,
 };
@@ -184,6 +185,7 @@ optschanged(void)
 #ifdef DEBUG
 	opentrace();
 #endif
+	setprivileged(pflag);
 	setinteractive(iflag);
 #ifndef SMALL
 	histedit();
diff --git a/src/priv.c b/src/priv.c
new file mode 100644
index 000..26346c0
--- /dev/null
+++ b/src/priv.c
@@ -0,0 +1,27 @@
+#include unistd.h
+
+#include priv.h
+#include var.h
+
+uid_t uid;
+gid_t gid;
+
+void setprivileged(int on)
+{
+	static int is_privileged = 1;
+	if (is_privileged == on)
+		return;
+
+	is_privileged = on;
+
+	/*
+	 * To limit bogus system(3) or popen(3) calls in setuid binaries, require
+	 * -p flag to work in this situation.
+	 */
+	if (!on  (uid != geteuid() || gid != getegid())) {
+		setuid(uid);
+		setgid(gid);
+		/* PS1 might need to be changed accordingly. */
+		choose_ps1();
+	}
+}
diff --git a/src/priv.h b/src/priv.h
new file mode 100644
index 000..31b380b
--- /dev/null
+++ b/src/priv.h
@@ -0,0 +1,6 @@
+#include unistd.h
+
+extern uid_t uid;
+extern gid_t gid;
+
+void setprivileged(int on);
diff --git a/src/var.c b/src/var.c
index cafe407..27f 100644
--- a/src/var.c
+++ b/src/var.c
@@ -192,13 +192,12 @@ initvar(void)
 void
 choose_ps1(void)
 {
-	if (vps1.flags  VEXPORT)
-		return;
-
 	if (!geteuid()) {
-		vps1.text = rootps1;
+		if (vps1.text == defps1)
+			vps1.text = rootps1;
 	} else {
-		vps1.text = defps1;
+		if (vps1.text 

test incorrectly rejecting valid expression with confusing ! placement

2013-08-24 Thread Harald van Dijk
Hi,

Now that Herbert fixed the reported crash in test (in a far simpler
manner than I had suggested, which I like), I did some more testing, and
came across one case that does not currently work, and did not work in
the past, but is perfectly valid:

$ src/dash -c 'test ! ! = !'
src/dash: 1: test: =: unexpected operator

POSIX requires special behaviour for four-argument tests:

4 arguments:
  If $1 is '!', negate the three-argument test of $2, $3, and $4.
  [...]

so this is supposed to evaluate as the negation of test ! = !. That test
does work properly in dash.

There are also some cases where test gives incorrect results when
combining ! with -o, but ( ), -a and -o have been obsoleted in favour of
the built-in shell operators ( ),  and ||, so I am not sure if that is
worth fixing. Details anyway:

$ src/dash -c 'test ! -o !'
src/dash: 1: test: -o: unexpected operator

This is covered by the special rule for three arguments, where the
second argument is a binary operator. -o is a boolean binary operator,
but a binary operator nonetheless, so this should test whether the two !
arguments are empty strings.

$ src/dash -c 'test !  -o !'; echo $?
0

This is covered by the special rule for four arguments, there ! as the
first argument evaluates the remaining three-argument test and negates
the result. In this special case, ! does not have higher precedence than
-o, so the correct exit status is 1, which bash's test gives.

Cheers,
Harald
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] \e in echo and printf builtins

2014-06-29 Thread Harald van Dijk
On 28/06/14 19:33, Paul Gilmartin wrote:
 On 2014-06-28, at 10:52, Harald van Dijk wrote:
 No comment on whether dash itself should accept \e, but you already
 found a compiler that doesn't support it at all, and many of the ones
 that do support it also (optionally) issue a warning for it. Should the
 C code perhaps be using \033 instead of \e?

 And here I put on my EBCDIC hat (crown of thorns) and say, Be careful
 using numeric code points; they're not portable.

The way GCC does this is to hard-code the values for ASCII, and to
hard-code the values for EBCDIC, both hidden behind preprocessor macros.
Taking that approach in dash would look something like

#if defined(ASCII)
#define ESCAPE '\033'
#elif defined(EBCDIC)
#define ESCAPE '\047'
#else
#error Unknown character set
#endif

and then use ESCAPE.

ASCII and EBCDIC would be detected (or specified) by the configure script.

But dash does have an assumption of ASCII already, which is why I didn't
think it would hurt to add one more. At the very least, it uses
hard-coded \033s in src/hetio.c, so that part of the code already cannot
be used on EBCDIC systems.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [BUILTIN] Require leading '0' on octal escapes in echo

2014-11-01 Thread Harald van Dijk

On 11/1/2014 6:14 PM, John Keeping wrote:

printf(1) supports octal escape sequences in its format argument which
are specified as (from POSIX):

\ddd, where ddd is a one, two, or three-digit octal number

But the argument to the %b format specifier allows:

\0ddd, where ddd is a zero, one, two, or three-digit octal
number

which is similar to the wording for echo(1) (for XSI-conformant
systems):

\0num   Write an 8-bit value that is the zero, one, two, or
three-digit octal number num.

Because conv_escape() handles the first case, applying the second
behaviour in conv_escape_str() must also catch the characters '1'-'7' so
that they are not converted as octal numbers.


Your patch seems to have addressed the clear bugs of the patch in that 
other thread. Let me attempt to summarise the status:


- POSIX does not specify the behaviour of \1 in echo and in printf %b.

POSIX does not define the behaviour of escape sequences other than the 
ones it explicitly specifies. It does not require \1 to be handled as 
\\1. It allows it, but it allows the current dash behaviour too.


To quote from the echo specification: if any of the operands contain a 
backslash ( '\' ) character, the results are implementation-defined, 
and the bit about XSI doesn't include an exception for \1.


To quote from the printf %b specification: The interpretation of a 
backslash followed by any other sequence of characters is unspecified.


- bash treats \1 as \\1 in echo, but as \01 in printf %b.

- dash treats \1 as \01 in both echo and in printf %b.

- Your patch makes dash treat \1 as \01 in both echo and printf %b.

- The aim of the patch in the other thread was to make dash be more like 
bash.


If that is your aim too, if you want dash to behave like bash, in order 
to achieve that the code must no longer be shared between echo and 
printf %b. Here is a simple test you can run, where dash is without your 
patch, and ./src/dash is with your patch:


$ bash -c 'printf %b \1' | cat -v
^A
$ dash -c 'printf %b \1' | cat -v
^A
$ ./src/dash -c 'printf %b \1' | cat -v
\1

If that isn't your aim, if your aim is only to make dash meet POSIX 
requirements, then don't worry, it already does so.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash.1 - Confusion between two pages c[h]sh

2014-11-10 Thread Harald van Dijk

On 11/10/2014 2:27 PM, Herbert Xu wrote:

Stéphane Aulery saul...@free.fr wrote:

[-- text/plain, encoding 8bit, charset: utf-8, 12 lines --]

Hello,

Here is a small patch reported by a user of Debian [1]. Could you please
integrate? Thank you for your help.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=646847


This patch also appears bogus as csh is an alternative to dash.


Even if csh is intentionally in the See also section despite not being 
referenced anywhere else, is it also intentional that chsh is not in 
that section despite the earlier reference? Would a patch that leaves 
csh but also adds chsh be more appropriate, or do you prefer to leave it 
as it is now?


Cheers,
Harald van Dijk


Cheers,


--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash-0.5.8 bug report

2014-11-21 Thread Harald van Dijk

On 21/11/2014 14:55, David Binderman wrote:

Hello there,

eval.c:273:19: warning: logical not is only applied to the left hand side of 
comparison [-Wlogical-not-parentheses]

 if (!exitstatus == isor)

Maybe

 if (exitstatus != isor)

was intended.


This looks correct to me. isor represents the type of command 
composition. It can be 0 (), 1 (||) or 2 (;). If isor == 0, the second 
command should not be executed if exitstatus != 0. If isor == 1, the 
second command should not be executed if exitstatus == 0. If isor == 2, 
the second command should be executed.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possibly wrong handling of $_?

2014-12-23 Thread Harald van Dijk

On 23/12/2014 23:34, Vadim Zeitlin wrote:

  Hello,

  I'm not exactly sure if this is a bug because I didn't find any
specification about how is this supposed to behave (to my surprise it
turned out that $_ was not in POSIX), but please consider this:

% zsh -c 'echo -n foo  echo $_'
foofoo
% bash -c 'echo -n foo  echo $_'
foofoo
% dash -c 'echo -n foo  echo $_'
foo/usr/bin/dash


This does come across as somewhat confusing, but $_ is really not a 
special variable at all in dash. The shell you're using to launch dash 
does make it a special variable. That shell puts _ in the environment, 
which dash then picks up, and ignores, other than making it available as $_.


You can see what your usual shell is doing by testing with other 
commands: just run


  env | grep '^_='

and you'll probably see

  _=/usr/bin/env

It works the same way when starting dash.


I've tested several different versions of zsh (4 and 5) and bash (3 and 4)
and dash 0.5.5 and 0.5.7 from Debian and they all produced the results as
above.

  Shouldn't dash follow the other shells here?
VZ


If dash did something special with $_, then I agree it would be nice if 
it would be somewhat compatible with other shells. If dash simply does 
not implement a feature, that feature is not required by any standard, 
and that feature is not widely used, then I suspect there won't be a lot 
of interest in implementing that feature.


Don't be put off by that, though. You are free, of course, if you feel 
so, to attempt to convince people $_ is an important feature that all 
shells should implement. If you have compelling use cases, if it solves 
real problems, and if many other shells already implement it, you might 
even get it standardised. I have never seen a need for it, but that's 
just me speaking from personal experience, others may feel differently.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possibly wrong handling of $_?

2014-12-25 Thread Harald van Dijk

On 25/12/2014 15:32, Jilles Tjoelker wrote:

On Wed, Dec 24, 2014 at 12:33:32AM +0100, Vadim Zeitlin wrote:

On Wed, 24 Dec 2014 00:21:09 +0100 Harald van Dijk har...@gigawatt.nl wrote:
HvD On 23/12/2014 23:34, Vadim Zeitlin wrote:
HvDHello,
HvD 
HvDI'm not exactly sure if this is a bug because I didn't find any
HvD  specification about how is this supposed to behave (to my surprise it
HvD  turned out that $_ was not in POSIX), but please consider this:
HvD 
HvD % zsh -c 'echo -n foo  echo $_'
HvD foofoo
HvD % bash -c 'echo -n foo  echo $_'
HvD foofoo
HvD % dash -c 'echo -n foo  echo $_'
HvD foo/usr/bin/dash
HvD
HvD This does come across as somewhat confusing, but $_ is really not a
HvD special variable at all in dash.



  Ah, this does explain it, thanks!


Dash does implement $_, but only in interactive mode.


Ah, apologies. In interactive mode, it indeed implements it, but 
differently from bash. In dash, only commands that specify a command 
name cause $_ to be set. For example,


  dash -ic 'if a=b; then c=d; else e=f; fi; echo $_'

still prints

  /usr/bin/dash

(or wherever dash is installed), because if statements, and variable 
assignments, have no effect on $_. In bash, this prints a blank line, 
because the prior value of $_ has not survived those variable assignments.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getopts doesn't properly update OPTIND when called from function

2015-05-28 Thread Harald van Dijk

On 28/05/2015 20:54, Martijn Dekker wrote:

I'm writing a shell function that extends the functionality of the
'getopts' builtin. For that to work, it is necessary to call the
'getopts' builtin from the shell function.

The POSIX standard specifies that OPTIND and OPTARG are global
variables, even though the positional parameters are local to the
function.[*] This makes it possible to call 'getopts' from a function by
simply passing the global positional parameters along by adding $@.

My problem is that dash does not properly update the global OPTIND
variable when getopts is called from a function, which defeats my
function on dash. It updates the global OPTIND for the first option but
not for subsequent options, so OPTIND gets stuck on 3. (It does
accurately update the global OPTARG variable, though.)


That isn't the problem, not exactly anyway. The problem is that getopts 
is required to keep internal state separately from the OPTIND variable 
(a single integer is insufficient to track the progress when multiple 
options are combined in a single word), and that internal state is 
stored along with the positional parameters. The positional parameters 
are saved just before a function call, and restored when the function 
returns. The internal state of getopts should not be saved the same way. 
It should probably just be global to dash.


A quick patch to make sure it is global, and isn't reset when it 
shouldn't or doesn't need to be, is attached. You can experiment with 
it, if you like. Your script runs as expected with this patch. However, 
I should warn that I have done far too little investigation to actually 
submit this patch for inclusion at this point. I'll do more extensive 
checking, and testing, later. If someone else can check whether the 
patch is okay, and if not, in what cases it fails, that would be very 
welcome too.


Note that any changes could break existing scripts, that (incorrectly) 
rely on dash's current behaviour of implicitly resetting the state, and 
don't bother setting OPTIND to explicitly reset it when they want to 
parse a new set of arguments.


Cheers,
Harald van Dijk
diff --git a/src/eval.c b/src/eval.c
index 071fb1b..59e7506 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -953,8 +953,6 @@ evalfun(struct funcnode *func, int argc, char **argv, int 
flags)
INTON;
shellparam.nparam = argc - 1;
shellparam.p = argv + 1;
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
pushlocalvars();
evaltree(func-n.ndefun.body, flags  EV_TESTED);
poplocalvars(0);
diff --git a/src/options.c b/src/options.c
index 6f381e6..5b24eeb 100644
--- a/src/options.c
+++ b/src/options.c
@@ -163,8 +163,8 @@ setarg0:
}
 
shellparam.p = xargv;
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
+   shoptind = 1;
+   shoptoff = -1;
/* assert(shellparam.malloc == 0  shellparam.nparam == 0); */
while (*xargv) {
shellparam.nparam++;
@@ -316,8 +316,6 @@ setparam(char **argv)
shellparam.malloc = 1;
shellparam.nparam = nparam;
shellparam.p = newparam;
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
 }
 
 
@@ -362,8 +360,6 @@ shiftcmd(int argc, char **argv)
}
ap2 = shellparam.p;
while ((*ap2++ = *ap1++) != NULL);
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
INTON;
return 0;
 }
@@ -394,8 +390,8 @@ void
 getoptsreset(value)
const char *value;
 {
-   shellparam.optind = number(value) ?: 1;
-   shellparam.optoff = -1;
+   shoptind = number(value) ?: 1;
+   shoptoff = -1;
 }
 
 /*
@@ -412,20 +408,10 @@ getoptscmd(int argc, char **argv)
 
if (argc  3)
sh_error(Usage: getopts optstring var [arg]);
-   else if (argc == 3) {
+   else if (argc == 3)
optbase = shellparam.p;
-   if ((unsigned)shellparam.optind  shellparam.nparam + 1) {
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
-   }
-   }
-   else {
+   else
optbase = argv[3];
-   if ((unsigned)shellparam.optind  argc - 2) {
-   shellparam.optind = 1;
-   shellparam.optoff = -1;
-   }
-   }
 
return getopts(argv[1], argv[2], optbase);
 }
@@ -438,10 +424,10 @@ getopts(char *optstr, char *optvar, char **optfirst)
int done = 0;
char s[2];
char **optnext;
-   int ind = shellparam.optind;
-   int off = shellparam.optoff;
+   int ind = shoptind;
+   int off = shoptoff;
 
-   shellparam.optind = -1;
+   shoptind = -1;
optnext = optfirst + ind - 1;
 
if (ind = 1 || off  0 || strlen(optnext[-1])  off)
@@ -509,8 +495,8 @@ out:
s[1] = '\0';
setvar(optvar, s, 0);
 
-   shellparam.optoff = p ? p - *(optnext - 1) : -1

Re: getopts doesn't properly update OPTIND when called from function

2015-05-28 Thread Harald van Dijk

On 29/05/2015 04:58, Herbert Xu wrote:

Harald van Dijk har...@gigawatt.nl wrote:

That isn't the problem, not exactly anyway. The problem is that getopts
is required to keep internal state separately from the OPTIND variable
(a single integer is insufficient to track the progress when multiple
options are combined in a single word), and that internal state is
stored along with the positional parameters. The positional parameters
are saved just before a function call, and restored when the function
returns. The internal state of getopts should not be saved the same way.
It should probably just be global to dash.


I think the current behaviour is fine as far as POSIX is concerned.
It says:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html

: APPLICATION USAGE

...

: Note that shell functions share OPTIND with the calling shell
: even though the positional parameters are changed. If the calling
: shell and any of its functions uses getopts to parse arguments,
: the results are unspecified.


The Application usage sections are informative and aren't worded as 
precisely as the other sections. If a script uses getopts at the global 
level, and it calls a shell function that too uses getopts, then it is 
very easy to be covered by


 Any other attempt to invoke getopts multiple times in a single shell 
execution environment with parameters (positional parameters or arg 
operands) that are not the same in all invocations, or with an OPTIND 
value modified to be a value other than 1, produces unspecified results.


But the test script in this thread does invoke getopts with parameters 
that are the same in all invocations, and without modifying OPTIND. I 
don't see anything else in the normative sections that would make the 
result undefined or unspecified either. I do think the script is valid, 
and the results in dash should match those of other shells.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getopts doesn't properly update OPTIND when called from function

2015-06-01 Thread Harald van Dijk

On 01/06/2015 08:29, Herbert Xu wrote:

On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote:


But the test script in this thread does invoke getopts with
parameters that are the same in all invocations, and without
modifying OPTIND. I don't see anything else in the normative
sections that would make the result undefined or unspecified either.
I do think the script is valid, and the results in dash should match
those of other shells.


The bash behaviour makes it impossible to call shell functions
that invoke getopts while in the middle of an getopts loop.



IMHO the dash behaviour makes a lot more sense since a function
always brings with it a new set of parameters.  That plus the
fact that this behaviour has been there since day one makes me
reluctant to change it since the POSIX wording is not all that
clear.


True. Given that almost no shell supports that anyway, there can't be 
too many scripts that rely on it, but I did warn about the risk of 
breaking another type of existing scripts as well, I agree that's a real 
concern.


One thing that doesn't really make sense, though: if the getopts 
internal state is local to a function, then OPTIND and OPTARG really 
should be too. Because they aren't, nested getopts loops already don't 
really work in a useful way in dash, because the inner loop overwrites 
the OPTIND and OPTARG variables. While OPTARG will typically be checked 
right at the start of the loop, before any inner loop executes, OPTIND 
is typically used at the end of the loop, in combination with the shift 
command. The current behaviour makes the OPTIND value in that case 
unreliable.


So either way, I think something should change. But if you prefer to get 
clarification first about the intended meaning of the POSIX wording, 
that certainly seems reasonable to me.



Cheers,

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash: read does not ignore trailing spaces

2015-12-04 Thread Harald van Dijk

On 03/12/2015 23:26, Harald van Dijk wrote:

On 03/12/2015 22:17, Stephane Chazelas wrote:

2015-12-03 22:02:14 +0100, Harald van Dijk:
[]

   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
-c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
   bash:
   mksh:
   posh:<a,>
   zsh:<a,>

As far as I can tell, the posh/zsh behaviour is the correct
behaviour, but I'm not convinced yet my interpretation is correct.

[...]

zsh and pdksh (and other descendants of the Forsyth shell) treat it as
separator (and are not compliant), mksh (derived from pdksh)
changed it recently. posh (also based on pdksh) still hasn't changed it.


[...]
I do see your point. Thanks for the clear example, I think I agree with
you, the description of field splitting mentions that delimiters are
used as terminators:

   "The shell shall treat each character of the IFS as a delimiter and
use the delimiters as field terminators to [...]"

It should not be much of a problem to extend the patch I posted to cover
the rules as you describe them, I will make an attempt at this later.


Here it is. Attached is an updated patch that ignores the complete 
terminator if only a single field remains, otherwise ignores only 
trailing IFS whitespace.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index b2d710d..c6e87d5 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -203,7 +203,7 @@ expandarg(union node *arg, struct arglist *arglist, int 
flag)
 * TODO - EXP_REDIR
 */
if (flag & EXP_FULL) {
-   ifsbreakup(p, );
+   ifsbreakup(p, 0, );
*exparg.lastp = NULL;
exparg.lastp = 
expandmeta(exparg.list, flag);
@@ -1016,9 +1016,11 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-zero, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
struct ifsregion *ifsp;
struct strlist *sp;
@@ -1054,12 +1056,58 @@ ifsbreakup(char *string, struct arglist *arglist)
start = p;
continue;
}
+   /* If only reading one more argument:
+* If we have exactly one field, read 
that field without its terminator.
+* If we have more than one field, read 
all fields including their terminators,
+* except for trailing IFS whitespace.
+*
+* This means that if we have only IFS 
characters left, and at most one
+* of them is non-whitespace, we stop 
reading here.
+* Otherwise, we read all the remaining 
characters except for trailing
+* IFS whitespace.
+*
+* To keep track of this, the variable 
s indicates what we've seen so far.
+* 0: only IFS whitespace.
+* 1: exactly one IFS non-whitespace 
character.
+* 2: non-IFS characters, or multiple 
IFS non-whitespace characters.
+*
+* In any case, q indicates the start 
of the characters to remove, or NULL
+* if no characters should be removed.
+*/
+   if (maxargs == 1) {
+   int s = !ifsspc;
+   q = p;
+   for (;;) {
+   p++;
+   if (p >= string + 
ifsp->endoff) {
+   if (q)
+   *q = 
'\0';
+   goto add;
+   }
+   if (*p == (char)CTLESC)
+   p++;
+   

Re: dash: read does not ignore trailing spaces

2015-12-03 Thread Harald van Dijk

On 03/12/2015 22:17, Stephane Chazelas wrote:

2015-12-03 22:02:14 +0100, Harald van Dijk:
[]

   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
-c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
   bash:
   mksh:
   posh:<a,>
   zsh:<a,>

As far as I can tell, the posh/zsh behaviour is the correct
behaviour, but I'm not convinced yet my interpretation is correct.

[...]

No, that would be the same as for:

v=a:b:
IFS=:
set -f

set -- $v

It's meant to split into "a" and "b", not "a", "b" and "". As
":" is meant to be treated as a *delimiter* or *terminator*.

That has been discussed a few times on the austin group mailing
list.

zsh and pdksh (and other descendants of the Forsyth shell) treat it as
separator (and are not compliant), mksh (derived from pdksh)
changed it recently. posh (also based on pdksh) still hasn't changed it.


zsh indeed expands this into "a", "b" and "". The same version of posh 
that gives <a,> for my test gives just "a" and "b" for yours though.


I do see your point. Thanks for the clear example, I think I agree with 
you, the description of field splitting mentions that delimiters are 
used as terminators:


  "The shell shall treat each character of the IFS as a delimiter and 
use the delimiters as field terminators to [...]"


It should not be much of a problem to extend the patch I posted to cover 
the rules as you describe them, I will make an attempt at this later.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash: read does not ignore trailing spaces

2015-12-03 Thread Harald van Dijk

On 02/12/2015 23:37, Gioele Barabucci wrote:

Hello,

I am forwarding a bug [1] reported by a Debian user: `read` does not
ignore trailing spaces. The current version of dash is affected by
this bug.

A simple test from the original reporter:

 $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
 

 $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
 

Other shells like posh and mksh behave like bash.


This is indeed a bug based on the current specification. In the past, 
the specification was unclear and the dash interpretation was a 
legitimate one, but currently it explicitly spells out that trailing IFS 
whitespace gets ignored.


However, unless I'm misreading the spec, mksh and bash don't seem to 
implement it properly either: only trailing IFS whitespace is supposed 
to be dropped. IFS non-whitespace is supposed to remain, even at the end 
of the input. mksh and bash remove it, posh and zsh leave it in:


  $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell -c 
'IFS=,; echo a, | { read v; echo "<$v>"; }'; done

  bash:
  mksh:
  posh:<a,>
  zsh:<a,>

As far as I can tell, the posh/zsh behaviour is the correct behaviour, 
but I'm not convinced yet my interpretation is correct.


Attached is a not fully tested proof of concept to implement the 
posh/zsh behaviour in dash by extending ifsbreakup() to allow specifying 
a maximum number of arguments instead of fixing it up in 
readcmd_handle_line(). It returns  in your test, and <a,> in mine. 
Feedback welcome.


Cheers,
Harald van Dijk


This error is reproducible with dash 0.5.7 and with the current master
git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.

[1] https://bugs.debian.org/794965

--
Gioele Barabucci <gio...@svario.it>
diff --git a/src/expand.c b/src/expand.c
index b2d710d..6afd562 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -203,7 +203,7 @@ expandarg(union node *arg, struct arglist *arglist, int 
flag)
 * TODO - EXP_REDIR
 */
if (flag & EXP_FULL) {
-   ifsbreakup(p, );
+   ifsbreakup(p, 0, );
*exparg.lastp = NULL;
exparg.lastp = 
expandmeta(exparg.list, flag);
@@ -1016,9 +1016,11 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-zero, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
struct ifsregion *ifsp;
struct strlist *sp;
@@ -1054,12 +1056,36 @@ ifsbreakup(char *string, struct arglist *arglist)
start = p;
continue;
}
+   /* If only reading one more argument, 
combine any field terminators,
+* except for trailing IFS whitespace. 
*/
+   if (maxargs == 1) {
+   q = p;
+   p++;
+   if (ifsspc) {
+   /* Ignore IFS 
whitespace at end */
+   for (;;) {
+   if (p >= string 
+ ifsp->endoff) {
+   *q = 
'\0';
+   goto 
add;
+   }
+   if (*p == 
(char)CTLESC)
+   p++;
+   ifsspc = 
strchr(ifs, *p) && strchr(defifs, *p);
+   p++;
+   if (!ifsspc) {
+   break;
+   }
+   }
+   }
+   continue;
+   }
*q = '\0';
sp = (struct strlist *)stalloc(sizeof 
*sp)

Re: [PATCH v3] builtin: Fix handling of trailing IFS white spaces

2016-06-12 Thread Harald van Dijk

On 07/06/16 11:25, Herbert Xu wrote:

If the very first character later gets zeroed, you'd be zeroing
the character after the CTLESC, leaving the CTLESC in the value
to be assigned to the last variable of read.


Ah, I see. Thanks for the explanation. While trying to make it misbehave 
I found that rmescapes removes trailing CTLESC, and readcmd_handle_line 
calls rmescapes before the trailing CTLESC gets a chance to cause 
problems. I was now still only able to see the problem when adding some 
extra debugging statements.



If you got rid of the very first q=p assignment it should just work.


There is a later q=p assignment too that gets performed after CTLESC has 
been skipped. That one also isn't going to cause problems in practice, 
since it only gets executed when a character is both IFS and IFS 
whitespace, but when called from readcmd_handle_line, there's never 
going to be escaped IFS whitespace.



Thanks.  I've fixed up the buglet causing the failures:


The results are a lot better now, but I did spot a problem:

src/dash -c 'X="x  "; echo $X'

This is supposed to print "x\n", but the IFS breakup of $X generates two 
words, one "x", one " ", meaning "x  \n" gets printed instead. I think 
this is intended to get fixed up in the if (ifsspc) block, but that 
block doesn't get executed when there are no more characters after the 
spaces.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] builtin: Fix handling of trailing IFS white spaces

2016-06-12 Thread Harald van Dijk

On 12/06/16 13:06, Herbert Xu wrote:

On Sun, Jun 12, 2016 at 12:35:15PM +0200, Harald van Dijk wrote:


The results are a lot better now, but I did spot a problem:

src/dash -c 'X="x  "; echo $X'

This is supposed to print "x\n", but the IFS breakup of $X generates
two words, one "x", one " ", meaning "x  \n" gets printed instead. I
think this is intended to get fixed up in the if (ifsspc) block, but
that block doesn't get executed when there are no more characters
after the spaces.


Weird, I can't reproduce this at all:

$ src/dash -c 'X="x "; echo $X' | cat -A
x$
$

What am I doing wrong?


It needs two spaces after the "x", not just one. With a single space, it 
does work right.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] builtin: Fix handling of trailing IFS white spaces

2016-06-19 Thread Harald van Dijk

On 12/06/16 14:17, Herbert Xu wrote:

Tricky :) OK here is a new version that should fix this bug.


FWIW, I've been testing this along with your eval patch, haven't spotted 
any more issues, either from code inspection or from actual use, and 
don't expect to spot any more issues. Very nice.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Illegal function names are accepted after being used as aliases

2016-02-23 Thread Harald van Dijk

On 23/02/2016 20:33, Eric Blake wrote:

exit - fuzzy.  exit is a special built-in (unlike getopts); and XCU 2.14
states:

  "Some of the special built-ins are described as conforming to XBD
Utility Syntax Guidelines. For those that are not, the requirement in
Utility Description Defaults that "--" be recognized as a first argument
to be discarded does not apply and a conforming application shall not
use that argument. "

Conforming apps cannot expect 'exit -1' to work, and therefore, cannot
also expect 'exit -- -1' to work, since the only standards-defined
values for an argument to exit is a non-negative decimal integer less
than 256.  Of course, if you want to fix it along with all the others,
that's fine; I'm just pointing out that 'exit' isn't broken as-is.


I was under the impression that the intent from the dash side was to 
handle all commands the same, and that impression was based on the fact 
that the . command has received additional code to handle -- even though 
there's no requirement for that. However, looking into the original bug 
report that prompted that change in more detail I see that the standard 
will very likely require support for -- in the . command in the future, 
so that doesn't hold up.


If that intent isn't there (I'm not saying it's not; I'm unsure now), 
the list of utilities that should be extended is far smaller, if I'm not 
overlooking anything:

- alias
- getopts
- type
- exec?
- local?

exec is like .: there's currently no requirement to support --, but that 
requirement is likely to come in the future.


local is currently non-standard and it's hard to guess whether it will 
require support for -- if standardised.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: echo do not print NUL byte

2016-01-20 Thread Harald van Dijk

Hi,

On 20/01/2016 22:38, Trek wrote:

there is no way to print the NUL byte with the echo utility

with dash (XSI, no -e argument):

   $ dash -c 'echo \\0 | od -An -c'
 \n


This is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=379227, which 
did get fixed already, but it was after 0.5.8. That should mean it will 
be in the next release of dash, but you can check out the sources from 
Git if you want to see it now.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: man fixes

2016-05-18 Thread Harald van Dijk

Hi,

On 18/05/16 11:07, Svyatoslav Mishyn wrote:

just found some typos and mdocml warnings.


iff is not a typo. It's a fairly commonly used short form for "if and 
only if". Perhaps that could be made clearer somehow.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug-diffutils] bug#24116: [platform-testers] new snapshot available: diffutils-3.3.50-0353

2016-08-05 Thread Harald van Dijk

On 05/08/2016 18:21, Eric Blake wrote:

On 08/05/2016 07:13 AM, Harald van Dijk wrote:

Unfortunately, POSIX currently requires the export command to not have
any magic quoting, and any POSIX-conforming shell will make

a="b c=d"
export a=$a

set a to b, and c to d. Not so with bash, but that's because bash simply
isn't POSIX-conforming, even if invoked as sh.


Not quite so fast.



POSIX will require special quoting rules for the export command in the
future, similar to what bash does today. When it does, dash is likely to
change to match that, and the local command will likely be changed to
work the same way.


POSIX has already issued an interpretation request, and therefore, any
CURRENT implementations (including bash) that already behave as
permitted by the interpretation ARE conforming. [...]



http://austingroupbugs.net/view.php?id=351


An interpretation request doesn't automatically mean that all current 
implementations are conforming, does it? It only means that when the 
interpretation response says so. In this case, the response says that 
the standard does not allow bash/ksh's behaviour.


http://austingroupbugs.net/view.php?id=351#c865:


Interpretation response

The standard states that the current behavior of ksh93 does not conform, and 
conforming implementations must conform to this. However, concerns have been 
raised about this which are being referred to the sponsor.


The fact that this is seen as a defect in the standard which will be 
fixed, rather than a defect in bash/ksh, doesn't make bash/ksh conform, 
it merely means that bash/ksh shouldn't be changed to conform.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: why does dash save, dup, and restore redirected descriptor in the parent, rather than redirect in the child?

2017-01-31 Thread Harald van Dijk

On 31/01/2017 21:00, Mark Galeck wrote:

Why save, dup and dup again to restore, descriptors in the parent, when it 
would be much simpler to just dup in the child, and not have to save and 
restore.


This assumes there's a child in the first place. Depending on the 
command, that might not be the case, think of built-in commands such as 
echo for instance. Code is needed to handle that case, and I suspect 
that if that code is there anyway, then having one place to set up 
redirections is simpler than having two places to set up redirections.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Parameter expansion, patterns and fnmatch

2016-09-03 Thread Harald van Dijk

On 02/09/16 16:51, Herbert Xu wrote:

On Fri, Sep 02, 2016 at 09:49:53AM -0500, Eric Blake wrote:

On 09/02/2016 09:29 AM, Herbert Xu wrote:

On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote:



This also affects

case [a in [?) echo ok ;; *) echo bad ;; esac

which should print ok.


Even ksh prints bad here.


So ksh is also buggy.


Good luck writing a script with an unquoted [ expecting it to be
portable :)


[ '' ] || echo empty

There, I just wrote a portable script with unquoted [ portably
interpreted as itself and not as a bracket filename expansion pattern.


dash handles that just fine.  We're not talking about a lone [
in file expansion context here.


We were originally talking about the [ in ${foo#[}. That is a lone [ in 
pattern matching context, which is what file expansion context is.


We're talking about something that dash has had code to handle for >10 
years, that's been documented by dash as supported for >10 years, and 
now that it turns out there's a flaw in the code where dash does not 
behave as documented and as clearly intended by the code, it's POSIX's 
fault?

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Parameter expansion, patterns and fnmatch

2016-09-03 Thread Harald van Dijk

On 03/09/16 15:58, Herbert Xu wrote:

On Sat, Sep 03, 2016 at 03:19:57PM +0200, Harald van Dijk wrote:


But yeah, sure, if the bug has been there for over 10 years, and I'm
unable to find older versions of dash to check, I would have guessed
that dash indeed has never worked this way.


OK it looks like this actually wasn't the original behaviour.
It was introduced along with the character class support.  So
with that in mind, I feel a lot happier in changing the behaviour
of the case statement.

I've changed your patch slightly and will commit it if there are
no other issues.


None that I can see. Agreed that avoiding increment-decrement-increment 
to just do a single increment instead is an improvement, and I don't 
spot any issues in it.



---8<---
Subject: expand - Fix dangling left square brackets in patterns

When there is an unmatched left square bracket in patterns, pmatch
will behave strangely and exhibit undefined behaviour.  This patch
(based on Harld van Dijk's original) fixes this by treating it as
a literal left square bracket.

Reported-by: Olof Johansson <o...@ethup.se>
Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index 36bea76..2a50830 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1584,14 +1584,14 @@ pmatch(const char *pattern, const char *string)
p++;
}
found = 0;
-   chr = *q++;
+   chr = *q;
if (chr == '\0')
return 0;
c = *p++;
do {
if (!c) {
p = startp;
-   c = *p;
+   c = '[';
goto dft;
}
if (c == '[') {
@@ -1618,6 +1618,7 @@ pmatch(const char *pattern, const char *string)
} while ((c = *p++) != ']');
if (found == invert)
return 0;
+   q++;
break;
}
 dft:   default:


--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash 0.5.9: break and continue bug

2016-08-23 Thread Harald van Dijk

On 23/08/16 22:23, Zdenek Kaspar wrote:

Hi, I've noticed 0.5.9 does ignore break and continue statements, here
is simple reproducer:


one() {
echo "  one"
break
}

two() {
echo "  two"
}

for i in 1 2
do
echo "loop $i:"
one
two
done


dash-0.5.9:
$ dash dash-break-test
loop 1:
  one
  two
loop 2:
  one
  two

dash-0.5.8-4.fc24.x86_64:
$ dash dash-break-test
loop 1:
  one


break and continue now need to appear in the loop directly, they cannot 
be wrapped in a function any more. Although this is a visible change in 
behaviour, it is intentional:

  

The standard is a bit unclear on how break and continue are meant to 
behave in these situations, as it refers to the "enclosing loop" without 
defining what counts and doesn't count as the enclosing loop, but the 
next version of the standard will make it clear that your script is not 
required to behave the way you expect:

  

That said, shells are still allowed to treat your script the way you expect.


HTH, Z.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2016-10-10 Thread Harald van Dijk

On 08/10/16 21:42, Martijn Dekker wrote:

Op 01-10-16 om 19:17 schreef Denys Vlasenko:

ash-vars/var_unbackslash.tests


ITYM ash-vars/var_unbackslash1.tests


echo Forty two:$\
(\
(\
42\
)\
)
dash says: Syntax error: Missing '))'


Yes, but it's not clear to me that it shouldn't.

Hmm... maybe this is indeed a bug:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
"A  that is not quoted shall preserve the literal value of
the following character, with the exception of a . If a
 follows the , the shell shall interpret this as
line continuation. The  and  shall be removed before
splitting the input into tokens. Since the escaped  is removed
entirely from the input and is not replaced by any white space, it
cannot serve as a token separator."

So, unless I'm misreading this, it looks like backslashes need to be
parsed before *any* other kind of lexical analysis.


There does appear to be one exception: a comment may end with a 
backslash. This does not cause the next line to be treated as a comment: 
once a # is seen, the remaining characters on the line are not subjected 
to the regular lexical analysis, so the above does not apply.


I would have expected another exception to be in alias expansions that 
end in a backslash. Shells are not entirely in agreement there, but most 
appear to treat this the regular way, meaning


  dash -c 'alias bs=\\
  bs
  '

prints nothing.

dash has a pgetc_eatbnl function already in parser.c which skips any 
backslash-newline combinations. It's not used everywhere it could be. 
There is also some duplicated backslash-newline handling elsewhere in 
parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl() 
instead, with the exception of the one that handles comments, and 
removing the duplicated backslash-newline handling, lets this test case 
work, as well as several other similar ones, such as:


  : &\
  & :

  : \
  <\
  <\
  EO\
  F
  123
  E\
  OF

A nice benefit is that the removal of the duplicated BSNL handling 
causes a reduction in code size.


There are probably a few corner cases I'm not handling correctly in this 
patch, though. Feedback welcome.


Cheers,
Harald van Dijk
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,7 +657,7 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
+		readtoken1(pgetc_eatbnl(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
 here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
@@ -782,7 +783,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +792,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +816,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -903,7 +895,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c = pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +908,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	goto endword;	/* exit outer loop */
 USTPUTC(c, out);
 nlprompt();
-c = pgetc();
+c = pgetc_eatbnl();
 goto loop;		/* continue outer loop */
 			case CWORD:
 USTPUTC(c, out);
@@ -997,7 +989,7 @@ quotemark:
 	USTPUTC(c, out);
 	--parenlevel;
 } else {
-	if (pgetc() == ')') {
+	if (pgetc_eatbnl() == ')') {
 		USTPUTC(CTLENDARI, out);
 		if (!--arinest)
 			syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 	USTPUTC(c, out);
 }
 			}
-			c = pgetc

Re: ':' noop results in ':: not found'

2016-10-28 Thread Harald van Dijk

On 28/10/16 12:54, Tim Ruehsen wrote:

Hi,

maybe you can enlighten me :-)

I try to use dash (0.5.8-2.3) on Debian unstable for executing ./configure
scripts. Since Debian builds with --disable-lineno, all ./configure scripts
silently fall back to bash (I created bug report #842242 to get this fixed).


As you found there, this is intentional, specifically to force configure 
scripts to continue to be run with bash, which is what happened with 
older dash versions. I asked at the time why they don't just call bash 
explicitly (bash configure instead of ./configure) if that's the goal, 
but didn't get a response.



Next I rebuilt the package from the (Debian) sources without --disable-lineno
and installed it. Now I see error messages from dash when it comes to the do
nothing operator :.

$ dash -c 'test -n "a" && :'
dash: 1: :: not found
$ echo $?
127

$ bash -c 'test -n "a" && :'
$ echo $?
0

Since this : construct is wildly used and I know that dash on other systems
work with it, I wonder what is wrong here.

Is there a known bug, maybe fixed on recent dash versions ?
Or has this some simple reason my stupidity doesn't see ?


This is likely to happen if you have the LC_ALL environment variable set 
when building dash, and I can reproduce your results with 0.5.8 by 
setting LC_ALL to en_GB.UTF-8. It was fixed in 0.5.9. (Specifically, 
where a fixed sort order was needed, 0.5.8 only forced LC_COLLATE=C, but 
that still allowed LC_ALL to override it.)


dash won't be the only program that has problems with this; if you're 
building software yourself, setting LANG should generally be okay, but 
LC_ALL is best avoided. Unless you're doing it specifically to find and 
report/fix bugs, anyway.


Cheers,
Harald van Dijk


Regards, Tim

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2016-10-10 Thread Harald van Dijk

On 10-10-16 23:51, Jilles Tjoelker wrote:

On Sat, Oct 08, 2016 at 09:42:12PM +0200, Martijn Dekker wrote:

Op 01-10-16 om 19:17 schreef Denys Vlasenko:

ash-vars/var_unbackslash.tests



ITYM ash-vars/var_unbackslash1.tests



echo Forty two:$\
(\
(\
42\
)\
)
dash says: Syntax error: Missing '))'



Yes, but it's not clear to me that it shouldn't.



Hmm... maybe this is indeed a bug:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
"A  that is not quoted shall preserve the literal value of
the following character, with the exception of a . If a
 follows the , the shell shall interpret this as
line continuation. The  and  shall be removed before
splitting the input into tokens. Since the escaped  is removed
entirely from the input and is not replaced by any white space, it
cannot serve as a token separator."



So, unless I'm misreading this, it looks like backslashes need to be
parsed before *any* other kind of lexical analysis.


Yes, for  sequences that are not quoted.

For example,  contains
two characters between the quotes, not zero.


Ah, right, I missed that exception. Note that it only applies to 
single-quoted strings. In double-quoted strings, backslash-newline 
should be removed just as when unquoted.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2016-10-12 Thread Harald van Dijk

On 10/10/16 22:20, Harald van Dijk wrote:

On 08/10/16 21:42, Martijn Dekker wrote:

Op 01-10-16 om 19:17 schreef Denys Vlasenko:

ash-vars/var_unbackslash.tests


ITYM ash-vars/var_unbackslash1.tests


echo Forty two:$\
(\
(\
42\
)\
)
dash says: Syntax error: Missing '))'


Yes, but it's not clear to me that it shouldn't.

Hmm... maybe this is indeed a bug:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01

"A  that is not quoted shall preserve the literal value of
the following character, with the exception of a . If a
 follows the , the shell shall interpret this as
line continuation. The  and  shall be removed before
splitting the input into tokens. Since the escaped  is removed
entirely from the input and is not replaced by any white space, it
cannot serve as a token separator."

So, unless I'm misreading this, it looks like backslashes need to be
parsed before *any* other kind of lexical analysis.


There does appear to be one exception: a comment may end with a
backslash. This does not cause the next line to be treated as a comment:
once a # is seen, the remaining characters on the line are not subjected
to the regular lexical analysis, so the above does not apply.

I would have expected another exception to be in alias expansions that
end in a backslash. Shells are not entirely in agreement there, but most
appear to treat this the regular way, meaning

  dash -c 'alias bs=\\
  bs
  '

prints nothing.

dash has a pgetc_eatbnl function already in parser.c which skips any
backslash-newline combinations. It's not used everywhere it could be.
There is also some duplicated backslash-newline handling elsewhere in
parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl()
instead, with the exception of the one that handles comments, and
removing the duplicated backslash-newline handling, lets this test case
work, as well as several other similar ones, such as:

  : &\
  & :

  : \
  <\
  <\
  EO\
  F
  123
  E\
  OF

A nice benefit is that the removal of the duplicated BSNL handling
causes a reduction in code size.

There are probably a few corner cases I'm not handling correctly in this
patch, though. Feedback welcome.


With more extensive testing, the only issue I've seen is what Jilles 
Tjoelker had already mentioned, namely that backslash-newline should be 
preserved inside single-quoted strings, and also that it should be 
preserved inside heredocs where any part of the delimiter is quoted:


  cat <<\EOF
  \
  EOF

dash's parsing treats this mostly the same as a single-quoted string, 
and the same extra check handles both cases.


Here's an updated patch. Hoping this looks okay and can be applied.


Cheers,
Harald van Dijk
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c 

Re: echo "\\1"?

2017-07-27 Thread Harald van Dijk

On 27/07/2017 16:24, Eric Blake wrote:

On 07/27/2017 08:13 AM, Bosco wrote:

On 27 July 2017 at 12:54, Eric Blake <ebl...@redhat.com> wrote:

Which man pages?  Echo is one of those programs that varies widely, and
you are MUCH better off using printf(1) instead of echo(1) if you are
trying to get newline suppression, trying to print something that might
begin with -, or trying to print something that might contain \.


Sorry, maybe I did't explain it correctly, I mean the man pages of the
dash source:
https://git.kernel.org/pub/scm/utils/dash/dash.git/tree/src/dash.1#n1202

And because of this, I got an error compiling zziplib, you may see
https://github.com/gdraheim/zziplib/blob/v0.13.67/configure#L17542


Eww - storing generated files in git - that forces everyone that checks
out your project to use the EXACT same version of autotools to avoid
changing the generated files unintentionally.

Looking at those lines:


   if test -f $ac_prefix_conf_INP ; then
 echo "s/^#undef  *\\([ABCDEFGHIJKLMNOPQRSTUVWXYZ_]\\)/#undef 
$ac_prefix_conf_UPP""_\\1/" > conftest.prefix


ac_prefix_conf_INP is not defined anywhere in autoconf 2.69 sources (and
you really shouldn't use the ac_ prefix if you are writing code that is
not part of autoconf proper).  I couldn't find mention of it at
https://github.com/gdraheim/zziplib/blob/v0.13.67/configure.ac, but it
may be in one of your other included files.  Can you pinpoint which part
of your configure.ac results in that part of the generated configure
file?  In all likelihood, you are using a buggy macro that is using
autoconf primitives incorrectly, and thus resulting in non-portable
code.  But without seeing the true source, I can't help you debug your
problem.


This is coming from zziplib's outdated copy of ax_prefix_config_h.m4, 
for which I submitted a patch to make it work in dash, and which was 
accepted, back in 2010 (!): <https://savannah.gnu.org/patch/?7317>


zziplib should update to a current version of this file.


Arguably, since it is not required by POSIX, we don't have to do it. But
I also can't argue that POSIX forbids us to support \1 as an extension
(it says nothing about whether implementations can have additional
escape sequences).  So I'll argue that it is intentional as a dash
extension.  But if you can make dash smaller by getting rid of the
extension, that might be an acceptable patch.


In that case, I think, the man page of dash should be modified with
that extension.


Indeed, or the fact that it is NOT documented means that it is an
unintentional bug for providing the extension.


The man page of dash states:

  All other backslash sequences elicit undefined behaviour.

Implementation-wise, POSIX gives the same requirements for escape 
sequences for echo as for printf %b, therefore it makes sense to share 
the code between the two. Neither echo nor printf %b requires \1 to be 
treated as an escape sequence, but both allow it.


bash treats \1 as an escape sequence with printf %b. For compatibility 
with bash, it sort of makes sense to implement this in dash as well. Yet 
for some reason, bash *doesn't* treat \1 as an escape sequence with 
echo, not even with echo -e.


Keeping dash small, continuing to share the code between the two cases, 
can only mean behaving differently from bash in at least one of them.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Here-document redirection with vi/emacs on

2017-06-29 Thread Harald van Dijk

On 27/06/17 16:29, Zando Fardones wrote:

Hello,

I think I've found a bug when using the here-document redirection in
an interactive shell. What basically happens is that you can't see the
command output if you set the "vi" or "emacs" options.


That's not quite what happens: the here-document contents got lost, so 
there is no command output to see. Nice find.


The problem is that getprompt() is implicitly called by el_gets(). This 
messes with the memory used by the parser to store the here-document's 
contents. In the non-emacs/vi case, the prompt is explicitly written by 
setprompt(), which wraps the getprompt() call in a 
pushstackmark()/popstackmark() pair to restore the state so that parsing 
can continue. But when getprompt() is called by el_gets(), it knows 
nothing about this.


The whole call to el_gets() can be surrounded by another 
pushstackmark()/popstackmark() pair to solve the problem, as attached.


Cheers,
Harald van Dijk
--- a/src/input.c
+++ b/src/input.c
@@ -147,8 +147,12 @@ retry:
 		static const char *rl_cp;
 		static int el_len;
 
-		if (rl_cp == NULL)
+		if (rl_cp == NULL) {
+			struct stackmark smark;
+			pushstackmark(, stackblocksize());
 			rl_cp = el_gets(el, _len);
+			popstackmark();
+		}
 		if (rl_cp == NULL)
 			nr = 0;
 		else {


Re: [PATCH] [BUILTIN] describe_command: fix incorrect path

2017-05-26 Thread Harald van Dijk

Hi,

On 26/05/17 09:04, Youfu Zhang wrote:

$ PATH=/extra/path:/usr/sbin:/usr/bin:/sbin:/bin \

sh -xc 'command -V ls; command -V ls; command -Vp ls; command -vp ls'

+ command -V ls
ls is /bin/ls
+ command -V ls
ls is a tracked alias for /bin/ls
+ command -Vp ls
ls is a tracked alias for (null)
+ command -vp ls
Segmentation fault (core dumped)

describe_command should respect `path' argument. Looking up in the hash table
may gives incorrect index in entry.u.index and finally causes incorrect output
or SIGSEGV.


True, but only when a path is passed in. If the default path is used, 
looking up in the hash table is correct, and printing tracked aliases is 
intentional.


If it's desirable to drop that feature, then it should be dropped 
completely, code shouldn't be left in that can no longer be used. But 
it's possible to keep it working: how about this instead?


Signed-off-by: Harald van Dijk <har...@gigawatt.nl>
---
 src/exec.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/exec.c b/src/exec.c
index ec0eadd..1350da3 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -743,8 +743,6 @@ describe_command(out, command, path, verbose)
struct tblentry *cmdp;
const struct alias *ap;

-   path = path ?: pathval();
-
if (verbose) {
outstr(command, out);
}
@@ -767,8 +765,15 @@ describe_command(out, command, path, verbose)
goto out;
}

-   /* Then check if it is a tracked alias */
-   if ((cmdp = cmdlookup(command, 0)) != NULL) {
+	/* Then if the standard search path is used, check if it is a tracked 
alias.  */

+   if (path == NULL) {
+   path = pathval();
+   cmdp = cmdlookup(command, 0);
+   } else {
+   cmdp = NULL;
+   }
+
+   if (cmdp != NULL) {
entry.cmdtype = cmdp->cmdtype;
entry.u = cmdp->param;
} else {
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Redirection bug in subshell's EXIT trap

2017-10-19 Thread Harald van Dijk

On 19/10/17 19:44, Vitaly Sinilin wrote:

Hi,

I've recently written a quite unusual script and faced a strange dash
behavior that seems not POSIX-ly correct to me.

Here is the script with a lot of debugging extra output.

(I am sorry about posting a pretty lengthy script, but I feel like a
shorten version will look just insane without the context.)


A shortened version is easier to debug:

  dash -c 'f()(trap cat EXIT);f<&-'

This is supposed to print error messages. With dash, it doesn't.


The problem here is that although getch is run in a subshell and it
got FD#4 as stdin it somehow has parent's stdin in its EXIT trap.

POSIX says: "The environment in which the shell executes a trap on
EXIT shall be identical to the environment immediately after the
last command executed before the trap on EXIT was taken."


Yes, it is a bug. The problem is that dash, just before checking whether 
to execute the EXIT handler, has already reset everything to continue 
parsing and evaluating the next command. Which is not supposed to 
happen, never going to happen, and messes up the execution of the handler.


This resetting is done by the reset() function, and although the simple 
fix would be to move its call after exitshell(), the old ChangeLog shows 
that it was originally (about 15 years ago) the other way around and 
intentionally switched to how it is now, because some of what reset() 
does is necessary even here:



 -- Herbert Xu <herb...@debian.org>  Sat, 26 Oct 2002 21:28:33 +1000

dash (0.4.2) unstable; urgency=low

  [...]
  * Call reset() before exitshell() is called.  This fixes the bug where
returning an error from a function running under set -e caused the exit
trap to be taken with evalskip set.


Aside from the issue with redirections, the current order has other 
problems too:


  dash -c 'f()(local cmd="echo good";trap \$cmd EXIT);cmd="echo bad";f'

This is supposed to print "good". With dash, it prints "bad". With the 
call to reset() moved after exitshell(), it prints "good".


I suspect reset() needs to be split into two separate functions, but it 
may be a bit tricky to determine exactly what is supposed to go where.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


Nice catch.


The cause: uses "\\*" pattern instead of "\\\*".
The fix:

 /* backslash */
 case CBACK:
 c = pgetc2();
 if (c == PEOF) {
 USTPUTC(CTLESC, out);
 USTPUTC('\\', out);
 pungetc();
 } else if (c == '\n') {
 nlprompt();
 } else {
 if (
 dblquote &&
 c != '\\' && c != '`' &&
 c != '$' && (
 c != '"' ||
 eofmark != NULL
 )
 ) {
USTPUTC(CTLESC, out); // add this line
 USTPUTC('\\', out);
 }
 USTPUTC(CTLESC, out);
 USTPUTC(c, out);
 quotef++;
 }


I don't think this is right. The bug was introduced in


<https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c>

Prior to that, the line USTPUTC(CTLESC, out); was there. The commit 
message is saying that the logic for detecting whether \ should be taken 
literally doesn't belong in the parser, that the parser will get it 
wrong. The example in the commit message doesn't break with your patch, 
but short modifications to that example do make it fail:


Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


It causes preglob's string to potentially grow larger than the original. 
When called with RMESCAPE_ALLOC, that can be handled by increasing the 
buffer size, but preglob also gets called without RMESCAPE_ALLOC to 
modify a string in-place. That's never going to work with this approach. 
Back to the drawing board...


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-21 Thread Harald van Dijk

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-18 Thread Harald van Dijk

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


It causes preglob's string to potentially grow larger than the original. 
When called with RMESCAPE_ALLOC, that can be handled by increasing the 
buffer size, but preglob also gets called without RMESCAPE_ALLOC to 
modify a string in-place. That's never going to work with this approach. 
Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here. I'm currently testing the attached.


To be honest, FreeBSD sh's approach, keeping a syntax stack to detect 
characters' meaning reliably at parse time, feels more elegant to me 
right now, but that requires invasive and therefore risky changes to 
dash's code.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..bb16a46 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 5:52 PM, Herbert Xu wrote:

On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote:


It seems like the new control character doesn't get escaped in one place
where it should be, and gets lost because of that:

Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X


Hmm, it works here.  Can you check that your Makefile is up-to-date


It was.


and the generated syntax.c looks like this:

const char basesyntax[] = {
   CEOF,CSPCL,   CWORD,   CCTL,
   CCTL,CCTL,CCTL,CCTL,
   CCTL,CCTL,CCTL,CCTL,


Thanks, that was what was wrong. Although mksyntax did get re-executed 
because of your Makefile change, re-executing it didn't help: mksyntax 
doesn't use parser.h at run time, only at build time. So I think the 
Makefile.am change should instead be:


--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax
 signames.c: mksignames
./$^

-mksyntax: token.h
+mksyntax: parser.h token.h

 $(HELPERS): %: %.c
$(COMPILE_FOR_BUILD) -o $@ $<

Cheers,
Harald van Dijk

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 1:33 AM, Herbert Xu wrote:

On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote:

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!


Good work guys.  However, could you check to see if this patch
works too? It passes all my tests so far.


It seems like the new control character doesn't get escaped in one place 
where it should be, and gets lost because of that:


Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-19 Thread Harald van Dijk

On 2/18/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in 
_rmescapes. It passes your test case and mine, but I'll do more 
extensive testing.


It causes preglob's string to potentially grow larger than the 
original. When called with RMESCAPE_ALLOC, that can be handled by 
increasing the buffer size, but preglob also gets called without 
RMESCAPE_ALLOC to modify a string in-place. That's never going to work 
with this approach. Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here.


It required one obvious additional trivial change to the CHECKSTRSPACE 
invocation, but with that added, the attached passed all testing I could 
think of. Does this look okay to include, did I miss something, or is 
there perhaps a better alternative?


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..a847b2e 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -909,7 +909,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 #endif
 		CHECKEND();	/* set c to PEOF if at end of here document */
 		for (;;) {	/* until end of line or end of word */
-			CHECKSTRSPACE(4, out);	/* permit 4 calls to USTPUTC */
+			CHECKSTRSPACE(5, out);	/* permit 5 calls to USTPUTC */
 			switch(syntax[c]) {
 			case CNL:	/* '\n' */
 if (syntax == BASESYNTAX)
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-25 Thread Harald van Dijk

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Never mind, this is a very bad test. I forgot about echo's backslash 
handling. Had my terminal emulator given any feedback on \a, I might 
have noticed that before sending. This test doesn't show anything wrong 
in this area, but I'll check a bit more.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 09:40, Harald van Dijk wrote:

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


And hopefully I've got a good example now:

$ touch '\www'
$ src/dash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\www

In both $z and ${x#$z} / ${y#$z}, $z is unquoted. In $z, it's 
interpreted as a literal backslash followed by a * wildcard. In ${x#$z}, 
it's interpreted as an escaped * character. I don't understand why these 
cases should be treated differently.


ksh agrees with dash.

bash also behaves strangely here. In both cases, it treats the * as 
escaped by the preceding backslash, but in $z, the backslash is 
preserved, whereas in ${x#$z} / ${y#$z}, the backslash is removed:


$ bash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\*

posh is more consistent: in both $z and in ${x#$z} / ${y#$z}, $z is 
treated as a literal backslash followed by a * wildcard:


$ posh -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
*a
*a
\www

I'm having trouble telling from the spec what the expected behaviour is 
here, and because of that, whether dash's behaviour is already correct 
or needs to be modified.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2018-03-07 Thread Harald van Dijk

On 3/7/18 7:51 AM, Herbert Xu wrote:

On Wed, Mar 07, 2018 at 07:49:16AM +0100, Harald van Dijk wrote:


This was wrong in the original patch, but I'm not seeing it in the updated
patch that you replied to. When parsing a heredoc where part of delimiter is
quoted, syntax==SQSYNTAX. Since the calls to pgetc_eatbnl() are conditional
on syntax!=SQSYNTAX, there shouldn't be a problem. It would be a different
story if the delimiter could be an unquoted backslash, but thankfully that
is not possible.


Good point.  In that case please resend it with the pgetc2 change
and it should be good to go.


Here you go.

The problem with

  dash -c 'alias x=
  x'

and

  dash -c 'alias bs=\\
  bs
  '

looks like it just needs one extra line, so also attached as a separate 
patch.


Cheers,
Harald van Dijk
diff --git a/src/parser.c b/src/parser.c
index 382658e..8b945e3 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -836,7 +830,7 @@ static int pgetc_eatbnl(void)
 	int c;
 
 	while ((c = pgetc()) == '\\') {
-		if (pgetc() != '\n') {
+		if (pgetc2() != '\n') {
 			pungetc();
 			break;
 		}
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	goto endword;	/* exit outer loop */
 USTPUTC(c, out);
 nlprompt();
-c = pgetc();
+c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 goto loop;		/* continue outer loop */
 			case CWORD:
 USTPUTC(c, out);
@@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	USTPUTC(CTLESC, out);
 	USTPUTC('\\', out);
 	pungetc();
-} else if (c == '\n') {
-	nlprompt();
 } else {
 	if (
 		dblquote &&
@@ -997,7 +989,7 @@ quotemark:
 	USTPUTC(c, out);
 	--parenlevel;
 } else {
-	if (pgetc() == ')') {
+	if (pgetc_eatbnl() == ')') {
 		USTPUTC(CTLENDARI, out);
 		if (!--arinest)
 			syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 	USTPUTC(c, out);
 }
 			}
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 			heredoc = (struct heredoc *)stalloc(sizeof (struct here

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, again due
to the non-recursive nature of dash's parser. A small generalisation of what
I've been doing so far could handle this, so it makes sense to me to try to
achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


Nice approach.


It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.



As I have not reverted the run-time quote book-keeping, the handling
of $@/$* remains unchanged.


I'll try to re-do those fixes based on the approach you're going for here.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2018-03-06 Thread Harald van Dijk

On 3/6/18 9:45 AM, Herbert Xu wrote:

On Wed, Oct 12, 2016 at 07:24:26PM +0200, Harald van Dijk wrote:



I would have expected another exception to be in alias expansions that
end in a backslash. Shells are not entirely in agreement there, but most
appear to treat this the regular way, meaning

   dash -c 'alias bs=\\
   bs
   '

prints nothing.


I think your patch changes this.  In order to preserve the existing
behaviour (which seems logical), you should change the second pgetc
call in pgetc_eatbnl to pgetc2.


Oh, indeed, thanks.

There's another problem: when there is no following command (as in the 
above example), things break. A shorter reproducer that has failed for 
years is


  $ dash -c 'alias x=
  x'
  dash: 2: Syntax error: end of file unexpected

This breaks because the part where list() checks for NL/EOF, 
checkkwd==0, so aliases aren't expanded. Immediately after that, 
checkkwd is set and the next call to readtoken() would return TEOF, but 
by that point, dash has already committed to parsing a command.


Since this is actually a long-standing problem, not something introduced 
by the patch, I think it's okay to ignore for now. Do you agree?



With more extensive testing, the only issue I've seen is what Jilles
Tjoelker had already mentioned, namely that backslash-newline should be
preserved inside single-quoted strings, and also that it should be
preserved inside heredocs where any part of the delimiter is quoted:

cat <<\EOF
\
EOF

dash's parsing treats this mostly the same as a single-quoted string,
and the same extra check handles both cases.

Here's an updated patch. Hoping this looks okay and can be applied.



I'm fine with the concept.  However, your patch also breaks here-
document parsing when the delimiter is a single backslash.

cat << "\"
\

>

If you can fix these two problems it should be good to go.


As Martijn Dekker wrote, this should work when the backslash is escaped 
or single-quoted, and in my testing does. But what you have is a nice 
start of another corner case:


  cat << "\"
  \
  "EOF
  ok
  "
  EOF

I'm happily surprised to see that dash accepts and gives sensible 
treatment to multi-line heredoc delimiters.


Okay with your one extra pgetc()=>pgetc2() change, then?

Cheers,
Harald van Dijk


Cheers,

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 8:04 PM, Harald van Dijk wrote:

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( 
$(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, 
again due
to the non-recursive nature of dash's parser. A small generalisation 
of what
I've been doing so far could handle this, so it makes sense to me to 
try to

achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached. I tried to avoid recursion 
for the cases that unpatched dash already handled properly.



It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.


To handle that it can just look at dblquote instead, included in the 
attached.


In this patch, I managed let "${$+\}}" expand to }, but "${$+"\}"}" to 
\}, for the reasons I gave earlier.


Martijn Dekker's test case of 'x=0; x=$((${x}+1))' also works with this.

Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..903e250 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -1644,7 +1633,6 @@ char *
 _rmescapes(char *str, int flag)
 {
 	char *p, *q, *r;
-	unsigned inquotes;
 	int notescaped;
 	int globbing;
 
@@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
-	inquotes = 0;
 	globbing = flag & RMESCAPE_GLOB;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
-			inquotes = ~inquotes;
 			p++;
 			notescaped = globbing;
 			continue;
 		}
+		if (*p == '\\') {
+			/* naked back slash */
+			notescaped = 0;
+			goto copy;
+		}
 		if (*p == (char)CTLESC) {
 			p++;
 			if (notescaped)
 *q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/expand.h b/src/expand.h
index 26dc5b4..90f5328 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -55,7 +55,6 @@ struct arglist {
 #define	EXP_VARTILDE	0x4	/* expand tildes in an assignment */
 #define	EXP_REDIR	0x8	/* file glob for a redirection (1 match only) */
 #define EXP_CASE	0x10	/* keeps quotes around for CASE pattern */
-#define EXP_QPAT	0x20	/* pattern in quoted parameter expansion */
 #define EXP_VARTILDE2	0x40	/* expand tildes after colons only */
 #define EXP_WORD	0x80	/* expand word in parameter expansion */
 #define EXP_QUOTED	0x100	/* expand word in double quotes */
diff --git a/src/parser.c b/src/parser.c
index 382658e..17d60b0 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -827,7 +827,8 @@ xxreadtoken(void)
 		}
 	}
 breakloop:
-	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	return lasttoken;
 #undef RETURN
 }
 
@@ -855,47 +856,147 @@ static int pgetc_eatbnl(void)
  * word which marks the end of the doc

Re: dash tested against ash testsuite: 17 failures

2018-03-07 Thread Harald van Dijk

On 3/8/18 7:30 AM, Herbert Xu wrote:

Could you please resend these patches as two separate emails please?
Patchwork cannot handle two patches in one email:

https://patchwork.kernel.org/patch/10264661/


Ah, didn't realise that. I'll keep that in mind for future mails.

Actually, I'll withdraw my second patch for now, I spotted another 
problem in alias handling (already present in 0.5.9.1) and want to 
double-check that my patch didn't make things worse.



Also it would be nice if you could include the patch descriptions
in each email as these will go into the git tree.


parser: use pgetc_eatbnl() in more places.

dash has a pgetc_eatbnl function in parser.c which skips any 
backslash-newline combinations. It's not used everywhere it could be. 
There is also some duplicated backslash-newline handling elsewhere in 
parser.c. Replace most of the calls to pgetc() with calls to 
pgetc_eatbnl() and remove the duplicated backslash-newline handling.
diff --git a/src/parser.c b/src/parser.c
index 382658e..8b945e3 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -836,7 +830,7 @@ static int pgetc_eatbnl(void)
 	int c;
 
 	while ((c = pgetc()) == '\\') {
-		if (pgetc() != '\n') {
+		if (pgetc2() != '\n') {
 			pungetc();
 			break;
 		}
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	goto endword;	/* exit outer loop */
 USTPUTC(c, out);
 nlprompt();
-c = pgetc();
+c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 goto loop;		/* continue outer loop */
 			case CWORD:
 USTPUTC(c, out);
@@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	USTPUTC(CTLESC, out);
 	USTPUTC('\\', out);
 	pungetc();
-} else if (c == '\n') {
-	nlprompt();
 } else {
 	if (
 		dblquote &&
@@ -997,7 +989,7 @@ quotemark:
 	USTPUTC(c, out);
 	--parenlevel;
 } else {
-	if (pgetc() == ')') {
+	if (pgetc_eatbnl() == ')') {
 		USTPUTC(CTLENDARI, out);
 		if (!--arinest)
 			syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 	USTPUTC(c, out);
 }
 			}
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 	

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-08 Thread Harald van Dijk

On 08/03/2018 08:55, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote:


If the syntax stack is to be stored on the actual stack, then real recursion
could be used instead, as attached. I tried to avoid recursion for the cases
that unpatched dash already handled properly.


It does look a lot simpler than I expected.  However, this patch
is still 30% bigger than my patch :)


That's a bit unfair: that's mostly due to moved code, not to changed code.

But size-wise, your patch still wins: parser.o does become larger with 
my patch than with yours, largely due to my attempts to only recurse 
when needed.



As real recursion doesn't seem to buy us much I think I'll stick
with the syntax stack for now.


@@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char 
*eofmark, int striptabs)
c != '\\' && c != '`' &&
c != '$' && (
c != '"' ||
-   eofmark != NULL
+   (eofmark != NULL && 
!varnest)
+   ) && (
+   c != '}' ||
+   !varnest ||
+   (dqvarnest ? innerdq : 
dblquote)


So this seems to be the only substantial difference between your
patch and mine.  I have looked at the behaviour of other shells
and I think I will stick with my patch's behaviour for now.


The first line of this part of my patch is about something else:

  x=\"; cat <
In general, if there are disagreements between shells and the
standard is not specific enough, I'll pick the approach that
results in simpler code.


Fair enough.

Cheers,
Harald van Dijk


Thanks,


--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] parser: Fix single-quoted patterns in here-documents

2018-03-09 Thread Harald van Dijk

On 3/9/18 4:07 PM, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 07:35:53PM +0100, Harald van Dijk wrote:


Related:

   x=*; cat <

I don't think this is related to our patches at all.


Not related to our patches, but related to the original bug. It's 
another instance where quoted * is wrongly treated as unquoted.


Your fix looks good to me.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-28 Thread Harald van Dijk

On 2/28/18 11:14 PM, Denys Vlasenko wrote:

Guys, while you work on fixing this, consider taking a look at
some more parsing cases in this bbox bug:

https://bugs.busybox.net/show_bug.cgi?id=10821

I suppose some of them may fail in dash too (I did not test, sorry).


$'...' isn't supported in dash, but the underlying problem does appear 
to exist in dash too:


$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the 
pattern is considered unquoted regardless of the outer quotation marks. 
Because of that, the single quote characters should not be taken 
literally, but should be taken as quoting the y. ksh, posh and zsh agree 
with bash.


But: POSIX does not say the same for +/-/..., so dash is correct there:

$ bash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

$ dash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

Because of that, for that report's follow-up comment about

  (unset x; echo "${x-$'\x41'}")

I would not have expected this to be taken as a $'...' string. ksh 
(which does support $'...' strings too) prints the literal text $'\x41', 
and so does bash if invoked as sh.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Expansion-lookalikes in heredoc delimiters

2018-03-15 Thread Harald van Dijk

On 15/03/2018 15:52, Herbert Xu wrote:

On Thu, Mar 15, 2018 at 12:41:10PM +0100, Harald van Dijk wrote:


It is if you want to do it the way POSIX specifies. You're adding a special
exception in the parser. I don't see how this approach can be extended to
handle the other examples in my mail:


I don't think it's exactly clear what POSIX says on this.


I don't think it's clear what POSIX means to say, but I do think it's 
clear what it does say.



   cat <<`one two`
   ok
   `one two`


Try this one:

cat << ${
${

bash/ksh93 both will get stuck forever until a right brace is
entered.


That's because POSIX specifies that after ${, everything up to the 
matching }, not including nested strings, expansions, etc., is part of 
the word. No exception is made when it spans multiple lines.


Another instance of this is

  if false; then
echo ${
  fi
  }
  fi
  echo ok

This is accepted by bash, ksh, zsh and posh.

However, in this instance, I'm having trouble finding where, but IIRC, 
POSIX says or means to say that it's okay to flag this as an invalid 
parameter expansion at parse time even though the evaluation would 
otherwise be skipped.



While other shells all behave the way dash does.


All? At least two others don't: yash rejects ${ as invalid because of 
the missing parameter name. zsh agrees with bash and ksh that it should 
look for the closing }.



Considering the fact that even if you closed the brace after
the newline bash would still be stuck forever,


That's because bash doesn't ever match multi-line heredoc delimiters. 
Which is what POSIX technically requires:


  "The here-document shall be treated as a single word that begins 
after the next  and continues until there is a line containing 
only the delimiter and a , with no  characters in between."


No single line will ever match a multi-line heredoc delimiter.


I think this
behaviour is suboptimal.  ksh93 seems to do the right thing though.


Yes, I agree that multi-line heredoc delimiters are useful.


Another interesting thing to try is

cat << ${ foo
${

Also you can look at the quotes:

cat << "
"

IOW it's not clear that "word" in this context necessarily follows
the same rules used during normal tokenisation.


Quotes are clear: as far as they go, this *must* follow the same rules 
used during normal tokenisation. Does any shell disagree?


  cat << "A ' B"
  ok 1
  A ' B
  echo ok 2

I expect this to print

  ok 1
  ok 2

and I'm not finding any shell that disagrees, not even dash. Do you have 
an example of a shell that sees only "A as the heredoc delimiter, or one 
that perhaps performs quote removal on the inner '?


For your example, does any shell, including dash, *not* treat this as a 
multi-line heredoc delimiter consisting of a newline character (which 
may or may not be matched by two blank lines, depending on the shell)?



Anyway, dash has had CHKEOFMARK since 2007 so this is just an
extension of existing behaviour.


This is definitely true: your patch, regardless of whether it matches 
POSIX's requirements, is consistent with how dash has behaved for a long 
time.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Expansion-lookalikes in heredoc delimiters

2018-03-15 Thread Harald van Dijk

On 15/03/2018 18:11, Herbert Xu wrote:

On Thu, Mar 15, 2018 at 05:29:27PM +0100, Harald van Dijk wrote:


That's because POSIX specifies that after ${, everything up to the matching
}, not including nested strings, expansions, etc., is part of the word. No
exception is made when it spans multiple lines.

Another instance of this is

   if false; then
 echo ${
   fi
   }
   fi
   echo ok

This is accepted by bash, ksh, zsh and posh.


Interestingly, ksh bombs out on this:

echo ${
fi
}

So this behaviour is not exactly consistent.


It's perfectly consistent. It gets accepted at parse time, it only gets 
rejected at expansion time. That's how dash generally behaves as well:


  $ dash -c 'echo ${x^}'
  dash: 1: Bad substitution
  $ dash -c ': || echo ${x^}'
  $

Historically, as I understand it, ash would reject both of these, but 
you specifically modified dash to accept invalid expansions during 
parsing (which makes sense):



<https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3df3edd13389ae768010bfacee5612346b413e38>


In any case, this substituion is invalid in all of these shells so
does it really matter?


Okay, it can be trivially modified to something that does work in other 
shells (even if it were actually executed), but gets rejected at parse 
time by dash:


  if false; then
: ${$+
  }
  fi

It'd be nice if all shells used the same parse rules, so that scripts 
can detect dynamically which expansions are supported, but don't have to 
go through ugly eval commands to use them.



That's because bash doesn't ever match multi-line heredoc delimiters. Which
is what POSIX technically requires:

   "The here-document shall be treated as a single word that begins after the
next  and continues until there is a line containing only the
delimiter and a , with no  characters in between."

No single line will ever match a multi-line heredoc delimiter.


Sure.  But this is a quality of implementation issue.  If you're
never going to match the delimter you should probably emit an error
at the very start rather than at the EOF.


POSIX isn't clear on whether reaching EOF without seeing the heredoc 
delimiter is an error and shells disagree. dash silently accepts it, as 
do ksh, yash and zsh.


When EOF is an acceptable way to terminate a heredoc, a delimiter which 
never matches can be useful to force the complete remainder of the file 
to be treated as a heredoc.



Another interesting thing to try is

cat << ${ foo
${

Also you can look at the quotes:

cat << "
"

IOW it's not clear that "word" in this context necessarily follows
the same rules used during normal tokenisation.


Quotes are clear: as far as they go, this *must* follow the same rules used
during normal tokenisation. Does any shell disagree?


I was talking about multi-line quotes, which you conveniently dismissed :)


I got back to that :) I picked another example first because I thought 
it would be clearer using that that the normal tokenisation rules should 
be used.



For your example, does any shell, including dash, *not* treat this as a
multi-line heredoc delimiter consisting of a newline character (which may or
may not be matched by two blank lines, depending on the shell)?


Yes, almost every shell fails with the multi-line delimiter inside
double quotes.  Only dash and ksh93 seem to get it right.


posh accepts multi-line heredoc delimiters as well.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Expansion-lookalikes in heredoc delimiters

2018-03-08 Thread Harald van Dijk

Consider this:

  cat <<`bad`
  `bad`

As far as I can tell, this is technically valid, supposed to print 
nothing, and accepted in most other shells.


POSIX Token Recognition says `bad` is to be recognised as a single token 
of type word, and any word can act as a heredoc delimiter. POSIX 
Here-Document says only quote removal is performed on the word.


However, dash does not always preserve the original spelling of the 
word. That's what's going on here. Because dash hasn't preserved the 
`bad` spelling (it's been turned into CTLBACKQ), the check for the end 
of the heredoc doesn't pick it up, and it instead gets taken as a 
command substitution.


When an actual 0x84 byte is seen in the input, *that* gets taken as the 
heredoc delimiter:


  dash -c "tail -n1 <<\`:\`
  `printf \\\204`
  ok
  \`:\`"

In a locale in which 0x84 is a valid character (since dash doesn't 
support locales, that's easy, it's always valid), it's supposed to print 
"ok". dash instead interprets the second line as the end of the heredoc 
and subsequently issues an error message when it interprets "ok" on line 
3 as a command to execute.


This is pretty clearly a case that no serious script is ever going to 
encounter, not to mention one that many shells don't even attempt to 
support, at least not completely, so I don't think this is a real 
problem. I'm mentioning it anyway because I was trying to come up with a 
few more test cases for the parser, and I think it's good to have a 
record not only of what worked, what has been made to work, and what got 
broken, but also of what's never going to be work.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-09 Thread Harald van Dijk

On 3/8/18 1:40 AM, Harald van Dijk wrote:
If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached.


Even though it won't be accepted in dash, I continued with this approach 
for my own use. I've now got it to about 1800 bytes smaller (at -Os -s).


After the other changes I'd done, it became apparent to me that the 
syntax tables were unnecessary, and that they'd become fairly easy to 
get rid of. This was a big space saver that may be possible to apply to 
your version as well.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..acd5fdf 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -115,8 +115,8 @@ STATIC char *exptilde(char *, char *, int);
 STATIC void expbackq(union node *, int);
 STATIC const char *subevalvar(char *, char *, int, int, int, int, int);
 STATIC char *evalvar(char *, int);
-STATIC size_t strtodest(const char *, const char *, int);
-STATIC void memtodest(const char *, size_t, const char *, int);
+STATIC size_t strtodest(const char *, int);
+STATIC void memtodest(const char *, size_t, int);
 STATIC ssize_t varvalue(char *, int, int, int *);
 STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -396,7 +386,7 @@ done:
 	if (!home || !*home)
 		goto lose;
 	*p = c;
-	strtodest(home, SQSYNTAX, quotes);
+	strtodest(home, quotes | EXP_QUOTED);
 	return (p);
 lose:
 	*p = c;
@@ -521,7 +511,6 @@ expbackq(union node *cmd, int flag)
 	char *p;
 	char *dest;
 	int startloc;
-	char const *syntax = flag & EXP_QUOTED ? DQSYNTAX : BASESYNTAX;
 	struct stackmark smark;
 
 	INTOFF;
@@ -535,7 +524,7 @@ expbackq(union node *cmd, int flag)
 	if (i == 0)
 		goto read;
 	for (;;) {
-		memtodest(p, i, syntax, flag & QUOTES_ESC);
+		memtodest(p, i, flag & (QUOTES_ESC | EXP_QUOTED));
 read:
 		if (in.fd < 0)
 			break;
@@ -651,8 +640,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -844,7 +832,7 @@ end:
  */
 
 STATIC void
-memtodest(const char *p, size_t len, const char *syntax, int quotes) {
+memtodest(const char *p, size_t len, int quotes) {
 	char *q;
 
 	if (unlikely(!len))
@@ -855,11 +843,17 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 	do {
 		int c = (signed char)*p++;
 		if (c) {
-			if ((quotes & QUOTES_ESC) &&
-			((syntax[c] == CCTL) ||
-			 (((quotes & EXP_FULL) || syntax != BASESYNTAX) &&
-			  syntax[c] == CBACK)))
-USTPUTC(CTLESC, q);
+			if (quotes & QUOTES_ESC) {
+switch (c) {
+	case '\\':
+	case '!': case '*': case '?': case '[': case '=':
+	case '~': case ':': case '/': case '-': case ']':
+		if (quotes & EXP_QUOTED)
+	case CTLVARS:
+			USTPUTC(CTLESC, q);
+		break;
+}
+			}
 		} else if (!(quotes & QUOTES_KEEPNUL))
 			continue;
 		USTPUTC(c, q);
@@ -870,13 +864,10 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 
 
 STATIC size_t
-strtodest(p, syntax, quotes)
-	const char *p;
-	const char *syntax;
-	int quotes;
+strtodest(const char *p, int quotes)
 {
 	size_t len = strlen(p);
-	memtodest(p, len, syntax, quotes);
+	memtodest(p, len, quotes);
 	return len;
 }
 
@@ -895,15 +886,13 @@ varvalue(char *name, int varflags, int flags, int *quotedp)
 	int sep;
 	char sepc;
 	char **ap;
-	char const *syntax;
 	int quoted = *quotedp;
 	int subtype = varflags & VSTYPE;
 	int discard = subtype == VSPLUS || subtype == VSLENGTH;
-	int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
+	int quotes = quoted | (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
 	ssize_t len = 0;
 
 	sep = (flags & EXP_FULL) << CHAR_BIT;
-	syntax = quoted ? DQSYNTAX : BASESYNTAX;
 
 	switch (*name) {
 	case '$':
@@ -946,11 +935,11 @@ param:
 		if (!(ap = shellparam.p))
 			return -1;
 		while ((p = *ap++)) {
-			len += strtodest(p, syntax, quotes);
+			le

Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 14:19, Herbert Xu wrote:

Nowhere does it say that backslashes that come from the result of
parameter expansion is always literal.

So it's clear that any shell that treats the backslash as a literal
backslash is already broken.


By the literal POSIX wording, yes, but the fact remains that that's what 
all shells aside from bash were already doing, so that may mean it's a 
POSIX defect. That's why I qualified with "POSIX specifies and/or 
intends to specify".


But I'm tempted to agree with your and Martijn's point of view now that 
it's better to just implement it the way POSIX specifies, that it's more 
useful that way.



On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote:

Can you show me any shell other than bash that lets this optimisation affect
the results?


The fact is that the other shells are not doing what they do because
they're not doing this optimisation.  They do what they do because
they have broken POSIX because they always treat backslashes that
arise from parameter expansion as literal backslashes.


Right, but they're doing this optimisation as well.


FWIW it wouldn't be hard to iron out the anomalous case of /d\ev.
You just have treat the backslash as a metacharacter.


That's a good point and wouldn't have too much of an impact on 
performance of existing scripts. BTW, that means both expandmeta()'s 
metachars variable, and expmeta()'s


  if (metaflag == 0 || lstat64(expdir, ) >= 0)

optimisation. You already got rid of the latter in your patch to prevent 
buffer overflows.


In regular /d\ev that doesn't come from a variable, the backslash will 
have been replaced by CTLESC, but it's not necessary to treat CTLESC as 
a metacharacter: in that case, either there's a match and pathname 
expansion produces /dev, or there's no match and quote removal produces 
/dev, so the optimisation is still valid. It'd only affect backslashes 
that come from a variable, so it's very limited in scope.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 11:44, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 11:16:29AM +0200, Harald van Dijk wrote:


If you say that quote removal takes place on the original token (meaning
before parameter expansion), and if parameter expansion takes place before
pathname expansion, then there's nothing left to allow \* to behave
differently from *.


Either you misunderstood me or you misread POSIX.


I misunderstood you. Given v='\' and the word \\$v, I took the result of 
parameter expansion as \\\ where the first two backslashes come from the 
original word, you take the result of parameter expansion as only \. 
Because of that, when you wrote quote removal is never applied to the 
results of parameter expansion, I didn't see what you meant. I think 
you're right in your terminology, the result of parameter expansion is 
just \, to get \\\ it would need to be phrased as something like "the 
result of applying parameter expansion". Thanks for the clarification.



POSIX never actually says this optimisation is allowed. The only thing that
allows it is the fact that it produces the same results anyway. If that
stops, then checking the file system for matches becomes required.


It doesn't disallow it either.


If POSIX specifies a result, and a shell applies an optimisation that 
causes a different result to be produced, doesn't that inherently 
disallow that optimisation? There may be some confusion and/or 
disagreement over what exactly POSIX specifies and/or intends to 
specify, but I don't see how you can argue that POSIX specifies result 
A, but it's okay to apply an optimisation that produces result B.



Can you show me a shell that doesn't apply this optimisation?


Can you show me any shell other than bash that lets this optimisation 
affect the results?


Like I wrote in my initial message, all other shells I tested take a 
backslash that comes from a variable after parameter expansion as 
effectively quoted. Given your b="/*/\null", bash is the only shell I 
know that expands $b to /dev/null. All others do perform pathname 
expansion, but check to see if a file \null exists in any top-level 
directory.


Given a="/de[v]/\null" and b="/dev/\null", all shells other than bash, 
even dash, agree that $a and $b expand to '/dev/\null' if that file 
exists, or the original string if that file doesn't exist. *That* is 
what allows the optimisation in the case of $b: expanding to either 
'/dev/\null' or '/dev/\null', depending on some condition, can be done 
without evaluating that condition.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 08:18, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 08:38:01PM +0200, Harald van Dijk wrote:



My inclination is to just drop the /d\ev issue and use the bash
behaviour until such a time that bash changes or POSIX changes.


What would need to change in POSIX to convince you to change dash to match
what you were already arguing is the POSIX-mandated behaviour? :)


No the passage I quoted only mandates that backslashes be
interpreted when doing the matching.  It doesn't say anything
about whether it should be removed after matching is done.  The
only thing it does say is that:

If the pattern does not match any existing filenames or pathnames,
the pattern string shall be left unchanged.

IOW it's silent in the case where the pattern does match.


Of course it doesn't say whether characters in the pattern are preserved 
in the case of a match. That's because when there's a match, pathname 
expansion produces the file name, not the original pattern.


In a directory containing foo.txt, you wouldn't argue that *.txt might 
expand to *foo.txt just because POSIX doesn't explicitly say the * is 
removed, would you?



However, the other reason this is not worth fixing is because all
those other shells that always treat the backslash as a literal
won't remove it in this case anyway.


I seriously cannot understand the logic of pushing a break from 
traditional ash behaviour and from all shells apart from bash, giving 
POSIX as a reason for doing it, and then giving the behaviour of all 
those other shells as a reason for not doing it properly.


If all those other shells are a reason against doing it, then why isn't 
that a reason for just restoring the dash 0.5.4 behaviour that all those 
other shells implement, always taking expanded backslash as effectively 
quoted?



  So nor sensible script could
possibly use such a feature even if we implemented it.


No portable script could use such a feature. A sensible script certainly 
could. There are a lot of scripts that aren't meant to be portable 
across shells. I know I've written scripts for my own use that use dash 
features that don't work the same in bash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 09:31, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 08:52:57AM +0200, Harald van Dijk wrote:


I seriously cannot understand the logic of pushing a break from traditional
ash behaviour and from all shells apart from bash, giving POSIX as a reason
for doing it, and then giving the behaviour of all those other shells as a
reason for not doing it properly.


It's logical for dash because this aligns the behaviour of pathname
expansion with case pattern matching.


Except it doesn't actually do that. It does in some cases, and not in 
others. You've made it clear that you don't care about the cases where 
it doesn't. That can be a valid decision, but it's one you should 
acknowledge.


Also, it's just as logical to restore the case pattern matching to its 
traditional behaviour to align it with pathname expansion.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 08:23, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:19:17AM +0200, Harald van Dijk wrote:


This introduces a buffer overread. When expmeta() sees a backslash, it
assumes it can just skip the next character, assuming the next character is
not a forward slash. By treating expanded backslashes as unquoted, it
becomes possible for the next character to be the terminating '\0'.


This code has always had to deal with naked backslashes.  Can you
show me the exact pattern that results in the overread?


No, it hasn't, because expmeta() is not used in case patterns, and case 
patterns are currently the only case where naked backslashes can appear. 
In contexts where pathname expansion is performed, a backslash coming 
from a variable will be escaped by another backslash in currently 
released dash versions.


Test case:

  $v='*\'
  set -- $v

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 3/27/18 10:55 AM, Herbert Xu wrote:

So going back to dash yes I think we should adopt the bash behaviour
for pathname expansion and keep the existing case semantics.

This patch does exactly that.  Note that this patch does not work
unless you have already applied

https://patchwork.kernel.org/patch/10306507/

because otherwise the optimisation mentioned above does not get
detected correctly and we will end up doing quote removal twice.


This introduces a buffer overread. When expmeta() sees a backslash, it 
assumes it can just skip the next character, assuming the next character 
is not a forward slash. By treating expanded backslashes as unquoted, it 
becomes possible for the next character to be the terminating '\0'.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 11:52, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 08:44:28AM +0200, Harald van Dijk wrote:


Test case:

   $v='*\'
   set -- $v


I don't see how this would cause an overrun, can you please explain
it for me?


Line numbers are from 0.5.9.1.

When expanded backslashes are no longer treated as quoted, this would 
call expmeta() with the pattern *\, that is with a single unquoted 
trailing backslash, so:


expand.c:1333

if (*p == '\\')
esc++;
if (p[esc] == '/') {

The first if statement will be hit and set esc to 1. p[esc] is then 
'\0', so the second if block doesn't get entered and the outer loop 
continues:


expand.c:1315

for (p = name; esc = 0, *p; p += esc + 1) {

p += esc + 1 will increase p by 2, letting it point just past the 
terminating '\0'. The loop condition of *p now accesses the byte just 
past the pattern.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 12:37, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:03:24PM +0200, Harald van Dijk wrote:


When expanded backslashes are no longer treated as quoted, this would call
expmeta() with the pattern *\, that is with a single unquoted trailing
backslash, so:
[...]


Thanks for the explanation.  Here is an updated patch.


That seems consistent with what pmatch() does for trailing unquoted 
backslashes, but actually, I think pmatch() is wrong on that too. POSIX 
says:


  If a pattern ends with an unescaped , it is unspecified 
whether the pattern does not match anything or the pattern is treated as 
invalid.


I don't think this allows dash's behaviour of taking the backslash as a 
literal, since that still allows a match to succeed. bash lets such a 
pattern never match. In other shells, there is no way to get into this 
situation, but GNU find behaves the same as bash.


Test case:

  v=\\; case \\ in $v) echo bug;; esac

bash prints nothing, dash 0.5.9.1 prints bug. Other shells don't count 
since they interpret $v differently.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 17:14, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 02:57:12PM +0200, Harald van Dijk wrote:


That's a good point and wouldn't have too much of an impact on performance
of existing scripts. BTW, that means both expandmeta()'s metachars variable,
and expmeta()'s

   if (metaflag == 0 || lstat64(expdir, ) >= 0)

optimisation. You already got rid of the latter in your patch to prevent
buffer overflows.


No the purpose of this metaflag check is to bypass the lstat call.
My patch simply made it bypass the call by returning earlier.


Oh, right.


It is not relevant to whether we include backslash as a metacharacter.


I was thinking about not making backslashes set metaflag in expmeta(): 
when the pathname component doesn't include *, ?, or [, but does include 
backslashes, then the if (metaflag == 0) block could handle that as long 
as it performs the lstat64() check unconditionally. There's no need to 
go through the opendir()/readdir()/closedir() path for that case. Since 
expmeta() is bypassed for words not containing any potentially-magic 
characters, the impact might be small enough.


Regardless of whether metaflag is set, it would mean things like 'set 
"["' would start hitting the FS unnecessarily, if I understand 
correctly: the preglob()-expanded pattern is '\[', and expmeta() can no 
longer tell this apart from $v where v='\[' where a FS check would be 
required.


Perhaps preglob() should just be avoided, and expmeta() taught to 
respect both '\\' and CTLESC. '\\' would be a metacharacter and require 
a FS hit, CTLESC wouldn't require it. This would also avoid some memory 
allocations.



In regular /d\ev that doesn't come from a variable, the backslash will have
been replaced by CTLESC, but it's not necessary to treat CTLESC as a
metacharacter: in that case, either there's a match and pathname expansion
produces /dev, or there's no match and quote removal produces /dev, so the
optimisation is still valid. It'd only affect backslashes that come from a
variable, so it's very limited in scope.


For the case where the backslash came from the parser, the CTLESC
would have already been removed by preglob so there should be no
difference.


if (!strpbrk(str->text, metachars))
goto nometa;

in expandmeta() is done before preglob() is called. Here, it is still 
not necessary to include CTLESC.



The only problem with the metacharacter approach is that glibc's
glob(3) probably won't treat it as a metacharacter and therefore
it will still behave in the same way as the current bash.


This doesn't matter much yet, but yeah, it will if you decide to enable 
--enable-glob by default in the future. As long as you don't include 
GLOB_NOESCAPE, then normally, it should be taken as escaping the next 
character, but it might still trigger GLOB_NOMAGIC to return early. If 
it does, I'd take that as a glibc bug, but that doesn't help dash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 18:04, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 05:54:53PM +0200, Harald van Dijk wrote:


I was thinking about not making backslashes set metaflag in expmeta(): when
the pathname component doesn't include *, ?, or [, but does include
backslashes, then the if (metaflag == 0) block could handle that as long as
it performs the lstat64() check unconditionally. There's no need to go
through the opendir()/readdir()/closedir() path for that case. Since
expmeta() is bypassed for words not containing any potentially-magic
characters, the impact might be small enough.


Honestly such backslashes should be rare enough that I wouldn't
bother with such an optimisation.


Backslashes coming from parameters, sure, but backslashes introduced by 
preglob(), I'm not so sure.



Regardless of whether metaflag is set, it would mean things like 'set "["'
would start hitting the FS unnecessarily, if I understand correctly: the
preglob()-expanded pattern is '\[', and expmeta() can no longer tell this
apart from $v where v='\[' where a FS check would be required.


No it shouldn't.  We only mark [ is meta if there is a matching ].


The [ character would mark the whole string possibly-meta in 
expandmeta(), and the CTLESC wouldn't be seen there. Then, preglob() 
turns it into \[, and expmeta() wouldn't take [ as a meta character, but 
would take \ as a meta character.



Perhaps preglob() should just be avoided, and expmeta() taught to respect
both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit,
CTLESC wouldn't require it. This would also avoid some memory allocations.


We need preglob for glob(3).  I want to minimise the amount of code
difference between the glob path and the expmeta path.


Then, how about moving some of expmeta()'s checks to return early 
outside of it, so that they can also be done in the glob() case?



 if (!strpbrk(str->text, metachars))
 goto nometa;

in expandmeta() is done before preglob() is called. Here, it is still not
necessary to include CTLESC.


We could move it after preglob.


Assuming backslash is added to metachars, which was needed anyway:

Regardless of whether the check is done before or after preglob(), it 
would never skip expmeta() when it shouldn't. If it's kept before 
preglob(), then it will correctly skip expmeta() in more cases than if 
it's moved after it, so I don't think there's any benefit in moving it.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 13:00, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:53:31PM +0200, Harald van Dijk wrote:


[un-snip]

  If a pattern ends with an unescaped , it is unspecified whether the pattern does not match anything or the pattern is treated as invalid. 



I don't think this allows dash's behaviour of taking the backslash as a
literal, since that still allows a match to succeed. bash lets such a
pattern never match. In other shells, there is no way to get into this
situation, but GNU find behaves the same as bash.


Nope, bash treats it as if the backslash is not there.

$ pwd
/home/dash
$ touch asdf\\
$ touch asdf
$ bash -c 'v="../da*sh/asdf\\"; printf "%s\n" $v'
../dash/asdf
$


Huh. Indeed in that case, but at the same time:

  bash -c 'v="?sdf\\"; printf "%s\n" $v'

Here, it will not be considered to match, this just prints ?sdf\. I 
suspect bash has the same kind of logic as dash, where it just tries to 
lstat() the file if the final pathname component contains no 
metacharacters or goes through an opendir()/readdir()/closedir() 
sequence if it does, and mishandles this corner case.


[...]

Since bash itself is inconsistent, and POSIX unclear,
What exactly is unclear about the sentence of POSIX that I quoted? Is 
there a legitimate interpretation of "It is unspecified whether A or B" 
that allows other behaviour than A or B?


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 19:02, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 06:48:45PM +0200, Harald van Dijk wrote:


Backslashes coming from parameters, sure, but backslashes introduced by
preglob(), I'm not so sure.


Right.  I guess we could change it so that CTLESC is preserved
to distinguish between the backslashes from parameter expansion
vs. backslashes from open input.


Thinking about it some more, see below.


The [ character would mark the whole string possibly-meta in expandmeta(),
and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and
expmeta() wouldn't take [ as a meta character, but would take \ as a meta
character.


Oh I though you were talking about test/[, I didn't see the set.
But why would someone do set "["?


It was just the most basic command I could think of that shouldn't hit 
the FS, currently doesn't hit the FS, and would start hitting the FS if 
backslash gets taken as a metacharacter. Anything containing a quoted 
metacharacter would do. I suppose I could have used echo "[  ok  ]" 
instead for something that's more likely to pop up in a real script.



Then, how about moving some of expmeta()'s checks to return early outside of
it, so that they can also be done in the glob() case?


We could.


expandmeta() is implemented twice, once for the glob() case, once for 
the expmeta() case. There is not much code shared between them except 
for preglob(). preglob(), through _rmescapes(), already has to go over 
the whole string and check each character. If that can be made to return 
an indication somehow of whether it has seen metacharacters, that could 
be used for both cases. Perhaps an additional flag that, when set, lets 
it return NULL (and freeing the allocated string) if no metacharacters 
were seen? That means expmeta() doesn't need to learn about CTLESC, 
means there's no need to go over the string a second time in the glob() 
case to really turn CTLESC into \, and means GLOB_NOMAGIC becomes 
unnecessary and can be removed as a workaround for glibc bugs.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 20:24, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 08:01:10PM +0200, Harald van Dijk wrote:



Right.  I guess we could change it so that CTLESC is preserved
to distinguish between the backslashes from parameter expansion
vs. backslashes from open input.


Thinking about it some more, see below.


Actually let's not go down this route because this would mean
that the glob(3) path will never have the same functionality
since it cannot possibly see CTLESC.


Yes, that's why I ended up proposing what I put below, which does allow 
them the same functionality. Neither glob() nor expmeta() would see 
CTLESC, and neither would need to, since neither would need to optimise 
for the no-metachars case.



It was just the most basic command I could think of that shouldn't hit the
FS, currently doesn't hit the FS, and would start hitting the FS if
backslash gets taken as a metacharacter. Anything containing a quoted
metacharacter would do. I suppose I could have used echo "[  ok  ]" instead
for something that's more likely to pop up in a real script.


My inclination is to just drop the /d\ev issue and use the bash
behaviour until such a time that bash changes or POSIX changes.


What would need to change in POSIX to convince you to change dash to 
match what you were already arguing is the POSIX-mandated behaviour? :)


More serious response: I do think it's worth raising this as a bash bug 
too and seeing what they do.



It's such an unlikely case as why would anyone knowingly put
backslash into a variable and expecting pathname expansion to
remove it without actually using real magic characters.


That's true. The least unlikely scenario I can think of is a directory 
name containing [ but not ]. A script may end up escaping the [ in that 
case just as a precaution, even though it's not strictly needed, and 
glob() may return early due to GLOB_NOMAGIC picking up on the fact that 
[ is not magic in that context.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix buffer overflow in expandmeta

2018-03-25 Thread Harald van Dijk

On 3/26/18 4:54 AM, Herbert Xu wrote:

On Mon, Mar 26, 2018 at 03:03:38AM +0200, Harald van Dijk wrote:


This was already the case before your patch, but on systems that flat out
reject paths longer than PATH_MAX (that includes GNU/Linux), it seems weird
that expansion can produce paths which then don't work.


PATH_MAX only applies in certain cases.  Besides, you can always
cd into the directories one level at a time to circumvent it.


In other words, when you change the path to a different one than what 
globbing produced. That was kind of my point. :)



Besides, even if we did impose a limit here
we'd still have to check that limit at every recursion level which
means that the code won't be that much simpler.


Proof of concept attached. The resulting code is about 100 bytes smaller 
at -Os than the dynamic reallocation approach. It's possible to further 
reduce that with a statically allocated buffer, but of course that means 
a constant memory cost at run time.


Cheers,
Harald van Dijk
--- a/src/expand.c
+++ b/src/expand.c
@@ -122,7 +122,7 @@ STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
 STATIC void addglob(const glob_t *);
 #else
-STATIC void expmeta(char *, char *);
+STATIC void expmeta(char *);
 STATIC struct strlist *expsort(struct strlist *);
 STATIC struct strlist *msort(struct strlist *, int);
 #endif
@@ -1237,9 +1237,6 @@ addglob(pglob)
 
 
 #else	/* HAVE_GLOB */
-STATIC char *expdir;
-
-
 STATIC void
 expandmeta(struct strlist *str, int flag)
 {
@@ -1261,13 +1258,7 @@ expandmeta(struct strlist *str, int flag)
 
 		INTOFF;
 		p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP);
-		{
-			int i = strlen(str->text);
-			expdir = ckmalloc(i < 2048 ? 2048 : i); /* XXX */
-		}
-
-		expmeta(expdir, p);
-		ckfree(expdir);
+		expmeta(p);
 		if (p != str->text)
 			ckfree(p);
 		INTON;
@@ -1296,7 +1287,7 @@ nometa:
  */
 
 STATIC void
-expmeta(char *enddir, char *name)
+expmeta1(char *expdir, char *enddir, char *name)
 {
 	char *p;
 	const char *cp;
@@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name)
 			metaflag++;
 		p = name;
 		do {
+			if (enddir == expdir + PATH_MAX)
+return;
 			if (*p == '\\')
 p++;
 			*enddir++ = *p;
@@ -1390,22 +1383,34 @@ expmeta(char *enddir, char *name)
 		if (dp->d_name[0] == '.' && ! matchdot)
 			continue;
 		if (pmatch(start, dp->d_name)) {
+			p = enddir;
+			cp = dp->d_name;
+			for (;;) {
+if (p == expdir + PATH_MAX)
+	goto toolong;
+if ((*p++ = *cp++) == '\0')
+	break;
+			}
 			if (atend) {
-scopy(dp->d_name, enddir);
 addfname(expdir);
 			} else {
-for (p = enddir, cp = dp->d_name;
- (*p++ = *cp++) != '\0';)
-	continue;
 p[-1] = '/';
-expmeta(p, endname);
+expmeta1(expdir, p, endname);
 			}
 		}
+toolong: ;
 	}
 	closedir(dirp);
 	if (! atend)
 		endname[-esc - 1] = esc ? '\\' : '/';
 }
+
+STATIC void
+expmeta(char *name)
+{
+	char expdir[PATH_MAX];
+	expmeta1(expdir, expdir, name);
+}
 #endif	/* HAVE_GLOB */
 
 


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Harald van Dijk

On 26/03/2018 11:34, Martijn Dekker wrote:

Op 25-03-18 om 22:56 schreef Harald van Dijk:

   case /dev in $pat) echo why ;; esac

Now, bash and dash say that the pattern does match -- they take the
backslash as unquoted, allowing it to escape the v. Most other shells
(bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
quoted.

This doesn't make sense to me, and doesn't match historic practice:

[...]


With the snipping it's not clear that I was specifically confused by the 
inconsistency.


I had included another example:

  pat="/de\v"
  printf "%s\n" $pat

I can understand treating backslash as quoted, or treating it as 
unquoted, but not quoted-unless-in-a-case-statement. What justifies this 
one exception?


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix buffer overflow in expandmeta

2018-03-26 Thread Harald van Dijk

On 3/26/18 7:12 AM, Herbert Xu wrote:

FWIW your patch when built with -O2 with gcc 4.7.2 is actually
16 bytes bigger than my version.


Interesting. Not sure what might cause that.


@@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name)
metaflag++;
p = name;
do {
+   if (enddir == expdir + PATH_MAX)
+   return;
if (*p == '\\')
p++;
*enddir++ = *p;


Also there is another loop in between these two hunks that also
needs to check enddir.


Indeed, thanks.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 3/29/18 6:42 AM, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 03:06:32PM +0200, Harald van Dijk wrote:



Since bash itself is inconsistent, and POSIX unclear,

What exactly is unclear about the sentence of POSIX that I quoted? Is there
a legitimate interpretation of "It is unspecified whether A or B" that
allows other behaviour than A or B?


The tricky part is coming up with a way to actually produce an
unescaped backslash.


You're right that what other shells implement doesn't allow any 
possibility of unescaped backslashes in the shell. But if what they do 
is what's intended, the sentence still makes perfect sense to me: it 
simply wouldn't affect the shell. Remember that the description of 
pattern matching applies to both the shell and the non-shell cases. For 
the non-shell cases, it's trivial, and it's possible that that was all 
it was intended to address:


  find . -name 'a\'

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-24 Thread Harald van Dijk

On 3/22/18 8:12 PM, Harald van Dijk wrote:
I have a local patch set of stuff that I haven't considered in an 
appropriate state to send to the list. One of the features I've got 
working is multibyte locale support. It doesn't assume UTF-8 (only 
assumes stateless encodings) and is tolerant of malformed strings. I've 
got it working in pattern matching, in glob() sorting, in trimming (#, 
##, % and %%), in field splitting, and in "$*" joining. I'll see if I 
can polish it a bit and make it available when I've got some more free 
time again, but I'm not sure how soon that will be.


Actually, there's no real harm in just publishing what I have now.

  https://github.com/hvdijk/dash

This is my personal playground, so I'll have to include some disclaimers:

Commits may take a different approach than what dash is interested in.
Commits may be poorly tested.
Commits may contain bugs.
Commits may have non-obvious dependencies on earlier commits.
Commits may be poor style.
Commits may be re-written and force-pushed.

That said, feel free to see if the locale support does what you're 
after. If so, it should be possible to bring into a shape suitable for dash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix glibc glob(3) support

2018-03-25 Thread Harald van Dijk

On 3/25/18 10:06 AM, Herbert Xu wrote:

It's been a while since we disabled glob(3) support by default.
It appears to be working now, however, we have to change our
code to remove a workaround that now actually prevents it from
working.

In particular, we no longer use GLOB_NOMAGIC or test GLOB_MAGCHAR.


I thought that was an optimisation to avoid unnecessary file system 
access, to ensure that simple commands such as "echo ok" don't bother 
looking for files named "echo" and "ok" in the current directory.


It indeed doesn't work, but I took that to be a glibc bug.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix buffer overflow in expandmeta

2018-03-25 Thread Harald van Dijk

On 3/25/18 10:38 AM, Herbert Xu wrote:

The native version of expandmeta allocates a buffer that may be
overrun for two reasons.  First of all the size is 1 byte too small
but this is normally hidden because the minimum size is rounded
up to 2048 bytes.  Secondly, if the directory level is deep enough,
any buffer can be overrun.

This patch fixes both problems by calling realloc when necessary.


This was already the case before your patch, but on systems that flat 
out reject paths longer than PATH_MAX (that includes GNU/Linux), it 
seems weird that expansion can produce paths which then don't work.


Test case:

  cd /tmp
  x=x
  x=$x$x$x$x$x$x$x$x$x$x$x$x$x$x$x
  ln -s . $x
  set -- $x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x*
  case $1 in
  *"*") echo "no match" ;;
  *)if test -e "$1"
then echo "match and exists"
else echo "match but does not exist"
fi ;;
  esac
  rm $x

I'm seeing "no match" from bash/mksh/pdksh/posh/yash/zsh, but "match but 
does not exist" from dash/bosh/ksh93. Other commands would naturally 
print the "File name too long" error.


Should the buffer perhaps simply be limited to PATH_MAX, taking care to 
just give up if something longer is found? That could avoid the need to 
handle reallocations and result in somewhat smaller and simpler code.


The downside would be that if other systems do support paths longer than 
PATH_MAX, globbing would no longer work for those paths.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 10:16, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 09:38:04AM +0200, Harald van Dijk wrote:


Also, it's just as logical to restore the case pattern matching to its
traditional behaviour to align it with pathname expansion.


No I think that makes no sense and clearly contradicts POSIX.


That's not clear at all. As I already wrote earlier and Jilles Tjoelker 
chimed in with as well, if there isn't a single shell out there that 
actually implements what POSIX specifies (bash gets close, but isn't 
fully there either), not even the shells on which the POSIX 
specification was based, there's a fair chance it's a POSIX defect, 
regardless of whether it makes sense.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 3/27/18 11:32 PM, Jilles Tjoelker wrote:

I don't think it is clear at all. Note the final paragraph of 2.13.1:

] When pattern matching is used where shell quote removal is not
] performed [...]

This implies that special characters cannot be escaped using a backslash
in a context where shell quote removal is performed.


Taken literally, it just says something about what happens when shell 
quote removal is not performed. In the cases we're talking about, shell 
quote removal is performed, so this simply wouldn't apply. Perhaps 
that's taking it more literally than intended, I don't know.



 Instead, special
characters can be escaped using shell quoting. As a result, the simplest
form of parameter expansion has either all or none of the generated
characters quoted (depending on whether the expansion is in
double-quotes or not).

There is also a sentence "The shell special characters always require
quoting." which seems untrue in practice if the characters come from an
expansion. Something like

   touch 'a'
   sh -c 'w=*\&*; printf "%s\n" $w'

works for many shells as sh. However, this could be explained away as
undefined behaviour.


This is what allows extensions to glob syntax, if those extensions use 
shell special characters.


  p="*(ab)"; case abab in $p) echo match ;; esac

This prints "match" in ksh93.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-22 Thread Harald van Dijk

On 22/03/2018 16:36, Martijn Dekker wrote:

Busybox ash [...] The licenses are bidirectionally compatible.


Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any 
busybox code is imported into dash, that from then on dash can only be 
distributed under GPL?


I have a local patch set of stuff that I haven't considered in an 
appropriate state to send to the list. One of the features I've got 
working is multibyte locale support. It doesn't assume UTF-8 (only 
assumes stateless encodings) and is tolerant of malformed strings. I've 
got it working in pattern matching, in glob() sorting, in trimming (#, 
##, % and %%), in field splitting, and in "$*" joining. I'll see if I 
can polish it a bit and make it available when I've got some more free 
time again, but I'm not sure how soon that will be.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Harald van Dijk

On 22/03/2018 03:40, Martijn Dekker wrote:

evalvar() records empty expansion results (varlen == 0) as string
regions that need to be scanned for IFS characters. This is pointless,
because there is nothing to split.


varlen may be set to -1 when an unset variable is expanded. If it's 
beneficial to avoid recording empty regions to scan for IFS characters, 
should those also be excluded?



This patch fixes the bug that, given no positional parameters, unquoted
$@ and $* incorrectly generate one empty field (they should generate no
fields). Apparently that was a side effect of the above.


This seems weird though. If you want to remove the recording of empty 
regions because they are pointless, then how does removing them fix a 
bug? Doesn't this show that empty regions do have an effect? Perhaps 
they're not supposed to have any effect, perhaps it's a specific 
combination of empty regions and something else that triggers some bug, 
and perhaps that combination can no longer occur with your patch.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-22 Thread Harald van Dijk

On 22/03/2018 22:01, Martijn Dekker wrote:

Op 22-03-18 om 20:12 schreef Harald van Dijk:

Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any
busybox code is imported into dash, that from then on dash can only be
distributed under GPL?


I thought dash was already effectively under the GPL due to including
the output of mksignames.c.


To be honest, I'm getting confused now.

I knew that the output of GPL programs is not generally covered by the 
GPL itself. And that's all that COPYING says:


 "mksignames.c:

  This file is not directly linked with dash.  However, its output is."

But about the not "generally" covered: as mentioned on 
<https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#WhatCaseIsOutputGPL>, 
if the output includes part of the GPL-licensed code, then the output is 
GPL-licensed as well. And admittedly, signames.c does contain part of 
mksignames.c.


It seems unclear to packagers, at any rate:

Fedora and Gentoo say dash is BSD-licensed. MacPorts lists it as GPL-2+.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 05:37, Herbert Xu wrote:

You're right.  The proper fix to this is to ensure that nulonly
is not set in varvalue for $*.  It should only be set for $@ when
it's inside double quotes.

In fact there is another bug while we're playing with $@/$*.
When IFS is set to a non-whitespace character such as :, $*
outside quotes won't remove empty fields as it should.


That's not limited to empty fields:

  IFS=,
  set -- , ,
  set -- $@
  echo $#

This should print 2, not 3. (bash gets this wrong too.)


This patch fixes both problems.


Also the above. But it breaks a traditional ash extension:

  unset IFS
  set -- A1 B2 C3
  echo ${@%2 C3}

This used to print A1 B, but prints A1 B2 C3 with your patch.

  echo ${@%2}

This used to print A1 B2 C3, but prints A1 B with your patch.

This extension was already pretty much broken within quoted expansions: 
"${@%2}" would already expand to A1 B. Your patch extends that to the 
non-quoted case.


I do not know if you want to keep this extension in.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 10:10, Herbert Xu wrote:

On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:


Also the above. But it breaks a traditional ash extension:

   unset IFS
   set -- A1 B2 C3
   echo ${@%2 C3}

This used to print A1 B, but prints A1 B2 C3 with your patch.

   echo ${@%2}

This used to print A1 B2 C3, but prints A1 B with your patch.


Hmm, it still does on my machine:


Apologies, I ended up sending the wrong test case. It's when IFS has its 
default value that the behaviour is changed. It's when it's unset that 
it still works.


Initial testing was against 0.5.9.1 plus your patch, now retested 
against c166b71 plus your patch.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >