Re: getopts appears to not be shifting $@ when consuming options

2021-01-29 Thread Jilles Tjoelker
On Fri, Jan 29, 2021 at 06:25:25PM +, earnestly wrote:
> In this example dash will repeatedly append 'attr=foo' to the list of
> parameters in an infinite loop:

> #!/bin/dash -x

> while getopts :a: arg -a foo -a bar; do
> case $arg in
> a) set -- "$@" attr="$OPTARG"
> esac
> done
> shift "$((OPTIND - 1))"

> Instead I expected this to result in parameter list containing
> 'attr=foo' and 'attr=bar'.

Hmm, that's pretty clever. By passing the parameters to be parsed to the
getopts command instead of via $@, it is possible to modify $@ without
violating POSIX's rule that only allows parsing a single unmodified set
of parameters (unless getopts is reset via OPTIND=1).

Unfortunately, dash and FreeBSD sh reset the getopts state when the
positional parameters are modified via set or shift. They probably do
this to avoid use after free and out of bounds memory access when a
script violates POSIX's rule. However, when the parameters come from the
getopts command, there is no violation of the rule and no memory
unsafety problem.

> This works in all shells I have been able to test with the exception of
> busybox sh:

> * sh   (bash)
> * bash (All versions from 1.14 through 5.1.4)
> * mksh (MIRBSD KSH R59 2020/05/16)
> * ksh  (93u+)
> * zsh  (5.8)
> * zsh --emulate sh
> * heirloom-sh (bourne)

> The only workaround I've found is to explicitly use `shift 2` in the a)
> case and obviate the final shift using OPTIND.  This will unfortunately
> break every other shell.

Since you need code to "un-eval" a list of parameters to construct the
getopts command above in a general case, a better workaround would be to
use that code to construct the attr="$OPTARG" list.

If your actual code is more like:

set -- -a foo -a bar # for testing
while getopts :a: arg; do
...

then the script violates the rule I mentioned, and has unspecified
results.

-- 
Jilles Tjoelker


Re: [PATCH] jobs: Block signals during tcsetpgrp

2021-01-06 Thread Jilles Tjoelker
On Wed, Jan 06, 2021 at 09:16:58PM +, Harald van Dijk wrote:
> On 06/01/2021 04:45, Herbert Xu wrote:
> > This patch implements the blocking of SIGTTOU (and everything else)
> > while we call tcsetpgrp.

> > Reported-by: Steffen Nurpmeso 
> > Signed-off-by: Herbert Xu 

> > diff --git a/src/jobs.c b/src/jobs.c
> > index 516786f..809f37c 100644
> > --- a/src/jobs.c
> > +++ b/src/jobs.c
> > @@ -1512,7 +1512,13 @@ showpipe(struct job *jp, struct output *out)
> >   STATIC void
> >   xtcsetpgrp(int fd, pid_t pgrp)
> >   {
> > -   if (tcsetpgrp(fd, pgrp))
> > +   int err;
> > +
> > +   sigblockall(NULL);
> > +   err = tcsetpgrp(fd, pgrp);
> > +   sigclearmask();
> > +
> > +   if (err)
> > sh_error("Cannot set tty process group (%s)", strerror(errno));
> >   }
> >   #endif

> While this is a step in the right direction, Jilles has already replied with
> an explanation of why this is not enough: if the terminal is in TOSTOP mode,
> it's not just tcsetpgrp() that needs to be handled, it's any write as well
> that may occur while the shell is not in the foreground process group. While
> it may be working according to design for messages written when the shell is
> not supposed to be in the foreground process group, it is another story when
> the shell is both responsible for taking itself out of the foreground
> process group and for writing a message. This is made worse by the fact that
> there is no synchronisation with child processes on errors, so even forcibly
> restoring the foreground process group may not be enough: unfortunate
> scheduling may result in a child process immediately setting the foreground
> process group to the child process after the parent process attempted to
> restore it to itself. I have not yet seen a good solution for this.

Comparing this error situation to the normal case, I think the right
solution is to close any stray pipe ends we have, wait for the remaining
child processes and only then report the error (throwing an exception as
normal). The child processes will probably terminate soon because of a
broken pipe, but even if they stop, they will not change the tty
foreground process group any more. The code in jobs.c will then reset
it.

The same error handling applies to the situation where pipe() fails.
This is a bit easier to test reliably, using ulimit -n.

Adding synchronization with the child processes slows down the normal
case for a rare error case, and the synchronization objects such as
pipe, eventfd, SysV semaphore or MAP_SHARED mapping might cause
unexpected issues in strange use cases.

A related bug: if fork fails for a command substitution, the pipe
created for reading the command output remains open (two descriptors).
This one is also in dash as well as FreeBSD sh. Throwing exceptions from
forkshell() may not be the best idea.

-- 
Jilles Tjoelker


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-24 Thread Jilles Tjoelker
On Wed, Dec 23, 2020 at 08:18:24PM +, Harald van Dijk wrote:
> On 21/12/2020 16:24, Jilles Tjoelker wrote:
> > Also, simply entering the command
> >  trap "echo TTOU" TTOU
> > in an interactive shell makes it behave strangely. Created jobs
> > immediately stop, "fg" works once but after that the shell tends to get
> > stuck quickly.

> Good catch, there is some work to be done there too.

> > This seems a good approach, but certain writes to the terminal may need
> > the same treatment, for example the error message when fork fails for
> > the second and further elements of a pipeline. This also makes me wonder
> > why SIGTTOU is ignored at all by default.

> True. This is hard to create a reliable test case for, but there is only
> limited code that can get executed while a job-control-enabled shell may not
> be in the foreground process group.

An rlimit (ulimit -u) will cause fork to fail after a number of
processes, but will not work reliably if other code is executing
concurrently with the same UID.

> If fork fails halfway through a pipeline, it may also be a problem that some
> of the commands in the pipeline may still run.

This can be handled much like the case where an element in the pipeline
fails immediately such as because a utility cannot be found. I am not
sure how well this currently works.

> > In mksh, the issue is resolved slightly differently: setting a trap on
> > TTOU pretends to work but the signal disposition remains set to ignored.

> zsh also rejects traps on TTOU, but does so explicitly:

>   zsh: can't trap SIGTTOU in interactive shells

> Amusingly, it prints this in any shell where job control is enabled,
> regardless of whether the shell is interactive.

The rejection makes sense for any shell instance that has job control
and uses tcsetpgrp(), whether interactive or not.

I am not entirely happy with the rejection idea, though, since the check
can be bypassed by temporarily disabling job control:
set +m; trap 'echo TTOU' TTOU; set -m
and running external utilities then fails with:
zsh: can't set tty pgrp: interrupt

> > Tradition is for job control shells to be a process group leader, but I
> > don't really know why. Changing this will not fix the issue entirely
> > anyway since the shell must perform tcsetpgrp() from the background when
> > a foreground job has terminated.

> I've been thinking more about this, and I suspect it's a another conflation
> between interactive mode and job control. In an interactive shell that's not
> executing any external program, you want any Ctrl-C to only send SIGINT to
> that shell, not to any other process. In order for that to work, it needs to
> be its own process group.

> This should, in my opinion, *only* happen for interactive shells, not for
> job-control-enabled non-interactive shells. Consider

>   sh -c 'sh -mc "while :; do :; done"; echo bye'

> The behaviour I would want is that Ctrl-C kills the parent shell, so that
> this does not print "bye". Different shells behave differently.

I think the main effect of the -m option is that it places jobs in
separate process groups, which may also be useful in scripts. After
something like:

set -m
foo &
foopid=$!
set +m

it is possible to  kill -- "-$foopid"  .

Although non-portable kernel features such as cgroups (Linux) or reaper
(FreeBSD) might be more useful for tracking processes like this, the
work to define how to use them in a shell has not been done.

In FreeBSD sh, I extended the possibilities here by allowing job control
without an accessible tty in non-interactive mode. This applies both if
there is no controlling terminal at all and if the shell is in the
background.

In the example 
sh -c 'sh -mc "while :; do :; done"; echo bye'
the -m flag seems to have no effect since that shell only runs builtins.

Changing it to
sh -mc 'sh -c "while :; do :; done"; echo bye'
the desired Ctrl+C behaviour can still work because the outer shell
exits when it notices the inner shell has terminated because of SIGINT.

-- 
Jilles Tjoelker


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-21 Thread Jilles Tjoelker
On Sat, Dec 19, 2020 at 11:52:31PM +, Harald van Dijk wrote:
> On 19/12/2020 22:21, Steffen Nurpmeso wrote:
> > Steffen Nurpmeso wrote in
> >   <20201219172838.1b-wb%stef...@sdaoden.eu>:
> >   |Long story short, after falsely accusing BSD make of not working

> > After dinner i shortened it a bit more, and attach it again, ok?
> > It is terrible, but now less redundant than before.
> > Sorry for being so terse, that problem crosses my head for about
> > a week, and i was totally mislead and if you bang your head
> > against the wall so many hours bugs or misbehaviours in a handful
> > of other programs is not the expected outcome.

> I think a minimal test case is simply

> all:
> $(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good'

> unless I accidentally oversimplified.

Yes, and it can be simplified a bit more to remove make from the
picture:
(SHELL -c 'trap "echo TTOU" TTOU; set -m; echo all good'; :)

The idea is to start the shell without being a process group leader, set
a trap for SIGTTOU and enable job control.

Also, simply entering the command
trap "echo TTOU" TTOU
in an interactive shell makes it behave strangely. Created jobs
immediately stop, "fg" works once but after that the shell tends to get
stuck quickly.

> The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make its
> newly started process group the foreground process group when job control is
> enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's in dash, the
> other variants may have some small differences.) tcsetpgrp has this little
> bit in its specification:

>Attempts to use tcsetpgrp() from a process which is a member of
>a background process group on a fildes associated with its con‐
>trolling  terminal  shall  cause the process group to be sent a
>SIGTTOU signal. If the calling thread is blocking SIGTTOU  sig‐
>nals  or  the  process is ignoring SIGTTOU signals, the process
>shall be allowed to perform the operation,  and  no  signal  is
>sent.

> Ordinarily, when job control is enabled, SIGTTOU is ignored. However, when a
> trap action is specified for SIGTTOU, the signal is not ignored, and there
> is no blocking in place either, so the tcsetpgrp() call is not allowed.

Right.

> The lowest impact change to make here, the one that otherwise preserves the
> existing shell behaviour, is to block signals before calling tcsetpgrp and
> unblocking them afterwards. This ensures SIGTTOU does not get raised here,
> but also ensures that if SIGTTOU is sent to the shell for another reason,
> there is no window where it gets silently ignored.

This seems a good approach, but certain writes to the terminal may need
the same treatment, for example the error message when fork fails for
the second and further elements of a pipeline. This also makes me wonder
why SIGTTOU is ignored at all by default.

In mksh, the issue is resolved slightly differently: setting a trap on
TTOU pretends to work but the signal disposition remains set to ignored.

> Another way to fix this is by not trying to make the shell start a new
> process group, or at least not make it the foreground process group. Most
> other shells appear to not try to do this.

Tradition is for job control shells to be a process group leader, but I
don't really know why. Changing this will not fix the issue entirely
anyway since the shell must perform tcsetpgrp() from the background when
a foreground job has terminated.

What is definitely required, though, is that the shell not read input or
alter terminal settings if it is started in the background (possibly
unless the script explicitly ignored SIGTTOU). This is what the loop
with tcgetpgrp() does.

-- 
Jilles Tjoelker


Re: shell: Always use explicit large file API

2020-05-09 Thread Jilles Tjoelker
On Thu, May 07, 2020 at 11:42:12PM +1000, Herbert Xu wrote:
> There are some remaining stat/readdir calls in dash that may lead
> to spurious EOVERFLOW errors on 32-bit platforms.  This patch changes
> them (as well as open(2)) to use the explicit large file API.

> Reported-by: Tatsuki Sugiura 
> Signed-off-by: Herbert Xu 

> diff --git a/configure.ac b/configure.ac
> index 5dab5aa..dbd97d8 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -144,8 +144,13 @@ AC_CHECK_FUNC(stat64,, [
>   AC_DEFINE(fstat64, fstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(lstat64, lstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(stat64, stat, [64-bit operations are the same as 32-bit])
> + AC_DEFINE(readdir64, readdir,
> +   [64-bit operations are the same as 32-bit])
> + AC_DEFINE(dirent64, dirent,
> +   [64-bit operations are the same as 32-bit])
>  ])
> [snip]

Is it possible to use AC_SYS_LARGEFILE and the normal
fstat/lstat/readdir/dirent/open/etc instead of the non-standard "64"
interfaces?

I understand that dash formerly used the "64" interfaces selectively, so
that the binary could be a bit smaller by avoiding 64-bit numbers where
they were not necessary. Now that the feature "supports files with inode
numbers that do not fit in 32 bits" is considered essential, this
complexity seems unnecessary.

I'm sorry for not providing a patch.

-- 
Jilles Tjoelker


Re: [v2 PATCH] eval: Reset handler when entering a subshell

2019-03-03 Thread Jilles Tjoelker
On Sun, Mar 03, 2019 at 04:43:09PM +, Harald van Dijk wrote:
> On 03/03/2019 13:57, Herbert Xu wrote:
> > On Sat, Mar 02, 2019 at 01:28:44PM +, Harald van Dijk wrote:

> > > That is *a* real bug here, not *the* real bug. This leaves the buggy
> > > behaviour of the "command" built-in intact in the test case I included in
> > > the message you replied to.

> > I don't quite understand.  Could you explain what is still buggy
> > after the following patch?

> I gave this example in my previous message:

>   command . /dev/stdin <   set -o invalid
>   echo a
>   EOF
>   echo b

> Ordinarily, "set -o invalid" is a "special built-in utility error", for
> which the consequence is that a non-interactive shell "shall exit". If it
> weren't a special built-in utility, it would be an "other utility (not a
> special built-in) error", for which a non-interactive shell "shall not
> exit".

> The effect of the command built-in is that "if the command_name is the same
> as the name of one of the special built-in utilities, the special properties
> in the enumerated list at the beginning of Special Built-In Utilities shall
> not occur." This is ambiguous as to whether it is just about the special
> properties associated with the . command, or whether it includes those
> associated with the set command called indirectly, but it must be one or the
> other. If it includes the set command, then the shell shall not exit, and
> the correct output is a followed by b. If it does not include the set
> command, then the shell shall exit, and the correct output is nothing. dash
> outputs b, which is wrong in either case.

The way I interpret this is that any error while parsing or executing
the . or eval command(s) is a "special built-in utility error" which can
be "caught" using "command". Therefore, the correct output is an error
message about the invalid option (to stderr) followed by b. This is what
happens in FreeBSD sh (for quite a few years), ksh93 (20120801_2) and
mksh (56c).

Given that the text is somewhat ambiguous, I think bash's POSIX mode
behaviour of exiting may also be valid.

I don't like disabling the special properties of the inner set command
since this creates a different kind of script for 'command .' versus
'.'. Normally (though not in #!/bin/bash scripts), if I write something
like 'set -o pipefail', I can safely assume that it will take if the
shell continues to the next command; if the shell does not support the
option, it exits.

The fairly old dash v0.5.9.1 I tried works even differently, writing an
error message (to stderr) followed by a. So it first continues after the
error and then aborts when '.' ends. This seems clearly broken (but I
think things have changed since back then).

-- 
Jilles Tjoelker


Re: [PATCH] Use a real non-cryptographic hash algorithm

2018-11-16 Thread Jilles Tjoelker
On Thu, Nov 15, 2018 at 09:27:42AM -0500, Devin Hussey wrote:
> >From 502cbd61e386eb014035df66b032e90a9de926b7 Mon Sep 17 00:00:00 2001
> From: Devin Hussey 
> Date: Thu, 15 Nov 2018 09:12:24 -0500
> Subject: [PATCH] Use a real non-cryptographic hash algorithm

> dash previously used a modified version of the LoseLose algorithm.

> LoseLose is a TERRIBLE algorithm.

> It has terrible distribution, leaving many hash entries unused.

> However, more importantly, it is incredibly easy to make a
> collision; even something as simple as rearranging the letters.

> For example. these all produce the hash 1652:

> Hello
> Holle
> Hlleo
> Hoell
> Haxed\n
> HASH\tBAD
> Hater

> Collsions in hash tables are very bad because it requires
> looking through a linked list, and the benefits of O(1) time
> in a hash table are completely lost.

> The FNV-1a algorithm has comparable performance and code size
> while having a much better collision rate.

You do have a dependency chain with a multiplication per character, so
it is less suitable for long strings. There are implementation
techniques to avoid the dependency chain, but I'm not sure it's worth
the trouble.

> While there are some other faster and more resistant hashes out there,
> namely xxHash, Murmur2, and CityHash, the benefits are minimal on
> short strings and they expect a length argument, unlike FNV-1a which
> simply uses null-termination, and they are not in the public domain
> like FNV-1a.

> [snip]
> +hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +hashval = (hashval ^ (unsigned char) *p++) + FNV_PRIME;

I suppose this latter one should be * instead of + as well?

-- 
Jilles Tjoelker


Re: [PATCH 4/5] Stop using deprecated function sigsetmask()

2018-10-16 Thread Jilles Tjoelker
On Tue, Oct 16, 2018 at 06:42:19PM +0200, Antonio Ospite wrote:
> sigsetmask() is deprecated, at least on recent glibc; stop using it to
> silence the following compiler warning:

> ---
> system.h:40:2: warning: ‘sigsetmask’ is deprecated [-Wdeprecated-declarations]
>   sigsetmask(0);
>   ^~
> In file included from /usr/include/x86_64-linux-gnu/sys/param.h:28,
>  from shell.h:52,
>  from nodes.c:46:
> /usr/include/signal.h:173:12: note: declared here
>  extern int sigsetmask (int __mask) __THROW __attribute_deprecated__;
> ^~
> ---

> Using sigprocmask() and friends unconditionally should not be a problem,
> as commit e94a964 (eval: Add vfork support, 2018-05-19) also does it.

The git history starts (in 2005) after HAVE_SIGSETMASK was added, but I
expect it was done this way to save a few bytes in the executable. With
ProPolice, the effect may be a bit more since a local variable of type
sigset_t often contains an array and may cause functions to be compiled
with stack protection when they otherwise wouldn't (note that
sigclearmask() is inline).

Both FreeBSD and NetBSD simply changed the sigsetmask() call to a
sigprocmask() call very early on (1995-1996).

Personally, I think clean code that compiles without warnings is more
important than making the executable as small as possible, but the
maintainer may not agree.

-- 
Jilles Tjoelker


Re: [PATCH 1/6] exec: Don't execute binary files if execve() returned ENOEXEC.

2018-09-11 Thread Jilles Tjoelker
On Fri, Sep 07, 2018 at 10:34:09AM +0200, Andrej Shadura wrote:
> From: Adam Borowski 

> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.

> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.

> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution."

I think this is worth the extra code, even though POSIX does not require
it.

Neither the file_is_binary function nor its caller take care of
preserving errno, which might lead to confusing error messages.

On another note, this is not fully POSIX-compliant because it rejects
some files that comply to POSIX's definition of a text file. Per POSIX's
definition, only files containing '\0' can be rejected (or containing
too long lines or not ending with a newline, both of which seem like a
bad idea to check); this probably requires checking more than the first
line for decent detection. However, I suppose the benefits and
disadvantages were weighed here and non-compliance was decided.

> Signed-off-by: Adam Borowski 
> Signed-off-by: Andrej Shadura 
> ---
>  src/exec.c | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/src/exec.c b/src/exec.c
> index 9d0215a..631 100644
> --- a/src/exec.c
> +++ b/src/exec.c
> @@ -148,6 +148,36 @@ shellexec(char **argv, const char *path, int idx)
>  }
>  
>  
> +/*
> + * Check if an executable that just failed with ENOEXEC shouldn't be
> + * considered a script (wrong-arch ELF/PE, junk accidentally set +x, etc).
> + * We check only the first line to allow binaries encapsulated in a shell
> + * script without proper quoting.  The first line, if not a hashbang, is
> + * likely to contain comments; even ancient encodings, at least popular
> + * ones, don't use 0x7f nor values below 0x1f other than whitespace (\t,
> + * \n, \v, \f, \r), ISO/IEC 2022 can have SI, SO and \e.
> + */
> +STATIC int file_is_binary(const char *cmd)
> +{
> + char buf[128];
> + int fd = open(cmd, O_RDONLY|O_NOCTTY);
> + if (fd == -1)
> + return 1;
> + int len = read(fd, buf, sizeof(buf));
> + close(fd);
> + for (int i = 0; i < len; ++i) {
> + char c = buf[i];
> + if (c >= 0 && c <= 8 ||
> + c >= 16 && c <= 31 && c != 27 ||
> + c == 0x7f)
> + return 1;
> + if (c == '\n')
> + return 0;
> + }
> + return 0;
> +}
> +
> +
>  STATIC void
>  tryexec(char *cmd, char **argv, char **envp)
>  {
> @@ -162,6 +192,8 @@ repeat:
>   execve(cmd, argv, envp);
>  #endif
>   if (cmd != path_bshell && errno == ENOEXEC) {
> + if (file_is_binary(cmd))
> + return;
>   *argv-- = cmd;
>   *argv = cmd = path_bshell;
>   goto repeat;

-- 
Jilles Tjoelker


Re: [v2 PATCH] expand: Escape minus sign in arithmetic expansion

2018-05-31 Thread Jilles Tjoelker
On Tue, May 29, 2018 at 02:00:26AM +0800, Herbert Xu wrote:
> On Mon, May 28, 2018 at 12:22:00AM +0800, Herbert Xu wrote:
> > The minus sign generated from arithmetic expansion is currently
> > unquoted which causes anomalies when the result is used in where
> > the quoting matters.

> > This patch fixes it by explicitly calling memtodest for the minus
> > sign.

> > Signed-off-by: Herbert Xu 

> This was buggy.  Here is an update.

> ---8<---
> The minus sign generated from arithmetic expansion is currently
> unquoted which causes anomalies when the result is used in where
> the quoting matters.

> This patch fixes it by explicitly calling memtodest for the minus
> sign.

> Signed-off-by: Herbert Xu 

> diff --git a/src/expand.c b/src/expand.c
> index 7a51766..7ed1bc0 100644
> --- a/src/expand.c
> +++ b/src/expand.c
> @@ -490,7 +490,14 @@ expari(int flag)
>   result = arith(p + 1);
>   popstackmark();
>  
> - len = cvtnum(result);
> + len = 0;
> + if (result < 0) {
> + memtodest("-", 1, flag);
> + result = -result;

This negation overflows if result is INTMAX_MIN, leading to undefined
behaviour. Perhaps leave the memtodest call but use STADJUST to remove
the minus sign itself, leaving only the CTLESC or nothing.

> + len++;
> + }
> +
> + len += cvtnum(result);
>  
>   if (likely(!(flag & EXP_QUOTED)))
>   recordregion(begoff, begoff + len, 0);

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exec: Stricter pathopt parsing

2018-05-15 Thread Jilles Tjoelker
On Tue, May 15, 2018 at 04:49:51PM +0800, Herbert Xu wrote:
> This patch changes the parsing of pathopt.  First of all only
> %builtin and %func (with arbitrary suffixes) will be recognised.
> Any other pathopt will be treated as a normal directory.

> Furthermore, pathopt can now be specified before the directory,
> rather than after it.  In fact, a future version may remove support
> for pathopt suffixes.

> This is so that it is less likely that a genuine directory containing
> a % sign is parsed as a pathopt.

Some examples/testcases would make this clearer.

Note that this also affects CDPATH (good) and MAILPATH (breaks its
undocumented feature to replace the "you have mail" text, which seems
unimportant).

> The patch also removes the clearcmdentry optimisation where we
> attempt to only partially flush the table where possible.

That optimisation is not only a waste of code but also not
POSIX-compliant. XCU 2.9.1.1 Command Search and Execution is clear that
the shell is required to forget remembered locations after an assignment
to PATH, and non-normative text with the hash utility remarks that
PATH="$PATH" is an alternative to hash -r that does not need the XSI
option.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: builtin: Mark more regular built-ins

2018-05-15 Thread Jilles Tjoelker
On Tue, May 15, 2018 at 07:27:29PM +0800, Herbert Xu wrote:
> This patch marks the following built-ins as regular, meaning that
> they cannot be overriden using PATH search:

>   hash
>   pwd
>   type
>   ulimit

This seems correct, but lacks rationale. The rationale is that pwd
should have been here since long ago (because it is in the list in XCU
2.9.1.1 Command Search and Execution), while the other three are new in
SUSv4tc2.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Jilles Tjoelker
On Tue, Mar 27, 2018 at 08:19:19PM +0800, Herbert Xu wrote:
> On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote:

> > If POSIX specifies a result, and a shell applies an optimisation that causes
> > a different result to be produced, doesn't that inherently disallow that
> > optimisation? There may be some confusion and/or disagreement over what
> > exactly POSIX specifies and/or intends to specify, but I don't see how you
> > can argue that POSIX specifies result A, but it's okay to apply an
> > optimisation that produces result B.

> Wait, you're talking about something completely different.  The
> crux of the issue is whether POSIX allows the backslash to escape
> the character after it or does it have to be a literal backslash.

> In fact POSIX is perfectly clear on this:

>   2.13.1 Patterns Matching a Single Character

>   The following patterns matching a single character shall match a
>   single character: ordinary characters, special pattern characters,
>   and pattern bracket expressions. The pattern bracket expression
>   also shall match a single collating element. A 
>   character shall escape the following character. The escaping
>shall be discarded. If a pattern ends with an
>   unescaped , it is unspecified whether the pattern
>   does not match anything or the pattern is treated as invalid.

> Nowhere does it say that backslashes that come from the result of
> parameter expansion is always literal.

> So it's clear that any shell that treats the backslash as a literal
> backslash is already broken.

I don't think it is clear at all. Note the final paragraph of 2.13.1:

] When pattern matching is used where shell quote removal is not
] performed (such as in the argument to the find -name primary when find
] is being called using one of the exec functions as defined in the
] System Interfaces volume of POSIX.1-2008, or in the pattern argument
] to the fnmatch() function), special characters can be escaped to
] remove their special meaning by preceding them with a 
] character. This escaping  is discarded. The sequence "\\"
] represents one literal . All of the requirements and
] effects of quoting on ordinary, shell special, and special pattern
] characters shall apply to escaping in this context.

This implies that special characters cannot be escaped using a backslash
in a context where shell quote removal is performed. Instead, special
characters can be escaped using shell quoting. As a result, the simplest
form of parameter expansion has either all or none of the generated
characters quoted (depending on whether the expansion is in
double-quotes or not).

There is also a sentence "The shell special characters always require
quoting." which seems untrue in practice if the characters come from an
expansion. Something like

  touch 'a'
  sh -c 'w=*\&*; printf "%s\n" $w'

works for many shells as sh. However, this could be explained away as
undefined behaviour.

Furthermore, POSIX generally intends to specify the behaviour of the
Bourne shell and ksh88. If the Bourne shell and ksh88 both agree, and do
not match an interpretation of POSIX, then it is likely that the
interpretation is wrong or the standard text has a bug.

> > Can you show me any shell other than bash that lets this
> > optimisation affect the results?

> The fact is that the other shells are not doing what they do because
> they're not doing this optimisation.  They do what they do because
> they have broken POSIX because they always treat backslashes that
> arise from parameter expansion as literal backslashes.

Let's use clear terminology: if it affects behaviour, it is by
definition not an optimization. It is a special case for words not
containing special pattern characters (for some value of "special
pattern characters").

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] $PATH not fully searched

2018-01-10 Thread Jilles Tjoelker
On Wed, Jan 10, 2018 at 01:36:18PM -0500, Joshua Nelson wrote:
> I've come across an error with the PATH variable in `dash`. Instead of
> fully searching PATH for commands, dash will respond 'command not
> found' if `command -p ` fails.  The same command works
> fine in Bash.

> The following example was performed in a live cd of Debian 9.1 Stretch
> (run in a virtual machine), with dash version 0.5.8-2.4:

> ```bash
> user@debian:~$ PATH="~/my_bin:$PATH" dash
> $ echo $PATH
> ~/my_bin:/usr/local/bin:/usr/bin:/usr/local/games:/usr/games
> $ ls -l ~/my_bin/list
> -rwxr--r-- 1 user user 18 Jan 10 17:42 /home/user/my_bin/list
> $ list
> dash: 3: list: not found
> $ exit
> user@debian:~$ PATH="$PATH:~/my_bin" list
> Desktop Documents Downloads Music my_bin Pictures Public Templates Videos
> ```

> I believe but am not certain that this is related to the following patch:
> https://www.mail-archive.com/dash@vger.kernel.org/msg01329.html

In your example, the tilde is quoted and ends up literally in PATH.
Then, in bash only, tilde expansion is attempted again during the
search. In my testing, zsh, mksh, ksh93, dash and FreeBSD sh do not
implement this feature. Enabling POSIX mode in bash also disables the
feature.

What works more portably is
  PATH=~/my_bin:$PATH dash
or
  PATH=$PATH:~/my_bin list
which expands the tilde at the time of the assignment. This works pretty
much everywhere except in old real Bourne shells, which do not implement
tilde expansion at all.

If you want to quote the $PATH part, it is possible but not necessary
since it is in a variable assignment which is not subject to pathname
generation and word splitting anyway. However, if "export" is prepended,
I strongly recommend quoting it since some shells such as dash do not
implement the special rule for assignment utilities yet.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] quoted substring parameter expansion ignores single-quotes in the arg

2017-10-21 Thread Jilles Tjoelker
On Thu, Oct 19, 2017 at 12:41:30PM +0200, G.raud wrote:
> Wrong bug report.

> In fact the beahviour of dash on "a=\\'a\\'; echo \"\${a%'}\"" is that
> of POSIX and of zsh 5.  However 'bash --posix', 'mksh -o posix' and
> pdksh fail to parse the command and ksh does not remove the quote from
> $a.

I think your original bug report is actually valid. Although the POSIX
standard may be hard to interpret (make sure to use the latest (2016)
edition), I think it is sufficiently clear that various special
characters are active in ${param#word}, whether the outer substitution
is within double-quotes or not. See this sentence in 2.6.2 Parameter
Expansion:

] Enclosing the full parameter expansion string in double-quotes shall
] not cause the following four varieties of pattern characters to be
] quoted, whereas quoting characters within the braces shall have this
] effect.

More generally, if bash --posix and ksh93 agree about something, it is
usually correct (use something like "${a#'*'}", since "${a#'}" is
wrong). Although zsh is a good interactive shell, it does not follow
POSIX as closely; not even in sh or ksh emulation mode.

Also note that dash does implement this correctly if a metacharacter is
quoted using a single-quote or a backslash:

$ v=**
$ echo "${v#*}"
**
$ echo "${v#"*"}"
*
$ echo "${v#\*}"
*

In dash's internals, this bug is caused by not distinguishing between
+/-/=/? and #/##/%/%% varieties in the parser. Not maintaining state for
eacg nesting level makes the parser more elegant, but it is untenable
since there is no way one can parse both "${v+'}" and "${v#'*'}"
correctly without it.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small cleanup to src/dash.1 - Command Line Editing

2017-06-23 Thread Jilles Tjoelker
On Sat, Jun 17, 2017 at 03:53:26PM +0100, Larry Hynes wrote:
> src/dash.1, under Command Line Editing, states:

>   It's similar to vi: typing  will throw you into command
>   VI command mode.

> - There appears to be no need for both occurrences of 'command'
> - I can't see a reason for VI to be capitalised
> - 'will throw you into' seems a little... enthusiastic

> Following diff changes it to

>   It's similar to vi: typing  enters vi command mode.

> diff --git a/src/dash.1 b/src/dash.1
> index 8b8026d..f35d89d 100644
> --- a/src/dash.1
> +++ b/src/dash.1
> @@ -2232,7 +2232,7 @@ enabled, sh can be switched between insert mode and 
> command mode.
>  The editor is not described in full here, but will be in a later document.
>  It's similar to vi: typing
>  .Aq ESC
> -will throw you into command VI command mode.
> +enters vi command mode.
>  Hitting
>  .Aq return
>  while in command mode will pass the line to the shell.

I agree. If you're changing things here anyway, I suggest getting rid of
the contraction as well (changing It's to It is). The fairly formal
style of man pages avoids contractions, just like it avoids "you".

The reference to the "later document" can probably be removed as well,
since said document does not exist yet after many years.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Don't execute binary files if execve() returned ENOEXEC.

2017-02-08 Thread Jilles Tjoelker
On Wed, Feb 08, 2017 at 09:02:33AM +0100, Adam Borowski wrote:
> On Tue, Feb 07, 2017 at 11:52:08PM +0100, Jilles Tjoelker wrote:
> > On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> > > Both "dash -c foo" and "./foo" are supposed to be able to run 
> > > hashbang-less
> > > scripts, but attempts to execute common binary files tend to be nasty:
> > > especially both ELF and PE tend to make dash create a bunch of files with
> > > unprintable names, that in turn confuse some tools up to causing data 
> > > loss.

> > > Thus, let's read the first line and see if it looks like text.  This is a
> > > variant of the approach used by bash and zsh; mksh instead checks for
> > > signatures of a bunch of common file types.

> > > POSIX says: "If the executable file is not a text file, the shell may 
> > > bypass
> > > this command execution.".

> > In FreeBSD sh, I have done this slightly differently (since 2011), based
> > on POSIX's definition of a text file in XBD 3:

> > ] A file that contains characters organized into zero or more lines. The
> > ] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> > ] in length, including the  character.

> Using that definition would require reading the whole file, which can
> obviously be slow and/or lead to DoS.

> Also, it would reject some scripts currently accepted by popular shells,
> including bash, mksh and present version of dash -- none of which ban
> lines above LINE_MAX (2048) in length.

The part you quoted has does not require that the shell bypass the
execution (of the shell) for any file that is not a text file, but that
the shell not bypass the execution for any file that is a text file.

Therefore, the shell may (but is not required to) bypass the execution
if the file contains 0 bytes, contains invalid byte sequences that do
not form a character, contains lines longer than {LINE_MAX} or ends with
a character that is not a newline. Otherwise, the shell must perform the
execution.

The line length is indeed somewhat strange: the INPUT FILES section of
XCU 4 Utilities -> sh requires the shell to read scripts without the
{LINE_MAX} limit, but the part you quoted does not use this modified
definition of a text file. In practice this is not relevant since shells
are not going to read more than {LINE_MAX} for this check anyway, as you
say.

> > The check is simply for a 0 byte in the first 256 bytes (how many bytes
> > are read by pread() for 256 bytes). A file containing the byte 8, for
> > example, can still be a text file per POSIX's definition.

> The XBD 3 cannot possibly specify allowed byte values, as it doesn't even
> know which is the newline, nor does it require even its own "portable
> character set" being directly accessible (merely that it's included in one
> of shiftable sets provided by a locale).

> On the other hand, not only dash assumes 8-bit bytes and ASCII-compatible
> charset (what a limitation!), but we also have knowledge about which byte
> values are not a part of any printable string in any locale on a supported
> platform.  And the set of supported character sets is going to only
> decrease[1].

Text files need not contain printable characters only.

> > This check might cause a terse script with binary to fail to execute,
> > but I have not received bug reports about that.

> In today's brave "curl|sudo sh" world, it's a concern I wouldn't dismiss.
> The first line will be a legitimate command to the shell, the rest is often
> arbitrary binary junk.

I think problems are unlikely because most scripts use the #! method
instead of relying on this shell feature.

> > Stopping the check with a \n will cause a PNG header to be considered
> > text.

> Good point.  It does fool bash and mksh (with mksh's different approach)
> too.  On the other hand, PNG files are not something that's likely to have a
> bogus +x permission, and unlike PE or ELF files, empirically I didn't notice
> any nasty results.  It's still bad -- not sure what's the best solution.

Anything can have a bogus +x permission on FAT, NTFS and other
filesystems that do not support executable permissions at all.

> > Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
> > read the data, in order to minimize the potential for modifying things
> > by reading.

> The manpage for open(2) on Linux says that, while currently O_NONBLOCK has
> no effect for regular files and block devices, the expected semantics might
> eventually be implemented -- and I guess you're not going to prefault the
> beginning of the file before reading, are you?  Thus, O_NONBLOCK is a bad
> idea.

Hmm, I doubt such a b

Re: why does dash save, dup, and restore redirected descriptor in the parent, rather than redirect in the child?

2017-02-03 Thread Jilles Tjoelker
On Tue, Jan 31, 2017 at 06:27:00PM -0800, Parke wrote:
> On Tue, Jan 31, 2017 at 12:00 PM, Mark Galeck <mark_gal...@pacbell.net> wrote:
> > This is strange. Why save, dup and dup again to restore, descriptors
> > in the parent, when it would be much simpler to just dup in the
> > child, and not have to save and restore. This is simpler and I
> > checked it works the same:

> > I am sure there must be a good reason and I am not understanding
> > something deeper. What is it?

> I am not a dash developer, but one reason to make system calls in the
> parent is that it is much simpler to handle errors in the parent.

> In your example:

> > if (!fork()) {
> > fd = open64("foobar.txt", O_WRONLY|O_CREAT);
> > dup2(fd, 1);
> > execl("/bin/date", "date", (char *)NULL);
> > }

> What happens if open64 fails?  How should the child inform the parent
> of this specific error?

In general, you are right, but in the shell's case there is no
difficulty. The error is reported via an error message to stderr and a
non-zero exit status of the command, and the child process can easily do
those things.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] trap: fix memory leak in exitshell()

2016-11-22 Thread Jilles Tjoelker
On Mon, Nov 21, 2016 at 10:40:52PM +0100, Andreas Bofjall wrote:
> After dash had executed the exit trap handler, the trap was reset but
> the pointer was never freed. This leak can be demonstrated by running
> dash through valgrind and executing the following shell script:

>   foo() {
>   true
>   }
>   trap foo EXIT

> Fix by properly freeing the trap pointer in exitshell().

> Signed-off-by: Andreas Bofjall <andr...@gazonk.org>
> ---
>  src/trap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/trap.c b/src/trap.c
> index edb9938..5418b07 100644
> --- a/src/trap.c
> +++ b/src/trap.c
> @@ -389,6 +389,7 @@ exitshell(void)
>   trap[0] = NULL;
>   evalskip = 0;
>   evalstring(p, 0);
> + ckfree(p);
>   }
>  out:
>   /*

This patch will shut up valgrind in the common case, but does not handle
the general case. The command string may contain an error or invoke the
exit builtin and in either case the command string will be leaked
(SIGINT might be expected to have a similar effect, but behaves
strangely from an EXIT trap in dash).

You can probably use the exception handling already present in the
function to fix this. Note that ckfree() should only be used while
INTOFF is in effect, both to avoid longjmp'ing out of free() and to
ensure exactly one free in the presence of interruptions and errors.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] regression in builtin echo

2016-09-02 Thread Jilles Tjoelker
On Fri, Sep 02, 2016 at 10:19:12PM +0800, Herbert Xu wrote:
> On Fri, Sep 02, 2016 at 04:16:31PM +0200, Jilles Tjoelker wrote:

> > Unlike Harald van Dijk's patch, the above patch breaks \c. Per POSIX
> > (XSI option), \c shall cause all characters following it in the
> > arguments to be ignored (so not only in the argument where \c occurs).
> > For example:
> >   echo 'a\cb' c; echo d
> > shall write "ad" followed by a newline.

> Works for me:

> $ build/src/dash -c "echo 'a\cb' c; echo d"
> ad
> $ 

> AFAICS my patch doesn't change \c handling at all.  When we hit
> \c print_escape_str will return 0x100, which guarantees that we
> hit the berak.

You are right. The code is very tricky and my analysis without running
the code was wrong.

I think developing a shell is hard enough already without making the
code so tricky, though.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Parameter expansion, patterns and fnmatch

2016-09-02 Thread Jilles Tjoelker
On Fri, Sep 02, 2016 at 10:04:37PM +0800, Herbert Xu wrote:
> Harald van Dijk <har...@gigawatt.nl> wrote:
> > Yes, this looks like a bug in dash. With the default --disable-fnmatch 
> > code, when dash encounters [ in a pattern, it immediately treats the 
> > following characters as part of the set. If it then encounters the end 
> > of the pattern without having seen a matching ], it attempts to reset 
> > the state and continue as if [ was treated as a literal character right 
> > from the start. The attempt to reset the state doesn't look right, and 
> > has been like this since at least the initial Git commit in 2005.

> pdksh exhibits the same behaviour:

> $ pdksh -c 'foo=[abc]; echo ${foo#[}'
> [abc]
> $

> POSIX says:

> 9.3.3 BRE Special Characters

> A BRE special character has special properties in certain contexts.
> Outside those contexts, or when preceded by a backslash, such a
> character is a BRE that matches the special character itself. The
> BRE special characters and the contexts in which they have their
> special meaning are as follows:

> .[\
> The period, left-bracket, and backslash shall be special except
> when used in a bracket expression (see RE Bracket Expression). An
> expression containing a '[' that is not preceded by a backslash
> and is not part of a bracket expression produces undefined results.

I think this interpretation of POSIX is incorrect. This is about shell
patterns, not basic regular expressions. Shell patterns are specified in
XCU 2.13 Pattern Matching Notation. In XCU 2.13.1, it is written:

] [
] If an open bracket introduces a bracket expression as in XBD Section
] 9.3.5, except that the  character ('!') shall
] replace the  character ('^') in its role in a non-matching
] list in the regular expression notation, it shall introduce a pattern
] bracket expression. A bracket expression starting with an unquoted
]  character produces unspecified results. Otherwise, '['
] shall match the character itself.

Therefore, pdksh is wrong and the output should be abc].

It is normally better to test against the actively developed mksh
instead of pdksh, but here mksh has the same bug. OpenBSD's ksh also has
some active development but stays closer to the original pdksh.

> > This also affects

> > case [a in [?) echo ok ;; *) echo bad ;; esac

> > which should print ok.

> Even ksh prints bad here.

I think POSIX may be saying something different here from what it really
wants to say. There is text in 2.13.3 Patterns Used for Filename
Expansion that leaves unspecified whether [? matches only the literal
filename component [? or all two-character filename components starting
with [ (other slash-separated components in the same pattern are
unaffected). However, if ksh93 behaves similarly in a case statement,
that may have been what the standard had intended to say.

Looking at as simple code as possible, this seems, however, unhelpful.
Since a pattern like *[ should match the literal string *[ in the choice
where brackets that do not introduce a bracket expression are supposed
to disable other special characters and any earlier work on the * is
therefore wrong, implementing this choice requires an additional scan
for brackets that do not introduce a bracket expression.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: declaration utilities (was: [bug-diffutils] bug#24116: [platform-testers] new snapshot available: diffutils-3.3.50-0353)

2016-08-23 Thread Jilles Tjoelker
[Cc trimmed]
On Fri, Aug 05, 2016 at 07:15:02PM +0200, Harald van Dijk wrote:
> On 05/08/2016 18:21, Eric Blake wrote:
> > On 08/05/2016 07:13 AM, Harald van Dijk wrote:
> >> Unfortunately, POSIX currently requires the export command to not have
> >> any magic quoting, and any POSIX-conforming shell will make

> >> a="b c=d"
> >> export a=$a

> >> set a to b, and c to d. Not so with bash, but that's because bash simply
> >> isn't POSIX-conforming, even if invoked as sh.

> > Not quite so fast.

> >> POSIX will require special quoting rules for the export command in the
> >> future, similar to what bash does today. When it does, dash is likely to
> >> change to match that, and the local command will likely be changed to
> >> work the same way.

> > POSIX has already issued an interpretation request, and therefore, any
> > CURRENT implementations (including bash) that already behave as
> > permitted by the interpretation ARE conforming. [...]

> > http://austingroupbugs.net/view.php?id=351

> An interpretation request doesn't automatically mean that all current 
> implementations are conforming, does it? It only means that when the 
> interpretation response says so. In this case, the response says that 
> the standard does not allow bash/ksh's behaviour.

> http://austingroupbugs.net/view.php?id=351#c865:

> > Interpretation response
> > 
> > The standard states that the current behavior of ksh93 does not
> > conform, and conforming implementations must conform to this.
> > However, concerns have been raised about this which are being
> > referred to the sponsor.

> The fact that this is seen as a defect in the standard which will be 
> fixed, rather than a defect in bash/ksh, doesn't make bash/ksh conform, 
> it merely means that bash/ksh shouldn't be changed to conform.

Although it is certainly true that the current standard forbids special
quoting rules for export and readonly, making this change sooner rather
than later will be helpful to users.

Looking at ChangeLog.O, I can see dash had special quoting rules for
export, readonly and local between 1997 and 2001. They were removed,
basically, because there was no specification guidance. The
specification guidance is now available in the form of Austin group bug
#351.

Users of FreeBSD sh have likewise liked having these new rules
available.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] histedit: Remove non-glibc fallback code

2016-08-08 Thread Jilles Tjoelker
On Thu, Aug 04, 2016 at 05:59:08PM +0200, Jilles Tjoelker wrote:
> >From 23da600dcff616662a93f307420d9142598e2cae Mon Sep 17 00:00:00 2001
> From: Jilles Tjoelker <jil...@stack.nl>
> Date: Thu, 4 Aug 2016 17:51:12 +0200
> Subject: [PATCH 1/2] [HISTEDIT] Stop depending on getopt reset feature.

> Instead, use our own nextopt() function.
> ---
>  src/histedit.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/src/histedit.c b/src/histedit.c
> index 94465d7..ec45065 100644
> --- a/src/histedit.c
> +++ b/src/histedit.c
> @@ -214,13 +214,8 @@ histcmd(int argc, char **argv)
>   if (argc == 1)
>   sh_error("missing history argument");
>  
> -#ifdef __GLIBC__
> - optind = 0;
> -#else
> - optreset = 1; optind = 1; /* initialize getopt */
> -#endif
>   while (not_fcnumber(argv[optind]) &&
> -   (ch = getopt(argc, argv, ":e:lnrs")) != -1)
> +   (ch = nextopt(":e:lnrs")) != '\0')
>   switch ((char)ch) {
>   case 'e':
>   editor = optionarg;

This is clearly wrong; not_fcnumber() should be passed *argptr instead
of something bogus depending on optind.

The fixed version is what FreeBSD sh has as of SVN r240541 but I have
not tested it in dash.

In any case, a side effect of this change is a small code size
reduction.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shell: dash - large file support

2015-07-26 Thread Jilles Tjoelker
On Fri, Jul 24, 2015 at 02:50:48PM -0400, Aleksandar Ristovski wrote:
 For builds that build 32-bit dash, configure misses to setup large
 file support resulting in issues with large files.

 For example:
 ...dp/dash-0.5.8/build$ ls -l /tmp/largefile.sh
 -rw-rw-r-- 1 aristovski aristovski 3794653588 Jul 24 14:12 /tmp/largefile.sh
 ...dp/dash-0.5.8/build$ ./src/dash /tmp/largefile.sh
 ./src/dash: 0: Can't open /tmp/largefile.sh

 Where dash was configured (on a 64-bit platform) like so:

 ...dp/dash-0.5.8/build$ CFLAGS=-m32 ../configure  make -j8

 Now, if I make the change in configure.ac and reconfigure the project,
 I get proper operation:

 ...0.5.8.patched/build$ ./src/dash /tmp/largefile.sh
 Running...
 ...

 (the actual working of the 'largefile.sh' is removed for brevity)

 Patch:

 --- dash-0.5.8/configure.ac 2014-09-28 04:19:32.0 -0400
 +++ dash-0.5.8.patched/configure.ac 2015-07-24 14:41:05.855055430 -0400
 @@ -4,6 +4,9 @@ AC_CONFIG_SRCDIR([src/main.c])
 
  AC_CONFIG_HEADERS(config.h)
 
 +dnl On 32-bit builds, check for large file support
 +AC_SYS_LARGEFILE
 +
  dnl Checks for programs.
  AC_PROG_CC
  AC_GNU_SOURCE

If I were the maintainer, I would use AC_SYS_LARGEFILE and remove all
complexity related to using largefile interfaces in some situations and
non-largefile interfaces in other situations.

However, the maintainer seems to prefer using largefile interfaces only
when needed, apparently to reduce the size of the binary. In the case of
large shell scripts, this is POSIXly correct (see XCU 1.5 Considerations
for Utilities in Support of Files of Arbitrary Size).

However, using a non-largefile stat() or lstat() is always wrong, since
the inode number may not fit in a non-largefile ino_t. This happens in
various cases: test -ef/-nt/-ot, finding a dot script (. special
builtin) and some related to the current directory.

On another note, test -nt/-ot should really compare the nanoseconds
parts as well.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getopts doesn't properly update OPTIND when called from function

2015-06-01 Thread Jilles Tjoelker
On Mon, Jun 01, 2015 at 07:30:46PM +0200, Harald van Dijk wrote:
 On 01/06/2015 08:29, Herbert Xu wrote:
  On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote:

  But the test script in this thread does invoke getopts with
  parameters that are the same in all invocations, and without
  modifying OPTIND. I don't see anything else in the normative
  sections that would make the result undefined or unspecified either.
  I do think the script is valid, and the results in dash should match
  those of other shells.

  The bash behaviour makes it impossible to call shell functions
  that invoke getopts while in the middle of an getopts loop.

  IMHO the dash behaviour makes a lot more sense since a function
  always brings with it a new set of parameters.  That plus the
  fact that this behaviour has been there since day one makes me
  reluctant to change it since the POSIX wording is not all that
  clear.

 True. Given that almost no shell supports that anyway, there can't be 
 too many scripts that rely on it, but I did warn about the risk of 
 breaking another type of existing scripts as well, I agree that's a real 
 concern.

FreeBSD sh inherits similar code from ash and likewise has per-function
getopts state. Various shell scripts in the FreeBSD base system use
getopts in functions without setting OPTIND=1.

 One thing that doesn't really make sense, though: if the getopts 
 internal state is local to a function, then OPTIND and OPTARG really 
 should be too. Because they aren't, nested getopts loops already don't 
 really work in a useful way in dash, because the inner loop overwrites 
 the OPTIND and OPTARG variables. While OPTARG will typically be checked 
 right at the start of the loop, before any inner loop executes, OPTIND 
 is typically used at the end of the loop, in combination with the shift 
 command. The current behaviour makes the OPTIND value in that case 
 unreliable.

First, note that the OPTARG and OPTIND shell variables are not an input
to getopts, except for an assignment OPTIND=1 (restoring an OPTIND local
at function return does not reset getopts), and that getopts writes
OPTIND no matter whether getopts's internal optind changed in this
invocation.

With that, the value of OPTIND generally used in scripts is not
unreliable. OPTIND is generally only checked after getopts returned
false (end of options), in the sequence
  while getopts ...; do
...
  done
  shift $((OPTIND - 1))

 So either way, I think something should change. But if you prefer to get 
 clarification first about the intended meaning of the POSIX wording, 
 that certainly seems reasonable to me.

I think the POSIX wording is clear enough, but it may not be desirable
to change getopts to work that way.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possibly wrong handling of $_?

2014-12-25 Thread Jilles Tjoelker
On Wed, Dec 24, 2014 at 12:33:32AM +0100, Vadim Zeitlin wrote:
 On Wed, 24 Dec 2014 00:21:09 +0100 Harald van Dijk har...@gigawatt.nl wrote:

 HvD On 23/12/2014 23:34, Vadim Zeitlin wrote:
 HvDHello,
 HvD 
 HvDI'm not exactly sure if this is a bug because I didn't find any
 HvD  specification about how is this supposed to behave (to my surprise it
 HvD  turned out that $_ was not in POSIX), but please consider this:
 HvD 
 HvD % zsh -c 'echo -n foo  echo $_'
 HvD foofoo
 HvD % bash -c 'echo -n foo  echo $_'
 HvD foofoo
 HvD % dash -c 'echo -n foo  echo $_'
 HvD foo/usr/bin/dash
 HvD 
 HvD This does come across as somewhat confusing, but $_ is really not a 
 HvD special variable at all in dash.

  Ah, this does explain it, thanks!

Dash does implement $_, but only in interactive mode.

Your selection of shells makes it appear as if dash is the odd one out,
but in fact it is not. FreeBSD /bin/sh (also an Almquist shell
derivative), mksh and ksh93 also only implement $_ in interactive mode.
Details of how $_ differ as well, particularly in ksh93. It seems unwise
to use this feature in scripts.

 HvD If dash did something special with $_, then I agree it would be nice if 
 HvD it would be somewhat compatible with other shells. If dash simply does 
 HvD not implement a feature, that feature is not required by any standard, 
 HvD and that feature is not widely used, then I suspect there won't be a lot 
 HvD of interest in implementing that feature.

  Yes, I understand, somehow the idea that dash didn't implement it at all
 just didn't occur to me, but clearly adding a non-standard new feature is
 not nowhere near as important as fixing [what looked like] a bug. I'm not
 sure about the not widely part, but I don't have any non-anecdotal
 evidence one way or the other.

 HvD Don't be put off by that, though. You are free, of course, if you feel 
 HvD so, to attempt to convince people $_ is an important feature that all 
 HvD shells should implement. If you have compelling use cases, if it solves 
 HvD real problems, and if many other shells already implement it, you might 
 HvD even get it standardised. I have never seen a need for it, but that's 
 HvD just me speaking from personal experience, others may feel differently.

  FWIW my initial problem started with using

   builddir := $(shell mkdir -p some-long-make-expression  echo $_)

 in a makefile, which seemed like a nice way to assign the value to the
 variable only if the directory was successfully created. This can, of
 course, be done in a myriad of other ways, but this one just seemed like
 the most elegant to me. Whether this counts as a need or not is not for
 me to say.

  Personally I'd say the main argument for adding support for $_ to dash
 would be to avoid mysterious problems like the one I just had because dash
 silently (i.e. without giving any errors) behaves differently from the
 other shells, which in my case resulted in the makefile misbehaving only
 under Debian (where dash is used as /bin/sh) but not under systems.

Such a principle of least surprise is not a design goal for dash.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exec command and error checking

2014-01-28 Thread Jilles Tjoelker
On Tue, Jan 28, 2014 at 02:17:59PM +0100, Guido Berhoerster wrote:
 * Марк Коренберг socketp...@gmail.com [2014-01-28 13:16]:
  $ dpkg -l | fgrep dash
  ii  dash   0.5.7-2ubuntu2
  POSIX-compliant shell

  $ exec 9no_such_file  echo TEST
  dash: 1: cannot open no_such_file: No such file

  $ exec 9no_such_file || echo TEST
  dash: 2: cannot open no_such_file: No such file

  So, I cannot test this operation without using $?

 No, exec is a special built in and POSIX specifies that
 ...if a redirection error occurs (see Consequences of Shell
 Errors), the shell shall exit with a value in the range 1-125

 dash correctly exits with exit status of 2 as it should. ksh93,
 mksh, and pdksh do the same.

Indeed, this is correct.

You can avoid the exit by prepending 'command':
$ command exec 9no_such_file || echo TEST
dash: 1: cannot open no_such_file: No such file
TEST

  in BASH this works as expected (even in sh mode)

 That's either a bug or an intended deviation from the POSIX
 standard, you'll have to ask on the bug-bash list about that.

The inconsistency appears to be in the behaviour on fatal errors in
interactive shells. Strictly speaking, POSIX seems to require that the
shell continue execution with the next command, setting $? to a non-zero
value. Historically, behaviour has instead been to exit with a non-zero
status (if in a subshell) or return to the prompt with $? set to a
non-zero value (if in the top level shell). Dash implements the latter
and I think it is more useful.

Note that the two behaviours are indistinguishable if a single simple
command is entered.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: bug? Spawned childs always remain in zombie state

2013-08-23 Thread Jilles Tjoelker
On Fri, Aug 23, 2013 at 08:44:44PM +1000, Herbert Xu wrote:
 On Sun, Jan 20, 2013 at 12:04:04PM +, Sjon Hortensius wrote:
  Hi. I'm trying to create a script which monitors a directory using
  inotify and spawns a background process for all events. However I
  found that all childs will remain in zombie state until the script
  quits and I am unable to find a proper fix.

  A minimal testcase:

  #!/bin/dash
  while true
  do
  sleep 1 
  #jobs /dev/null
  done

  If you open a second terminal you'll see that all the 'sleep'
  processes end up being defunct. I have tried playing with `set -ma`
  but the only workaround I found is the commented 'jobs' line.
  Uncommenting that line will result in expected behavior where childs
  are properly reaped. Is this a bug, or is there an alternative
  solution I'm missing?

 You need to wait on them as otherwise dash has to keep them around
 in case you call wait(1) later on.

The shell need not remember the processes indefinitely. Per POSIX (XCU
2.9.3.1 Asynchronous Lists), the application must reference $! before
starting another asynchronous list if it wants to use the wait builtin
for that particular process later on. (Note that this means that jobs -p
is not good enough even when the job consists of a single process, and
that printing $! can add memory leaks to a script.)

POSIX also restricts an operandless wait builtin to known process IDs.
This seems inappropriate: there are many scripts that start multiple
asynchronous lists without referencing $! and expect operandless wait to
wait for all of them. Therefore, all jobs must be remembered while they
are running. However, if $! was not referenced for them and they are not
the most recent asynchronous list, they can be discarded when they
terminate. I implemented this in FreeBSD sh and it appears to work well.

An additional issue occurs when multiple asynchronous lists are started
without a foreground process, here-document that requires a fork,
operandless jobs builtin or wait builtin in between. In that case, dash
never calls wait3() and zombies accumulate. Although the example could
be considered a fork bomb and relies on the child processes getting CPU
time often enough, this may legitimately happen if the loop contains a
read builtin. In FreeBSD sh, I added a check for zombies before forking
the first process of a background job. Some other shells call waitpid()
or similar from a SIGCHLD handler; this reaps zombies faster at the cost
of more complex code (signal handler performing non-trivial work).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] implement privmode support in dash

2013-08-22 Thread Jilles Tjoelker
On Thu, Aug 22, 2013 at 09:59:36PM +0200, Harald van Dijk wrote:
 On 22/08/13 19:59, Tavis Ormandy wrote:
  Hello, this is a patch to add privmode support to dash. privmode attempts to
  drop privileges by default if the effective uid does not match the uid. This
  can be disabled with -p, or -o nopriv.

 Your approach definitely has my support (FWTW), but there are two
 aspects that surprised me, and are different from bash and FreeBSD's sh:

 You named the option nopriv, while bash and FBSD use the name
 privileged. I think it is likely to confuse people if bash -o
 privileged and dash -o nopriv do the same thing, and that it would be
 better to match bash and give the option a positive name, such as
 priv, or perhaps even match them exactly and use privileged.

I think there is no reason to deviate from other shells here. Therefore,
please call it privileged.

 In bash and FBSD, after starting with -p, set +p can be used to drop
 privileges. With your patch, dash accepts set +p, but silently ignores it.

 How does something like the attached, to be applied on top of your
 patch, look?

 [snip]
 + if (!on  (uid != geteuid() || gid != getegid())) {
 + setuid(uid);
 + setgid(gid);
 + /* PS1 might need to be changed accordingly. */
 + choose_ps1();
 + }
 +}

This code tries to use setuid() and setgid() to drop all privilege,
which is only correct if the privilege to be dropped is UID 0, or on BSD
systems. It would be better to use setresuid() or setreuid(), and change
the GID before changing the UID.

Apart from that, it is better to check the return value from setuid()
and similar functions. In particular, some versions of Linux may fail
setuid() for [EAGAIN], leaving the process running with the same
privileges.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wait and ctrl+Z

2012-05-03 Thread Jilles Tjoelker
On Fri, May 04, 2012 at 12:28:17AM +0200, Marc Glisse wrote:
 On Fri, 4 May 2012, Herbert Xu wrote:
  Marc Glisse marc.gli...@inria.fr wrote:
  Hello,

  I noticed a strange behavior of wait when I suspend and resume a script.

  $ cat a.sh
  #!/bin/dash
  (sleep 7; echo blah) 
  (sleep 7; echo bloh) 
  wait ; echo coucou
  $ ./a.sh
  ^Z
  zsh: suspended  ./a.sh
  $ fg
  [1]  + continued  ./a.sh
  coucou
  $ blah
  bloh

  As you can see, the instruction after wait was executed immediatly on
  resume, without waiting for the jobs.

  If I replace the ';' after wait by  and do the same suspend+resume,
  coucou is never printed.

  I am using dash version 0.5.7-3 in debian testing.

  That's normal as wait was interrupted by a signal.  If you want
  to wait even after an interruption, you should check the return
  value of wait.

 Hello, and thanks for you answer.

 I find that quite surprising. I re-read the posix description of wait, and 
 my understanding is that the return value of wait should depend on what 
 happened to the waited process (exit code, signal), not to wait itself. 
 And other shells seem to agree.

This is not actually said in the XCU 'wait' page but in XCU 2.11 Signals
and Error Handling.

However, it says something subtly different: only a signal for which a
trap has been set should cause 'wait' to return immediately with an exit
status greater than 128.

Because no trap has been set on SIGTSTP, 'wait' should not be
interrupted here and the shell should continue waiting.

Likewise, if the shell internally uses SIGCHLD to get notified about
process termination, this does not interrupt 'wait'; dash implements
that aspect properly.

 Are you suggesting that wait should always be used in a loop? With what 
 check exactly?

Only if you have set any traps that resume execution of the original
script (i.e. do not exit the process).

Otherwise, if 'wait' is being called without parameters, you can do
something like
  until wait; do :; done

If 'wait' is being called with parameters, the required loop is very
complicated.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] var.c: check for valid variable name before printing in export -p

2012-02-25 Thread Jilles Tjoelker
On Sat, Feb 25, 2012 at 06:36:36PM +1100, Herbert Xu wrote:
 On Tue, Feb 14, 2012 at 10:48:48AM +, har...@redhat.com wrote:
  From: Harald Hoyer har...@redhat.com

  export -p prints all environment variables, without checking if the
  environment variable is a valid dash variable name.

  IMHO, the only valid usecase for export -p is to eval the output.

 Thanks a lot for the report and patch.

 I'd prefer to fix this up at entry into the shell rather than
 when we print out the variables.  So how about this patch?

Such a change would change other things than just set and export -p.
It would also not propagate environment variables with invalid names to
child processes. For example:

env -i not-a-name=1 PATH=$PATH dash -c env | grep not-a-name

Most shells pass the environment variable through, such as bash, zsh,
ksh93 and most ash derivatives. However, the original Bourne shell and
pdksh/mksh do not.

I think it is best to pass them through so that executing a utility with
and without the shell are as similar as possible (most versions of
make(1) assume this is the case by skipping the shell if the command is
simple enough) and particularly for dash for historical/compatibility
reasons.

I did something similar to the OP's patch in FreeBSD (SVN r223183):
set and export -p skip variables with invalid names.

Note that this problem cannot occur with readonly -p because only
variables with a proper name can be marked read-only.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question about job control in non-interactive shells

2012-01-13 Thread Jilles Tjoelker
On Mon, Jan 09, 2012 at 10:37:45AM -0600, Jonathan Nieder wrote:
 Michael Welsh Duggan wrote:
  I am trying to determine why:

dash -c sleep 5  kill %1

  results in:

dash: 1: kill: No such process

 You are probably looking for the -m option.

The cause is that the -m option (job control) enables running commands
in separate process groups, and dash follows literally what POSIX says
about kill %job: a background process group should be signaled; however,
there is no background process group. Some shells signal one or more
processes they know are part of the job in this case, but dash calls
kill() on a process group that is guaranteed not to exist.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: evaluation of env variables in DASH

2011-10-19 Thread Jilles Tjoelker
On Wed, Oct 19, 2011 at 11:24:50PM +0200, Dima Sorkin wrote:
   The following DASH behaviour seems buggy to me

 -
 $ export A='\n'
 $ echo $A
 
 $
 -

 whereas using BASH would result in printing literally \n .

 I.e. presumingly DASH does secondary interpolation of escaped
 symols in values of environment variables. 

The echo builtin in dash differs from most other echo utilities on Linux
and *BSD in interpreting System V-like backslash escape sequences. This
is the expansion you are seeing and the same thing can be seen in
  echo '\n'
It is documented in dash's manual page.

This also happens in bash if you 'shopt -s xpg_echo'.

This behaviour is permitted by POSIX and required for the XSI option,
and is more commonly seen on Solaris or other System V derivatives.

The fix is to use printf(1) instead of echo(1) if there is a possibility
the string may start with '-' or contain '\'. In this case,
  printf '%s\n' $A

This has been asked/reported more frequently and the answer has been
that it will not be changed.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'continue' does not work in files sourced with dotcmd

2011-07-10 Thread Jilles Tjoelker
On Sat, Jul 09, 2011 at 10:07:30PM +0800, Herbert Xu wrote:
 On Sat, Jul 09, 2011 at 03:07:04PM +0200, Jilles Tjoelker wrote:
  A fix for dash is below. The dash code is broken in a different way than
  the FreeBSD sh code was, but the patched code is pretty much the same.
  This makes the above test work and does not change the outcome of any
  other tests in the FreeBSD sh testsuite.

 You're right.  But I think your patch may introduce a problem
 with a return statement inside a dot script.  That should not
 have an effect after exiting the script.

Interesting. Dash has been returning from the closest scope (function or
sourced script) for a while, but the SKIPFUNC/SKIPFILE distinction and
the comment in eval.c returncmd()

]   /*
]* If called outside a function, do what ksh does;
]* skip the rest of the file.
]*/

still gave me the impression that it behaved like older ash (also in
FreeBSD and NetBSD), trying to be bug-compatible with the Bourne shell
by having 'return' return from a function, if any, and only return from
a dot script if there is no function (because the Bourne shell gives an
error in that case).

It may be better to name the constant SKIPRETURN rather than SKIPFUNC.

 Anyway, the following patch based on your idea should fix the
 problem.

 commit 4f7e206782675b548565ca2bc82bc8c262a0f20e
 Author: Herbert Xu herb...@gondor.apana.org.au
 Date:   Sat Jul 9 22:05:22 2011 +0800
 
 [BUILTIN] Merge SKIPFUNC/SKIPFILE and only clear SKIPFUNC when leaving 
 dotcmd

Yes, this works too.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Porting dash to OpenBSD

2011-07-03 Thread Jilles Tjoelker
On Sun, Jun 26, 2011 at 06:18:59AM +0400, Mike Korbakov wrote:
 Unfortunatly, not all systems (like OpenBSD) has built-in texttools
 like nl, that makes it difficult to port dash, with minimum troubles.

 To reduce dependencies, I suggest using instead of nl other tools like
 awk or cat (as advised openbsd porters team).

 Building dash-0.5.6.1 was successfull with this patch:

 --- src/mkbuiltins.orig Sat Jun  5 13:34:23 2010
 +++ src/mkbuiltins  Sun Jun 26 02:36:23 2011
 @@ -84,7 +84,7 @@ cat \!
   */
 
  !
 -sed 's/-[a-z]*//' $temp2 | nl -v 0 | LC_COLLATE=C sort -u -k 3,3 |
 +sed 's/-[a-z]*//' $temp2 | awk '{ printFNR-1$0 }'  | 
 LC_COLLATE=C sort -u -k 3,3 |
  tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
 awk '{  printf #define %s (builtincmd + %d)\n, $3, $1}'
  printf '\n#define NUMBUILTINS %d\n' $(wc -l  $temp2)

This looks reasonable. It works here (FreeBSD with somewhat hacked dash
source), leaving the resulting binary unchanged.

A minor nit: comparing with other awk invocations, it seems more usual
to use NR rather than FNR. They both do the same thing if there is only
one input file.

The nl utility has caused more portability problems already; the code in
master has different options. This is so even though nl is in SUSv4
under the XSI option. We already use awk so that should be safer.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shift fatal error

2011-03-15 Thread Jilles Tjoelker
On Thu, Mar 10, 2011 at 08:23:33PM +0100, Guido Berhoerster wrote:
 * Dan Muresan danm...@gmail.com [2011-03-10 19:41]:
  Hi, is there some consensus on whether shift should cause a fatal
  error as reported by Herbert against bash:
  
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=252378
  
  # doesn't print anything
  dash -c 'shift 2; echo hi'
  
  My copy of SUSv3 doesn't seem to imply any fatal error handling
  requirement for shift:
  
  --- 8-X ---
  EXIT STATUS
  
  The exit status is 0 if n$#; otherwise, it is zero.
  
  CONSEQUENCES OF ERRORS
  
  Default.
  --- 8-X ---

 For IEEE Std 1003.1-2008 see section 2.8.1 Consequences of
 Shell Errors: a utility syntax error (option or operand error)
 with special built-ins shall cause the shell to exit. That's what
 dash, ksh93, and pdksh do.

That may have been the intention (considering that it matches the real
Bourne shell in addition to various flavours of Korn shell), but that is
not how I would interpret it. A too high shift count still seems
syntactically valid to me.

A statement about the exit status in a particular error situation also
usually indicates that the shell shall not abort.

Examples of shift commands that I think shall cause the shell to abort:
shift -S  # unless a -S option is supported as an extension
shift x   # unless arithmetic expressions are accepted as an extension
shift @   # unless there is some strange extension

The FreeBSD sh shift special builtin behaves this way (a too high shift
count is not fatal but a syntactically invalid number is). This was done
a few years ago when the syntax-error property of special builtins was
implemented, as making a too high shift count fatal caused a configure
script in the FreeBSD base system to fail.

I would not encourage the extension of accepting arithmetic expressions
because that requires a subtly different kind of arithmetic expression
without octal constants. POSIX is clear that 'shift 010' should shift
ten positions, not eight, while 'shift $((010))' should shift eight
positions.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read() builtin doesnt read integer value /proc files (but bashs does)

2010-12-18 Thread Jilles Tjoelker
On Wed, Dec 15, 2010 at 12:55:51PM -0600, Jonathan Nieder wrote:
 Cristian Ionescu-Idbohrn wrote:
  On Sun, 28 Nov 2010, Herbert Xu wrote:
   On Sat, Sep 04, 2010 at 07:35:04PM +, Jilles Tjoelker wrote:

  This discarding is still bad as it throws away valid data if the open
  file description is shared. This happens if stdin is redirected inside a

  I'm with Jilles on this.  I also don't particularly feel like
  bloating dash just because of the borked /proc interface when
  there is a perfectly adequate work-around in cat.

 value=$(cat /proc/file)

  I wouldn't call that a perfectly adequate work-around, but a painful and
  unadequate work-around.

 For what it's worth, here's what bash does (based on strace):

 1. Determine whether the file is seekable.  That is, seek using
 SEEK_CUR with offset 0.

 2. If seekable, read a nice big chunk and then seek back to put the
 file offset back in the right place.  If not seekable, read one byte
 at a time.

 This works in /proc because files in /proc are seekable.

 That said, I don't think borked /proc is a great reason to do this
 (it's a better reason to fix /proc).  Speeding up the read builtin
 might be a good reason.

The optimization is of limited benefit (still way more syscalls than
strictly necessary) and does not apply to the common use case of reading
from a pipe. Generally, if 'read' is too slow, it is better to spend a
fork on a tool like grep, sed or awk which processes large amounts of
text much more efficiently.

As for value=$(cat /proc/file), there are at least two ways to make this
faster. The traditional ksh way is the extension value=$(/proc/file)
which is permitted but not required by POSIX. I do not really like this
as it makes the scripts not POSIX compliant. In recent mksh I noticed
another way: by making cat(1) a builtin under certain circumstances.
These circumstances include the absence of options (other than --) and
should probably also exclude foreground commands in interactive job
control shells (as builtins cannot be suspended). To avoid needing to
implement extensions like FreeBSD cat's ability to read from Unix domain
sockets, named files could also be excluded, requiring value=$(cat
/proc/file).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [REDIR] Replace GPL noclobberopen code with the FreeBSD version.

2010-11-20 Thread Jilles Tjoelker
Replace noclobberopen() from bash with the FreeBSD code for noclobber
opens.

This also reduces code size by eliminating an unnecessary check.
---
 src/redir.c |   77 +++---
 1 files changed, 15 insertions(+), 62 deletions(-)

diff --git a/src/redir.c b/src/redir.c
index b4e49c0..f96a76b 100644
--- a/src/redir.c
+++ b/src/redir.c
@@ -84,7 +84,6 @@ STATIC void dupredirect(union node *, int, char[10]);
 STATIC void dupredirect(union node *, int);
 #endif
 STATIC int openhere(union node *);
-STATIC int noclobberopen(const char *);
 
 
 /*
@@ -168,6 +167,7 @@ redirect(union node *redir, int flags)
 STATIC int
 openredirect(union node *redir)
 {
+   struct stat64 sb;
char *fname;
int f;
 
@@ -186,8 +186,21 @@ openredirect(union node *redir)
/* Take care of noclobber mode. */
if (Cflag) {
fname = redir-nfile.expfname;
-   if ((f = noclobberopen(fname))  0)
+   if (stat64(fname, sb)  0) {
+   if ((f = open64(fname, O_WRONLY|O_CREAT|O_EXCL, 
0666))  0)
+   goto ecreate;
+   } else if (!S_ISREG(sb.st_mode)) {
+   if ((f = open64(fname, O_WRONLY, 0666))  0)
+   goto ecreate;
+   if (fstat64(f, sb)  0  S_ISREG(sb.st_mode)) 
{
+   close(f);
+   errno = EEXIST;
+   goto ecreate;
+   }
+   } else {
+   errno = EEXIST;
goto ecreate;
+   }
break;
}
/* FALLTHROUGH */
@@ -397,66 +410,6 @@ savefd(int from, int ofd)
 }
 
 
-/*
- * Open a file in noclobber mode.
- * The code was copied from bash.
- */
-int
-noclobberopen(fname)
-   const char *fname;
-{
-   int r, fd;
-   struct stat64 finfo, finfo2;
-
-   /*
-* If the file exists and is a regular file, return an error
-* immediately.
-*/
-   r = stat64(fname, finfo);
-   if (r == 0  S_ISREG(finfo.st_mode)) {
-   errno = EEXIST;
-   return -1;
-   }
-
-   /*
-* If the file was not present (r != 0), make sure we open it
-* exclusively so that if it is created before we open it, our open
-* will fail.  Make sure that we do not truncate an existing file.
-* Note that we don't turn on O_EXCL unless the stat failed -- if the
-* file was not a regular file, we leave O_EXCL off.
-*/
-   if (r != 0)
-   return open64(fname, O_WRONLY|O_CREAT|O_EXCL, 0666);
-   fd = open64(fname, O_WRONLY|O_CREAT, 0666);
-
-   /* If the open failed, return the file descriptor right away. */
-   if (fd  0)
-   return fd;
-
-   /*
-* OK, the open succeeded, but the file may have been changed from a
-* non-regular file to a regular file between the stat and the open.
-* We are assuming that the O_EXCL open handles the case where FILENAME
-* did not exist and is symlinked to an existing file between the stat
-* and open.
-*/
-
-   /*
-* If we can open it and fstat the file descriptor, and neither check
-* revealed that it was a regular file, and the file has not been
-* replaced, return the file descriptor.
-*/
-if (fstat64(fd, finfo2) == 0  !S_ISREG(finfo2.st_mode) 
-finfo.st_dev == finfo2.st_dev  finfo.st_ino == finfo2.st_ino)
-   return fd;
-
-   /* The file has been replaced.  badness. */
-   close(fd);
-   errno = EEXIST;
-   return -1;
-}
-
-
 int
 redirectsafe(union node *redir, int flags)
 {
-- 
1.7.3.2

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quoted closing brace in variable default expansion

2010-11-13 Thread Jilles Tjoelker
On Sat, Nov 13, 2010 at 10:41:47AM -0600, Jonathan Nieder wrote:
 Harald van Dijk wrote:

$ ksh -c 'echo ${x:-}}'
}
$ dash -c 'echo ${x:-}}'
dash: 1: Syntax error: Unterminated quoted string
$ busybox sh -c 'echo ${x:-}}'
sh: syntax error: unterminated quoted string

  It looks like dash and other ash derivatives stop the expansion with
  the first }, instead of the first unquoted }. I'm getting confused
  trying to figure out whether this is a bug in dash or in the script
  relying on it.

 The answer depends on how portable the script is meant to be.  If
 the goal is to be portable to shells implementing future versions of
 the POSIX standard, there seems to be have been an interpretation[1]
 approved for the next major revision:

  http://austingroupbugs.net/view.php?id=221
  note #399

 which would make the example nonconformant because there are an odd
 number of unescaped double-quotes before the first unescaped closing
 brace.

Correct.

 See http://thread.gmane.org/gmane.comp.shells.dash/262/focus=263 for
 a nice summary (thanks, Jilles!).

Note that I have changed FreeBSD 9 sh since I wrote that. To preserve
sanity in expand.c, a substitution now ends with the same quoting level
as it started. Closing braces that do not conform to this are taken
literally. Making the parole configure script do what the author
expected was a bonus.

The change was needed to implement word splitting in WORD in ${v+WORD}.
For that, expand.c needs to know the quoted state of any character
of a word in ${v+-word}. Like NetBSD sh and dash have done, I added a
new magic character CTLQUOTEEND that shows the end of the quoted part
and interpreted CTLQUOTEMARK as the start. If I now have something like
  ${v-abc${vv-defghi}jkl}
I would have to pass through the fact that the quotes were opened in
${vv-defghi} to the outer substitution, and I did not want that. The
change avoids it.

FreeBSD 9 sh is now very close to ksh93 in treatment of POSIX
substitutions and I am happy with this. (One of the deliberate
differences is that the above incorrect construct is flagged as an error
(missing '}') while ksh93 treats it as an incomplete command.)

As for the portable alternative, split up the command. Either put the
'}'-containing thing in a variable and use that as alternative or use an
explicit conditional. The Autoconf documentation has more information,
for example.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug with - under ulimit -n

2010-10-27 Thread Jilles Tjoelker
On Wed, Oct 27, 2010 at 03:33:41PM -0600, Eric Blake wrote:
 Dash does not behave well when under artificial fd pressure due to
 ulimit -n.  It insists on copying a to-be-closed fd to another fd
 greater than 10, then complains when the dup fails, rather than just
 flat-out closing the fd in the first place.  Compare this with ksh93

 $ ksh -c 'ulimit -n 10; : -'; echo $?
 0
 $ dash -c 'ulimit -n 11; : -'; echo $?
 0
 $ dash -c 'ulimit -n 10; : -'; echo $?
 dash: 0: Invalid argument
 2

 See this thread on the bug-tar list for more details:
 http://thread.gmane.org/gmane.comp.gnu.tar.bugs/4010/focus=4020

Summary: use 'exec' to close the file descriptors before setting ulimit
-n ridiculously low instead of relying on redirections to work in such a
restricted environment.

By the way, there is also a bug in some ash versions that fails when
closing an fd (via - or -) that is not open, except if the
redirection is for 'exec' or an external program.

However, some of the gory details are below.

fcntl(F_DUPFD) fails with [EINVAL] if arg (the minimum for the new file
descriptor) exceeds the ulimit. The FreeBSD kernel, at least, checks
this before checking if the existing file descriptor is open. Therefore
any redirection attempt that must allow for the possibility of restoring
the old open file requires a ulimit -n of at least 11.

That the ksh93 example works is due to ksh93 being very clever (too
clever for its own good, sometimes, if you ask me). It thinks that it
has no need to save the old open file if this is the last command and
there are no trap handlers.

So the following fail (on a FreeBSD 8 kernel):

$ ksh93 -c 'ulimit -n 10; : -; :'; echo $?
ksh93[1]: open file limit exceeded [Invalid argument]
1
$ ksh93 -c 'trap echo hi EXIT; ulimit -n 10; : -'; echo $?
ksh93[1]: open file limit exceeded [Invalid argument]
hi
1

Where the cleverness fails is when that last command sets a trap handler
itself:

$ ksh93 -c ':; trap echo hi EXIT /dev/null'
$

The other shells I tried correctly print hi.

A further difference between shells is that some shells perform
redirections for a simple command before looking at what kind of command
it is, therefore always saving the old open files, while others postpone
this until after looking what kind of command it is, not saving the old
open files for external programs because the open happens in the child
process (potentially also for 'exec').

The former behaviour is shown by most kinds of Korn shell and dash. The
latter behaviour is shown by the Bourne shell, most other ash
derivatives, bash and zsh.

A command like
  (ulimit -n 10; /bin/foo 3- 4-; :)
will only work in a shell that uses the latter behaviour.

I consider both behaviours valid and shell script that depends on either
of them broken. They both have historical basis and their own advantages
and disadvantages.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [BUILTIN] Fix various issues with read by importing the NetBSD/FreeBSD code.

2010-09-07 Thread Jilles Tjoelker
This reverts the change to make read use the code from expand.c.
Although read's splitting is similar to the splitting done as part of
word expansions, the differences appear to add more complexity than the
common code saves.

The new code is from FreeBSD (which was originally taken from NetBSD).
The -e and -t options have been left out. Some special handling for
EINTR which is not necessary in FreeBSD sh has been retained from the
older dash code.

We now pass FreeBSD's builtins/read1.0 test, posh's mksh.t:read-IFS-1
test and gsf's ifs.sh.
---
 src/miscbltin.c |  170 ++-
 1 files changed, 92 insertions(+), 78 deletions(-)

diff --git a/src/miscbltin.c b/src/miscbltin.c
index 5ab1648..fc6e8d4 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -62,69 +62,22 @@
 #undef rflag
 
 
-/** handle one line of the read command.
- *  more fields than variables - remainder shall be part of last variable.
- *  less fields than variables - remaining variables unset.
- *
- *  @param line complete line of input
- *  @param ap argument (variable) list
- *  @param len length of line including trailing '\0'
- */
-static void
-readcmd_handle_line(char *line, char **ap, size_t len)
-{
-   struct arglist arglist;
-   struct strlist *sl;
-   char *s, *backup;
-
-   /* ifsbreakup will fiddle with stack region... */
-   s = grabstackstr(line + len);
-
-   /* need a copy, so that delimiters aren't lost
-* in case there are more fields than variables */
-   backup = sstrdup(line);
-
-   arglist.lastp = arglist.list;
-   recordregion(0, len - 1, 0);
-   
-   ifsbreakup(s, arglist);
-   *arglist.lastp = NULL;
-   removerecordregions(0);
-
-   for (sl = arglist.list; sl; sl = sl-next) {
-   /* remaining fields present, but no variables left. */
-   if (!ap[1]) {
-   size_t offset;
-   char *remainder;
-
-   /* FIXME little bit hacky, assuming that ifsbreakup 
-* will not modify the length of the string */
-   offset = sl-text - s;
-   remainder = backup + offset;
-   rmescapes(remainder);
-   setvar(*ap, remainder, 0);
-
-   return;
-   }
-   
-   /* set variable to field */
-   rmescapes(sl-text);
-   setvar(*ap, sl-text, 0);
-   ap++;
-   }
-
-   /* nullify remaining arguments */
-   do {
-   setvar(*ap, nullstr, 0);
-   } while (*++ap);
-}
-
 /*
- * The read builtin.  The -e option causes backslashes to escape the
- * following character. The -p option followed by an argument prompts
+ * The read builtin.  The -r option causes backslashes to be treated like
+ * ordinary characters. The -p option followed by an argument prompts
  * with the argument.
  *
  * This uses unbuffered input, which may be avoidable in some cases.
+ *
+ * Note that if IFS=' :' then read x y should work so that:
+ * 'a b'   x='a', y='b'
+ * ' a b ' x='a', y='b'
+ * ':b'x='',  y='b'
+ * ':' x='',  y=''
+ * '::'x='',  y=''
+ * ': :'   x='',  y=''
+ * ':::'   x='',  y='::'
+ * ':b c:' x='',  y='b c:'
  */
 
 int
@@ -135,9 +88,14 @@ readcmd(int argc, char **argv)
char c;
int rflag;
char *prompt;
+   const char *ifs;
char *p;
+   int startword;
int status;
int i;
+   int is_ifs;
+   int saveall = 0;
+   ssize_t n;
 
rflag = 0;
prompt = NULL;
@@ -155,28 +113,28 @@ readcmd(int argc, char **argv)
}
if (*(ap = argptr) == NULL)
sh_error(arg count);
+   if ((ifs = bltinlookup(IFS)) == NULL)
+   ifs =  \t\n;
+
status = 0;
+   startword = 2;
backslash = 0;
STARTSTACKSTR(p);
for (;;) {
-   switch (read(0, c, 1)) {
-   case 1:
-   break;
-   default:
-   if (errno == EINTR  !pendingsigs)
-   continue;
-   /* fall through */
-   case 0:
+   do
+   n = read(STDIN_FILENO, c, 1);
+   while (n == -1  errno == EINTR  !pendingsigs);
+   if (n != 1) {
status = 1;
-   goto out;
+   break;
}
if (c == '\0')
continue;
if (backslash) {
-   if (c == '\n')
-   goto resetbs;
-   STPUTC(CTLESC, p);
-   goto put;
+   backslash = 0;
+   if (c != '\n')
+   STPUTC(c, 

Re: [PATCH] fix UTF-8 issues in read() builtin

2010-09-07 Thread Jilles Tjoelker
On Wed, Sep 08, 2010 at 01:26:15AM +0400, Alexey Zinovyev wrote:
 Hello, I think there is a bug in read() builtin.

 $ cat test
 echo 'ρ'|while read i; do echo $i; done
 $ dash test

 $ bash test
 ρ

 Same with some japanese symbols.
 Looks like dash strips 0x81 byte. 

0x81 == CTLESC, the escape character in dash's internal representation.

 diff --git a/src/miscbltin.c b/src/miscbltin.c
 index 5ab1648..f8c5655 100644
 --- a/src/miscbltin.c
 +++ b/src/miscbltin.c
 @@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
* will not modify the length of the string */
   offset = sl-text - s;
   remainder = backup + offset;
 - rmescapes(remainder);
   setvar(*ap, remainder, 0);
  
   return;

This patch is not correct as it will leave 0x81 bytes for backslash
escapes. That is probably a bit worse than ignoring the backslashes
entirely, which is what it does now. It attempts to escape the next
character by placing a CTLESC, but CTLESC does not and should not escape
IFS characters for ifsbreakup(); the recordregion() mechanism should be
used for that.

(For the intermediate representation generated by parser.c, CTLESC does
escape IFS characters. This is not ideal as it prevents IFS splitting
with CTL* bytes in word in ${var+-word}.)

The patch I posted separately fixes the handling of 0x81 and various
other issues with read (by using separate code instead of trying to use
expand.c). Backslash escaping works too although I have just found some
bugs with corner cases.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-04 Thread Jilles Tjoelker
On Sat, Sep 04, 2010 at 08:20:33PM +0200, Steve Schnepp wrote:
 2010/9/3 Jilles Tjoelker jil...@stack.nl:
  This patch assumes that the file descriptor is discarded afterwards (its
  position does not matter). Therefore the very common construct
   while read x; do
     ...
   done
  stops working.

 Ohh.. thanks for that, I didn't see it.

 Actually while read x continues to work.
 But reopening the file doesn't as in :

 read a b  datafile
 echo ${a} ${b}
 read a b  datafile
 echo ${a} ${b}

You're right, it's even stranger than I expected.

 I attached an updated patch that corrects this pb by discarding the
 buffer when opening a new file.

This discarding is still bad as it throws away valid data if the open
file description is shared. This happens if stdin is redirected inside a
while read... loop.

Furthermore, I think constructions like
  read x; cat
and
  read x; (read y); read z
should keep working. This requires that the input's file position be
synced whenever another process may see it (fork/exit). Due to the
highly dynamic character of the shell and the common use of fd 0, this
probably means that you can't do better than syncing after each read
builtin. (For example, 'read' could be overridden with a function after
the third line.)

Another thought:
  exec 30; read x; read y 3
or even
  sh -c 'read x; read y 3' 30
Different file descriptors may refer to the same open file description
and the shell may not know this.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-24 Thread Jilles Tjoelker
On Tue, Aug 24, 2010 at 12:51:47AM +0200, Harald van Dijk wrote:
 On 23/08/10 21:35, Jilles Tjoelker wrote:
  I think you should do what you think is best for the stability of your
  product. Because dash releases are not extensively tested, I'd recommend
  a trial build of at least a minimal base system with the new version you
  choose. A particular feature to be wary of is LINENO support, as it will
  cause most configure scripts to accept dash as a usable shell.

 Thanks, I'm aware of that. I already locally exported CONFIG_SHELL, so
 that even without LINENO support the configure scripts were already run
 from dash.

 That reminds me: the LINENO support is useful, but the tracking of line
 numbers has some issues:

 $ src/dash -c 'f() { echo $LINENO; }
 f
 f
 '
 2
 3

 But this is not new, and not limited to LINENO:

 $ cat foo.sh
 if :; then
 foo
 :
 :
 :
 :
 :
 fi
 $ src/dash foo.sh
 foo.sh: 8: foo: not found
 $ bash foo.sh
 foo.sh: line 2: foo: command not found

 I have a patch that improves this by storing the line numbers in the
 command nodes, if you're interested, but it needs polishing before I
 plan on sending it here or anywhere, and there are probably some corner
 cases that it mishandles.

Yes, I think that's the proper way to implement LINENO.

FreeBSD sh avoids extending the nodes by detecting expansions of LINENO
at parse time and storing the line number at that time. However, this is
only possible because it does not print a line number when there is an
error in a builtin.

 [IFS=, ]
  I think the important thing is that IFS characters are supposed to be
  field terminators (see POSIX XCU 2.6.5 Field Splitting).

  Therefore, in the example  1 ,2 3, there are three fields, each
  containing one digit, and each variable is assigned one of them.

 The more I read it, the more I'm actually becoming convinced that zsh is
 doing the right thing, and dash is almost doing the right thing.

 2.6.5 uses the term delimiter, not terminator. They don't mean the
 same thing. A delimiter can mark the start of a field as well as the
 end. And if you compare susv2 with susv3, you may see susv2 is a lot
 clearer than v3 on one point, because it ends the Field Splitting
 section with a note.

 The last rule can be summarised as a pseudo-ERE:

 (s*ns*|s+)

  where s is an white-space character and n is a character in the
  that is not white space. Any string matching that ERE delimits a
  field, except that the s+ form does not delimit fields at the
  beginning or the end of a line. (followed by an example)

 This says the s+ form does not delimit fields at the end of a line,
 which strongly implies that the s*ns* form does. The wording is wrong no
 matter how you look at it (splitting a   results in one field a, not
 one field a  ), and the note has been removed in susv3. Still, it
 manages to somewhat clarify the rest of text.

POSIX.1-2008 (aka SUSv4) says, at one point, that the shell shall use
the delimiters as field terminators.

The specification of field splitting did indeed change at some point, so
that a final non-whitespace IFS character does not have a final empty
field after it. Likely, the old spec was not what was intended as the
System V sh behaved much like the new spec (though not exactly).
Generally, the POSIX shell command language is designed to match the
Bourne shell (as in System V) and ksh88, and deviations from this are
mentioned in the rationale. Note that this does not mean that either the
Bourne shell or ksh88 are compliant.

  In the example  1 ,2 3,, there are four fields, the last being empty.
  Then c is assigned the third field plus the delimiter character and the
  remaining fields and their delimiters except trailing whitespace that is
  in IFS. Hence, both commas end up in c.

 The read command description states:

 If there are fewer var operands specified than there are fields, the
  leftover fields and their intervening separators shall be assigned to
  the last var.

 If  1 ,2 3,, forms four fields, where the fourth field is the empty
 string between the two trailing commas, then the final comma is not an
 intervening separator, so it should be excluded from c.

The description of the read utility is also different in POSIX.1-2008.
It does not talk about intervening separators but about delimiters
in general.

The intention is that if there are more fields than variables, the final
variable receive the exact text after the already assigned fields and
their delimiters (apart from trailing IFS whitespace). The POSIX.1-2008
text achieves this if used with the POSIX.1-2008 field splitting rules,
and so does the text you cited if used with the old field splitting
rules (which result in five fields for  1 ,2 3,,).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trap bug in recent versions of dash

2010-08-22 Thread Jilles Tjoelker
On Mon, Aug 16, 2010 at 01:52:56PM +0200, Guido Berhoerster wrote:
 * Jilles Tjoelker jil...@stack.nl [2010-08-15 22:05]:
  On Wed, Aug 11, 2010 at 10:06:16AM +0200, Guido Berhoerster wrote:
   with the latest git version of dash trap actions are not
   evaluated in the context of a function.

  I think dotrap()'s return value should be removed or at least ignored.
  An evalskip (break/continue/return-function/return-file) should never
  lead to an immediate exit. I'm not supplying a patch because I am not
  entirely sure about the side effects this could have.

 OK, I'm not familiar with the source but it'd be nice to have it
 fixed before the next release since it is a regression breaking
 scripts.

If you want to try something, here is a patch. I have verified that the
only change to the results of FreeBSD sh's testsuite is that the test
builtins/break2.0 starts working (there are still 51 other broken
tests). There is no change in output from the posh testsuite (run with
  -C sh,posix,no-typeset,no-arrays,no-coprocs,no-herestrings,no-history
).

diff --git a/src/eval.c b/src/eval.c
index d5e5c95..e484bec 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -307,9 +307,9 @@ setstatus:
break;
}
 out:
-   if ((checkexit  exitstatus) ||
-   (pendingsigs  dotrap()) ||
-   (flags  EV_EXIT))
+   if (pendingsigs)
+   dotrap();
+   if ((flags  EV_EXIT) || (checkexit  exitstatus))
exraise(EXEXIT);
 }

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-22 Thread Jilles Tjoelker
On Mon, Aug 23, 2010 at 12:20:12AM +0200, Harald van Dijk wrote:
 Hi, as has been reported already dash currently has a bug where the
 read builtin ignores the read environment's IFS setting. As a result,

   echo a:b | { IFS=: read a b; echo $a; }

 will write out a:b. I tried to see what changed between 0.5.5.1 and
 0.5.6, and found that the old code used bltinlookup(IFS). So, I made
 the new code also use that. The above example works properly with the
 attached patch.

This has already been fixed in a totally different way in master. See
git commits near 95a60b2936e8835963bfb08eadc0edf9dddf0498.

 However, testing further, I found that there is bad interaction (even
 without my patch, where the IFS assignment below should just be
 ignored) between the code that handles variable assignments, and read.
 Try this:

$ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; echo 
 |$a|$b|$c|; }'
|a||b|
$ src/dash -c 'printf a\t\tb\n | { IFS=$(printf \t) read a b c; echo 
 |$a|$b|$c|; }'
|a|b||

 This happens because expbackq is correctly called without EXP_QUOTED,
 which makes it call recordregion, which isn't cleared by the time read
 calls ifsbreakup. I'm not sure how that should be solved, and if there
 are more cases where it would fail. The simplest way to solve this for
 read is to call removerecordregions(0) before recordregion(0, len - 1,
 0). I have tested that locally, and it works. I am not familiar enough
 with the code to judge whether the same situation can also happen in
 other cases that would also need fixing, which is why I have not
 included that part in the attached patch.

Ick. Your change will probably work, but it remains sneaky action at a
distance. To reduce complexity, it would be good to implement read's
splitting without using expand.c. I estimate the extra lines of code
from importing FreeBSD's code at less than 50. It will also fix an edge
case with the splitting. The last two lines of FreeBSD's
builtins/read1.0 test are:

echo  1 ,2 3, | { IFS=', ' read a b c; echo x${a}x${b}x${c}x; }
echo  1 ,2 3,,| { IFS=', ' read a b c; echo x${a}x${b}x${c}x; }

These should result in:

x1x2x3x
x1x2x3,,x

bash and ksh93 agree on this. However, dash master prints:

x1x2x3,x
x1x2x3,,x

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trap bug in recent versions of dash

2010-08-15 Thread Jilles Tjoelker
On Wed, Aug 11, 2010 at 10:06:16AM +0200, Guido Berhoerster wrote:
 with the latest git version of dash trap actions are not
 evaluated in the context of a function.

I think dotrap()'s return value should be removed or at least ignored.
An evalskip (break/continue/return-function/return-file) should never
lead to an immediate exit. I'm not supplying a patch because I am not
entirely sure about the side effects this could have.

Related, there may be a bug with SKIPFILE. This constant is never tested
against, and this makes 'break' inside a dot script behave strangely.
Example (this must be saved to a file, as it sources itself):

if [ $1 != nested ]; then
while :; do
set -- nested
. $0
echo bad2
exit 2
done
exit 0
fi
break
echo bad1
exit 1

 The following script demonstrates the bug:
 8
 read_timeout () {
 saved_traps=$(trap)

This does not work in dash, it always returns an empty string because
trap is evaluated in a subshell.

Some other ash variants do not evaluate most command substitutions
containing only a single builtin in a subshell, but this was removed in
NetBSD sh and dash because such commands can affect the shell in wrong
ways (e.g. $(exit 1) causes FreeBSD sh to exit).

A recent or proposed POSIX interpretation has said that $(trap) should
work, and that this may be done by treating a lone trap command
specially or by having trap in a subshell output the parent's traps
until a trap has been set in the subshell. To help the former case,
stuff like TRAP=trap; y=$($TRAP) is not required to work.

A problem in the script is that it does not handle TERM set to the
default action properly, as this is not included in trap's output.

 trap 'printf timed out\n; eval ${saved_traps}; return' TERM
 ( sleep $1; kill -TERM $$ ) /dev/null 21 

For portability, I recommend braces
{ sleep $1; kill -TERM $$; } /dev/null 21 

Some shells do not treat a background subshell specially and fork twice,
which would cause the wrong process to be killed below.

 timer_pid=$!
 read $2
 kill $timer_pid 2/dev/null

eval ${saved_traps}  is missing here.

 }

 read_timeout 5 value
 printf read \%s\\n ${value:=default}

 8
 The return statement in the trap inside the read_timeout function
 does not return from the function but rather exits the script.

 With dash 0.5.5.1 it works as expected.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


debian patches to exit with code 127 for nonexistent/directory scripts

2010-06-05 Thread Jilles Tjoelker
Debian's dash package has some local changes which cause an exit with
code 127, as required by POSIX, if a script (passed with dash
filename) cannot be opened or cannot be read because it is a
directory.

Unfortunately, these patches also affect the . builtin (if the pathname
contains a slash) and use EXEXIT, which means such errors always cause
the shell to exit, even in interactive mode or if the builtin's
specialness has been disabled using command.

% dash
$ . ./nonexistent
.: 1: Can't open ./nonexistent
zsh: exit 127   dash
%

% dash -c 'command . ./nonexistent; echo continued'

Note: Do not compare this with bash. Bash deliberately does not follow
POSIX XCU 2.8.1 Consequences of Shell Errors if not in POSIX mode, and
even in POSIX mode trying to source a nonexistent dot script (without
slash in the pathname) fails to abort the shell.

Note 2: POSIX seems unclear about what 'command .' should do, but is
very clear that failure to find/read a dot script shall not cause an
interactive shell to exit.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] [VAR] Add localvars nesting

2010-05-26 Thread Jilles Tjoelker
On Wed, May 26, 2010 at 09:00:31PM +1000, Herbert Xu wrote:
 [VAR] Add localvars nesting

 This patch adds localvars nesting infrastructure so we can reuse
 the localvars mechanism for command evaluation.

localvars could already nest, it's just that this change makes the
nesting explicit instead of on the C stack.

Further comments inline.

 Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
 ---

  ChangeLog  |4 
  src/eval.c |7 ++-
  src/var.c  |   50 --
  src/var.h  |1 +
  4 files changed, 51 insertions(+), 11 deletions(-)

 diff --git a/ChangeLog b/ChangeLog
 index ee78154..7285a23 100644
 --- a/ChangeLog
 +++ b/ChangeLog
 @@ -1,3 +1,7 @@
 +2010-05-24  Herbert Xu herb...@gondor.apana.org.au
 +
 + * Add localvars nesting.
 +
  2010-05-03  Gerrit Pape p...@smarden.org
  
   * Fix command -- crash.
 diff --git a/src/eval.c b/src/eval.c
 index 62d9d5d..8d2767c 100644
 --- a/src/eval.c
 +++ b/src/eval.c
 @@ -928,20 +928,17 @@ STATIC int
  evalfun(struct funcnode *func, int argc, char **argv, int flags)
  {
   volatile struct shparam saveparam;
 - struct localvar *volatile savelocalvars;
   struct jmploc *volatile savehandler;
   struct jmploc jmploc;
   int e;
  
   saveparam = shellparam;
 - savelocalvars = localvars;
   if ((e = setjmp(jmploc.loc))) {
   goto funcdone;
   }
   INTOFF;
   savehandler = handler;
   handler = jmploc;
 - localvars = NULL;
   shellparam.malloc = 0;
   func-count++;
   funcnest++;
 @@ -950,13 +947,13 @@ evalfun(struct funcnode *func, int argc, char **argv, 
 int flags)
   shellparam.p = argv + 1;
   shellparam.optind = 1;
   shellparam.optoff = -1;
 + pushlocalvars();
   evaltree(func-n, flags  EV_TESTED);
 + poplocalvars();
  funcdone:
   INTOFF;
   funcnest--;
   freefunc(func);
 - poplocalvars();
 - localvars = savelocalvars;
   freeparam(shellparam);
   shellparam = saveparam;
   handler = savehandler;

This change does not do poplocalvars() if the function was exited with
an error. While this is normally not a problem because the shell exits
or executes the RESET actions, 'command eval', 'command .' and 'fc' will
catch any errors and continue normally.

Example (with the below disallow local outside a function change):
  dash -c 'f() { unset xyz; ${xyz?}; }; command eval f; local i'

 diff --git a/src/var.c b/src/var.c
 index 2737fb1..de1a5f5 100644
 [...]
 @@ -446,6 +456,9 @@ localcmd(int argc, char **argv)
  {
   char *name;
  
 + if (!localvar_stack)
 + sh_error(not in a function);
 +
   argv = argptr;
   while ((name = *argv++) != NULL) {
   mklocal(name);

This change (failing local outside functions), while a good idea, should
be mentioned in the commit message/changelog, as it might break certain
scripts.

(Note that in mksh local is an alias for typeset, which will work
outside functions.)

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] [VAR] Replace cmdenviron with localvars

2010-05-26 Thread Jilles Tjoelker
On Wed, May 26, 2010 at 09:00:34PM +1000, Herbert Xu wrote:
 [VAR] Replace cmdenviron with localvars

 This patch replaces the cmdenviron mechanism for temporary command
 variables with the localvars mechanism used by functions.

 This reduces code size, and more importantly, makes the variable
 assignment take effect immediately as required by POSIX.

This change breaks passing variable assignments to regular builtins.
For example,
  dash -c 'HOME=/ cd; pwd'
This should print /. (For some reason, this does not work in ksh93
either.)
Or
  dash -c 'cd /binCDPATH=/ cd bin; pwd'
This should print /bin twice and not fail any command. (Again this fails
in ksh93.)

A more common example,
  IFS=: read -r x y z
is already broken in master before this change (due to git commit
55c46b7286f5d9f2d8291158203e2b61d2494420 which effectively replaced
bltinlookup(IFS) with ifsval()).

Similar issues may pop up with
  PATH=/foo command -V bar
  PATH=/foo type bar
  LC_NUMERIC=de_DE.UTF-8 printf '%f\n' 1,2

(Note: the latter is broken in bash, but in ksh93 and zsh it works, as
well as with an external printf(1) program.)

'local' may need additional magic to avoid making things local to the
temporary regular-builtin scope. I suppose only functions should induce
scope for 'local', such that things like command eval 'local i' continue
to create a variable local to the function.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wait regression in 3800d4

2010-05-22 Thread Jilles Tjoelker
On Wed, May 19, 2010 at 07:08:56PM +, Gerrit Pape wrote:
 Hi, since commit 3800d49 the wait builtin shows some unexpected
 behavior, see http://bugs.debian.org/581425

 To reproduce:

  $ dash -c '
  for i in 1 2 3; do
(sleep $i; echo $i) 
  done
  wait
  echo all done
  sleep 2'
  1
  all done
  2
  3
  $ 

 Expected output:
  1
  2
  3
  all done

 I don't completely understand commitdiff 3800d49, and didn't find time
 to fix the lack of understanding yet, but this patch seems to fix this
 specific issue.

 diff --git a/src/jobs.c b/src/jobs.c
 index a4fada0..57c0854 100644
 --- a/src/jobs.c
 +++ b/src/jobs.c
 @@ -1136,8 +1136,6 @@ waitproc(int block, int *status)
 if (err || !block)
 break;
  
 -   block = 0;
 -
 sigfillset(mask);
 sigprocmask(SIG_SETMASK, mask, oldmask);
  

This patch is not correct as it makes dash busy-wait for the second and
further processes to terminate. Also, I think aborting the wait builtin
with 128+SIGCHLD is the correct behaviour if a trap on CHLD is active
(like any other trap). This requires a more sophisticated handling of
pendingsigs so that an untrapped SIGCHLD does not overwrite a trapped
signal.

I wonder why this hasn't been noticed before, as this is the most common
use of the wait builtin. I can even reproduce the regression with the
test script in 3800d49's commit message: sometimes I see a line *** 148
meaning that the wait builtin was aborted by SIGCHLD (even though no
CHLD trap has been set); however only on an SMP system. (This is on
FreeBSD; I hacked mksignames.c to compile and work, although realtime
signals are missing.)

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [EVAL] Force fork if any trap is set, not just on EXIT.

2010-05-14 Thread Jilles Tjoelker
In some cases the shell executes a subshell or an external command in
the current process. This is not done if a trap on EXIT has been set, so
that that trap can execute after the subshell or external command has
finished. Extend that check to all traps. (A trap is set if a
non-empty command string has been attached to it.)

Improve encapsulation by exporting an accessor function for this and
making the trap array static again.

This is much like FreeBSD SVN r194127, enhanced to apply to subshells
also (see FreeBSD SVN r194774).

Example:
  dash -c '{ trap echo moo TERM; sleep 3; } sleep 1; kill $!;wait'
This should print moo after 3 seconds.

Example:
  dash -c '{ trap echo moo TERM; (sleep 3) } sleep 1; kill $!;wait'
The same.

Example:
  dash -c '{ trap echo moo TERM; sleep 3; :; } sleep 1; kill $!;wait'
This works correctly even without this patch.

Signed-off-by: Jilles Tjoelker jil...@stack.nl
---
 src/eval.c |4 ++--
 src/trap.c |   16 +++-
 src/trap.h |2 +-
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/eval.c b/src/eval.c
index 62d9d5d..e7c95cc 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -448,7 +448,7 @@ evalsubshell(union node *n, int flags)
int status;
 
expredir(n-nredir.redirect);
-   if (!backgnd  flags  EV_EXIT  !trap[0])
+   if (!backgnd  flags  EV_EXIT  !have_traps())
goto nofork;
INTOFF;
jp = makejob(n, 1);
@@ -829,7 +829,7 @@ bail:
switch (cmdentry.cmdtype) {
default:
/* Fork off a child process if necessary. */
-   if (!(flags  EV_EXIT) || trap[0]) {
+   if (!(flags  EV_EXIT) || have_traps()) {
INTOFF;
jp = makejob(cmd, 1);
if (forkshell(jp, cmd, FORK_FG) != 0) {
diff --git a/src/trap.c b/src/trap.c
index 3f93c46..439b8d2 100644
--- a/src/trap.c
+++ b/src/trap.c
@@ -69,7 +69,7 @@
 
 
 /* trap handler commands */
-char *trap[NSIG];
+static char *trap[NSIG];
 /* current value of signal */
 char sigmode[NSIG - 1];
 /* indicates specified signal received */
@@ -163,6 +163,20 @@ clear_traps(void)
 }
 
 
+/*
+ * Check if we have any traps enabled.
+ */
+int
+have_traps(void)
+{
+   char *volatile *tp;
+
+   for (tp = trap ; tp = trap[NSIG - 1] ; tp++) {
+   if (*tp  **tp)/* trap not NULL or SIG_IGN */
+   return 1;
+   }
+   return 0;
+}
 
 /*
  * Set the signal handler for the specified signal.  The routine figures
diff --git a/src/trap.h b/src/trap.h
index f19a66b..4e897a9 100644
--- a/src/trap.h
+++ b/src/trap.h
@@ -36,12 +36,12 @@
 
 #include signal.h
 
-extern char *trap[];
 extern char sigmode[];
 extern volatile sig_atomic_t pendingsigs;
 
 int trapcmd(int, char **);
 void clear_traps(void);
+int have_traps(void);
 void setsignal(int);
 void ignoresig(int);
 void onsig(int);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html