Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-24 Thread Chet Ramey

On 7/23/23 9:20 AM, Rob Landley wrote:


> $ bash -c $'echo $LINENO\necho $(echo $LINENO\necho $LINENO\n); echo $LINENO'
> 0
> 3 4
> 3
> $ bash -c $'echo $LINENO\necho $LINENO $(echo $LINENO\necho $LINENO\n); echo $LINENO'
> 0
> 1 1 2
> 3
>
> Why does the 3 4 turn into 1 2? (Where does it get "3" from the
> first time?)


This is how a bison parser works. It can't tell that a WORD starts a simple
command until it parses one more token (it could be a function definition,
for example). When it builds a simple command, it uses the current line
number as the line number that starts the simple command.

In the first case, the second word is not complete until line 3, when the
command substitution completes, so that is the line number of the simple
command. The command substitution, when it executes, inherits the line
number of the simple command, and adds 1 because of the embedded newline.

In the second case, the second word is $LINENO, so the current line number
is 1 (2 in current versions, since -c command initializes to 1). Everything
else works as before.
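
The two cases can be rerun side by side with a small script. This is a minimal sketch: run_case is a hypothetical helper, and the exact numbers printed depend on the bash version (as noted above, current versions start -c commands at line 1), so no expected output is shown.

```shell
# Build both scripts with printf so the embedded newlines are explicit.
first=$(printf 'echo $LINENO\necho $(echo $LINENO\necho $LINENO\n); echo $LINENO')
second=$(printf 'echo $LINENO\necho $LINENO $(echo $LINENO\necho $LINENO\n); echo $LINENO')

# Hypothetical helper: run one script text through bash -c.
run_case() {
    bash -c "$1"
}

echo "-- command substitution is the second word --"
run_case "$first"
echo "-- \$LINENO is the second word --"
run_case "$second"
```

Comparing the middle line of each case shows whether the simple command took its line number from the end of the command substitution or from the earlier word.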

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/

___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-23 Thread Rob Landley
On 6/30/23 19:57, Rob Landley wrote:
>> On 6/12/23 19:40, Chet Ramey wrote:
>>> I wish you were not so reluctant. Look at how many things you've discovered
>>> that I decided were bugs based on our discussions.
>>
>> But since you asked, today's new question I wrestled with was

$ bash -c $'echo $LINENO\necho $(echo $LINENO\necho $LINENO\n); echo $LINENO'
0
3 4
3
$ bash -c $'echo $LINENO\necho $LINENO $(echo $LINENO\necho $LINENO\n); echo $LINENO'
0
1 1 2
3

Why does the 3 4 turn into 1 2? (Where does it get "3" from the
first time?)

Rob


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-11 Thread Chet Ramey

On 7/10/23 5:57 PM, enh wrote:





 > iirc we even _tried_ -1/ENOSYS to test the predictions that (a) most c
 > programmers don't check for failures (b) even those that do don't log
 > enough to be able to debug these problems

It penalizes those who do.


yeah, but we optimize for the common case.


I think we've determined that I am not a conventional android developer. :-)

again, if you're an _application_ things are very different --- it's a 
reasonable assumption that iOS and Android are your two main targets, and 
i'm sure there's plenty of stuff you can't do on iOS for security reasons 
too. you just don't see that because you wouldn't be able to run bash on 
iOS in the first place :-)


I've been told that iOS, being a macOS derivative, has bash down deep,
but I've never verified that (and have no real desire to).


Oh, come on, really? What's next? Adding checks in the str* functions
for NULL pointers? I am sure that catering to bad programming is not the
way to improve programming practices.


no, because there's no causal link with an earlier failure there. 


There is, if you ignore memory allocation failures. But I don't know how
prevalent that is.



but, again, that's aimed at apps.


Yep, bash is definitely a unicorn here.

Chet



Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-11 Thread Rob Landley
On 7/10/23 16:57, enh wrote:
> and i think that's the cross-purpose we're talking at here ... the 99% case is
> app code...
> the kind of collaborative command-line debugging over a mailing list that
> you're familiar with doesn't exist for the TikToks of this world.

There's no amount of leaves a tree can sprout that mean it no longer needs a
trunk or roots so you can get rid of those.

> > i'm not a _fan_ of SIGSYS,
> > but i do still think it was the lesser of the available evils. (but of
> > course it's the folks who _do_ check for failures who're most likely to
> > disagree;
> 
> OK, I'm going to disagree.
> 
> because you have a completely different use case _and_ you're a very different
> kind of developer. there's no question it's not a good fit for you. (but you're
> running in an app _context_ even if not in an app, so you get to enjoy "one size
> fits all" :-( )

I remember the original java applet vs application divide.

I hope someday to have an application container context, with the capability to
securely perform tasks on phone hardware/OS that can currently only be performed
on PC hardware/OS. The ability to create a new system image and the ability to
install a new system image are distinct, like running as a normal user vs
running as root.

I'm also aware that getting from here to there is fraught, what with warhol
worms, evil maid attacks, state actors, and the device's nature as a 24/7
broadband connected GPS tracker with whole room microphone which some people
allow to perform payment processing.

But 30 years ago "there's a computer in every home, but we can't trust those,
you can only do systems programming by logging into at least a Vax"... is not
how history went. And would have been a BAD history, involving no teenagers
getting good at systems programming leading to no future systems programmers
outside of IBM etc. It would have prevented Linux, for one thing. (And the state
actors DO have the big iron: always did.)

> If AC_CHECK_FUNCS doesn't do the right thing when cross-compiling, maybe
> the autotools folks should hear about it?
> 
> i don't think autoconf is broken (given that _some_ stuff gets it right). but a
> lot of library developers just don't take cross-compilation into account.
> (obviously massive selection bias here since i was an embedded developer before
> working on Android, so i've mostly _only_ known cross-compilation, plus i
> obviously don't have to spend much time on the libraries that _do_ work.)

I've ranted about autoconf a lot over the years:

https://nondeterministic.computer/@land...@mstdn.jp/110181223774509774

https://git.busybox.net/busybox/commit/editors/sed.c?id=c06f568ddaaa

http://lists.landley.net/pipermail/aboriginal-landley.net/2011-June/000860.html

Not a fan. (Also, with autoconf/automake you need more dependency packages
installed to build the repository version than you do to build the release
tarball, which is always annoying.)

That said, it exists, gotta make it work...

Rob


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-10 Thread enh via Toybox
On Sun, Jul 9, 2023 at 5:54 PM Chet Ramey wrote:

> On 7/7/23 6:45 PM, enh wrote:
>
> >  > define "mess up"...
> >
> > Maybe "mess up" was too strong. I think it's rude to send a fatal signal if
> > you have the function. Either don't provide it or make it return -1/ENOSYS.
> > Android is by no means the only place this is a problem; it happens with
> > docker all the time:
> >
> > https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html
> > 
> >
> >
> > yeah, if it's any consolation we argued about this choice a lot. (which is
> > probably also why the choice exists in the first place!)
> >
> > iirc we even _tried_ -1/ENOSYS to test the predictions that (a) most c
> > programmers don't check for failures (b) even those that do don't log
> > enough to be able to debug these problems
>
> It penalizes those who do.
>

yeah, but we optimize for the common case.


> > and (c) even in cases where
> > errors are checked and logged, the ability of folks -- including _end
> > users_ -- to triage such problems is limited.
>
> Killing the process certainly doesn't improve that. How is it better for
> triage when the process dies unexpectedly with a fatal signal -- when
> running a builtin command -- that can't be reproduced anywhere else? If
> Grisha Levit hadn't recognized what was going on, I never would have even
> looked at it. I don't use Android as a development platform.
>

and i think that's the cross-purpose we're talking at here ... the 99% case
is app code, which actually means "JNI library dlopen()ed into a clone of
the zygote --- no main() at all". most app developers have no direct
contact with their users, and rely on clustered groups of
automatically-collected crashes. so a _crash_ is 100x more likely to
actually be something the developer sees. (as opposed to "i seem to be
having more users uninstall my app, but have no idea why and no way to find
out".)

the kind of collaborative command-line debugging over a mailing list that
you're familiar with doesn't exist for the TikToks of this world.


> > i'm not a _fan_ of SIGSYS,
> > but i do still think it was the lesser of the available evils. (but of
> > course it's the folks who _do_ check for failures who're most likely to
> > disagree;
>
> OK, I'm going to disagree.
>

because you have a completely different use case _and_ you're a very
different kind of developer. there's no question it's not a good fit for
you. (but you're running in an app _context_ even if not in an app, so you
get to enjoy "one size fits all" :-( )


> >  > Android deliberately has strict seccomp filters for
> >  > apps, and the syscalls mentioned in that post are on the "no" list. Android
> >  > gives each _app_ a different uid, so there's typically nothing useful you
> >  > can do here anyway. (things are a bit different if you're actually part of
> >  > the OS, but bash being GPL makes that unlikely :-( )
> >
> > Sure. But if you have the function, bash assumes it works as documented in
> > the rest of the Linux world.
> >
> >
> > and so it does, if you're not an app.
> >
> > the tricky case here has always been the conflict between the folks who
> > want to check at build time versus those who want to check at runtime ...
> > if we hide things in the headers, people complain "i wasn't going to call
> > it unless i know i can anyway". (plus in addition to the general rule that
> > "most configure scripts don't handle cross-compilation correctly", you'd be
> > surprised how many screw up the simple "is this function available?" test
> > --- they don't compile a small program that #includes the right header and
> > tries to call the function; they grep the header or nm the library or
> > whatever.)
>
> If AC_CHECK_FUNCS doesn't do the right thing when cross-compiling, maybe
> the autotools folks should hear about it?
>

i don't think autoconf is broken (given that _some_ stuff gets it right).
but a lot of library developers just don't take cross-compilation into
account. (obviously massive selection bias here since i was an embedded
developer before working on Android, so i've mostly _only_ known
cross-compilation, plus i obviously don't have to spend much time on the
libraries that _do_ work.)


> >  > (yes, i agree that it's mildly unfortunate that there's no special case for
> >  > "i don't actually want to change anything", which i think is the case
> >  > they're talking about in that post, and i've wondered about adding that to
> >  > libc once or twice, but my feeling is that it wouldn't be particularly
> >  > useful in practice because _that_ kind of code probably needs a rethink
> >  > anyway when porting to Android.)
>
> I'm not sure moving that burden to the application is appropriate, either.
> But bash has no android-specific code; there's 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-10 Thread Chet Ramey

On 7/8/23 7:41 AM, Rob Landley wrote:

On 7/6/23 20:09, Chet Ramey wrote:



...

   Distinguishing : from true seems deeply silly

true wasn't a special builtin in the Bourne shell.


It isn't because it wasn't. Historical reasons, no other pattern or logic.


Is there pattern or logic in the original 7th Edition Bourne shell special
builtins? POSIX certainly wasn't going to invent new special builtins. The
ones we have are bad enough.





I note that "help times" does not actually explain that if you don't
already know... 


The accumulated user (1) and system (2) times for the shell (line 1) and
its children (line 2). This maps nicely into getrusage (RUSAGE_SELF, ...)
and getrusage (RUSAGE_CHILDREN, ...). It seems straightforward.
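
As a rough sketch of that layout, the two output lines of `times` can be read back and labeled. The variable names and the temporary file are illustrative assumptions, not anything from bash itself.

```shell
report=$(mktemp) || exit 1

# Redirect to a file rather than using a pipe or $(...): those would run
# `times` in a forked subshell, which would report that process's counters.
times > "$report"

# Line 1: the shell's own user and system time.
# Line 2: the accumulated user and system time of its children.
{
    read -r self_user self_sys
    read -r child_user child_sys
} < "$report"
rm -f "$report"

echo "shell:    user=$self_user system=$self_sys"
echo "children: user=$child_user system=$child_sys"
```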


And it explains the waited for grandchildren but presumably not
the ignore sigchld or reparent-to-init children.)


You get whatever getrusage gives you.


I
remember I did make "continue&" work, but don't remember why...)


Why would that not work? It's just a no-op; no semantic meaning.


Not _quite_ a NOP:


I mean, it creates a child process which immediately exits, but it has
no effect on the shell other than to note that we created an asynchronous
child process (which sets $!) that exited. It certainly doesn't affect
flow control.


The & terminates the statement as usual but the command does not run in a child
process.


Come on, of course it does. If you don't execute asynchronous commands in a
subshell, even builtins, you're going to have problems.



 Especially with continue [n] being able to take an argument, that took
a little special casing in my code.


How so?




$ for i in one two three; do echo a=$i; continue& b=$i; done
a=one
[1] 30698
a=two
[2] 30699
a=three
[3] 30700
[1]   Done                    continue
[2]-  Done                    continue

Notice the child processes and lack of b= lines.


Why would you expect a b= line?


If it were actually a NOP because the continue ran in a subshell.


What does this mean? Or did you think there was an echo there?


Reference counting is ok. Bash just copies the parsed function body (x in
this case) and executes that, then frees it. That way you can let the
function get unset and not worry about it.


Each time you call the function? Including all the strings?


Yep. It means you can change the flags associated with the commands without
worrying about restoring them.
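
The visible consequence can be sketched with a function that unsets itself while it is running. This only shows the behavior from the outside; it says nothing about how any particular shell keeps the body alive internally. demo_unset and f are hypothetical names.

```shell
demo_unset() {
    f() {
        unset -f f              # remove the definition while it is executing
        echo "still running"    # the body already in progress is unaffected
    }
    f
    # The definition is gone now, so a second call fails the command lookup.
    f 2>/dev/null || echo "second call failed with status $?"
}
demo_unset
```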



You have to parse it to find the end of the command substitution, bottom
line. You can't get it right otherwise.


I acknowledge that it's not right. I expect to hit something that breaks it deep
in some package build, but I'm holding off that bout of re-engineering until
then because I've got so much else to do and in hopes of coming up with a less
hideous way to represent the result by then...


I held out for years.



I'm aware they did it. It failed. They will not acknowledge that.


I suppose it depends on how you define success. They got a standard that
was approved. That wouldn't have happened without pax.



The result of $(blah) and $BLAH are handled the same there? Quotes _inside_ the
subshell are in their own context.


Yes, that's the point I was trying to make.


Hmmm... Smells a bit like indexed arrays are just associative arrays with an
integer key type, but I guess common usage leans towards a span-based
representation?


It depends on whether or not you want to support very large arrays. The
bash implementation has no trouble with

a=( [0x100]=$'\371\200\200\200\200' [0x101]=$'\371\200\200\200\201'
[0x102]=$'\371\200\200\200\202' [0x103]=$'\371\200\200\200\203'
[0x104]=$'\371\200\200\200\204' )

Which will eat huge amounts of memory if you use a C-type array. Bash uses
a doubly-linked list with some saved indices to make sequential access
very fast.


Depends on your definition of "large array". 5 entries large, or large address
space large.


Mostly the latter. If your arrays aren't sparse in some way, you're just
going to allocate huge chunks of memory.
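
A small sketch of what "sparse" means in practice: two elements a million indices apart still count as two elements. The array name and the index values are arbitrary, and the snippet is run through bash explicitly because indexed arrays are not in POSIX sh.

```shell
# Hypothetical helper wrapping the bash-only snippet.
sparse_demo() {
    bash -c '
        a=( [0]=zero [1000000]=million )
        echo "elements: ${#a[@]}"    # number of elements actually set
        echo "indices:  ${!a[@]}"    # the indices that are in use
    '
}
sparse_demo
```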



Ok, now I'm even more confused. It's exporting inaccessible values? (I know that
you can export a local, which goes away when the function returns...)


Creating a local variable, which does not inherit the attributes from any
global variable, does not cause the environment to be recreated.


Behavior varies. Maybe bash's local variable unset code should preserve
the export attribute, since several other POSIX shells seem to magically
export the variable if it's given a value after being unset.






Depends whether you're trying to get them to learn C at the same time?

Explaining that you _could_ say printf abc:$X or you can say printf abc:%s $X
and there's multiple ways to do it but you're not expected to understand the
difference between them until much later and as long as you never try to print
anything with a % in it you're fine except yes the $X context expanding to %s
can mean something and no quoting it 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-10 Thread Chet Ramey

On 7/9/23 7:43 PM, Rob Landley wrote:

On 6/18/23 16:28, Rob Landley wrote:

On 6/12/23 19:40, Chet Ramey wrote:

I wish you were not so reluctant. Look at how many things you've discovered
that I decided were bugs based on our discussions.


But since you asked, today's new question I wrestled with was


What happens when you return from a subshell:


If the subshell is invoked while a function is executing, it's equivalent
to `exit'. Where would it return to otherwise?


$ (echo one; return;); echo two
one
bash: return: can only `return' from a function or sourced script
two


The shell isn't executing a shell function.



So in a function, return ends a parenthetical subshell (without error) and
continues on within the function after the parenthetical (and you can return a
second time from the same function because the first didn't count),


A subshell is created as an exact copy of its parent -- including the
knowledge that it is executing a shell function -- so `return' is valid.

Since it can't affect its parent's environment, a return in the subshell
affects only the subshell. Since there are no more commands that the
subshell should execute, it exits.

How would a `return' in the subshell `count' for the parent? How would
you communicate that to the parent, and why would you want to?

 but return

will NOT end a parenthetical (without error) outside of a function call.


Outside a function call, `return' isn't valid.
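
A short sketch of the in-function case, using distinct status values to show that the subshell's `return` only ends the subshell and the function's own `return` still applies afterwards. The function name and the status values 7 and 3 are arbitrary.

```shell
outer() {
    # `return` here ends only the subshell, much like `exit 7` would.
    (echo "in subshell"; return 7; echo "never printed")
    echo "subshell exited with status $?"
    # The function itself has not returned yet, so this one counts.
    return 3
}
outer
echo "outer returned $?"
```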




Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-09 Thread Chet Ramey

On 7/7/23 6:45 PM, enh wrote:


 > define "mess up"...

Maybe "mess up" was too strong. I think it's rude to send a fatal signal if
you have the function. Either don't provide it or make it return -1/ENOSYS.
Android is by no means the only place this is a problem; it happens with
docker all the time:

https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html



yeah, if it's any consolation we argued about this choice a lot. (which is 
probably also why the choice exists in the first place!)


iirc we even _tried_ -1/ENOSYS to test the predictions that (a) most c 
programmers don't check for failures (b) even those that do don't log 
enough to be able to debug these problems 


It penalizes those who do.

and (c) even in cases where 
errors are checked and logged, the ability of folks -- including _end 
users_ -- to triage such problems is limited. 


Killing the process certainly doesn't improve that. How is it better for
triage when the process dies unexpectedly with a fatal signal -- when
running a builtin command -- that can't be reproduced anywhere else? If
Grisha Levit hadn't recognized what was going on, I never would have even
looked at it. I don't use Android as a development platform.


i'm not a _fan_ of SIGSYS, 
but i do still think it was the lesser of the available evils. (but of 
course it's the folks who _do_ check for failures who're most likely to 
disagree;


OK, I'm going to disagree.



 > Android deliberately has strict seccomp filters for
 > apps, and the syscalls mentioned in that post are on the "no" list. Android
 > gives each _app_ a different uid, so there's typically nothing useful you
 > can do here anyway. (things are a bit different if you're actually part of
 > the OS, but bash being GPL makes that unlikely :-( )

Sure. But if you have the function, bash assumes it works as documented in
the rest of the Linux world.


and so it does, if you're not an app.

the tricky case here has always been the conflict between the folks who 
want to check at build time versus those who want to check at runtime ... 
if we hide things in the headers, people complain "i wasn't going to call 
it unless i know i can anyway". (plus in addition to the general rule that 
"most configure scripts don't handle cross-compilation correctly", you'd be 
surprised how many screw up the simple "is this function available?" test 
--- they don't compile a small program that #includes the right header and 
tries to call the function; they grep the header or nm the library or 
whatever.)


If AC_CHECK_FUNCS doesn't do the right thing when cross-compiling, maybe
the autotools folks should hear about it?
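
For illustration only, here is a sketch of the kind of "compile a small program" probe described above. have_function is a hypothetical helper, not an autoconf macro, and it makes no claim about how AC_CHECK_FUNCS works internally. It assumes a C compiler reachable as $CC (or cc) that accepts -Werror=implicit-function-declaration, as GCC and Clang do. It only compiles and links, never runs the result, so it stays usable when cross-compiling.

```shell
# Hypothetical helper: succeed if a program that includes header $1 and
# calls expression $2 compiles and links.
have_function() {
    header=$1
    call=$2
    dir=$(mktemp -d) || return 2
    printf '#include <%s>\nint main(void) { (void)%s; return 0; }\n' \
        "$header" "$call" > "$dir/probe.c"
    ${CC:-cc} -Werror=implicit-function-declaration \
        -o "$dir/probe" "$dir/probe.c" >/dev/null 2>&1
    status=$?
    rm -rf "$dir"
    return $status
}

if have_function unistd.h 'getpid()'; then
    echo "getpid: available"
else
    echo "getpid: missing (or no working compiler)"
fi
```

A probe like this answers the build-time question only. It cannot tell whether a seccomp filter will reject the call at runtime, which is the other half of the disagreement in this thread.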


 > (yes, i agree that it's mildly unfortunate that there's no special case for
 > "i don't actually want to change anything", which i think is the case
 > they're talking about in that post, and i've wondered about adding that to
 > libc once or twice, but my feeling is that it wouldn't be particularly
 > useful in practice because _that_ kind of code probably needs a rethink
 > anyway when porting to Android.)


I'm not sure moving that burden to the application is appropriate, either.
But bash has no android-specific code; there's no "porting to android" at
this point.



I just added code to bash that doesn't try to call setresuid/setresgid if
nothing is actually changing, which will save a couple calls everywhere.
But killing the process is rude.


you're talking to the guy who added the equivalent of `if (fp == NULL) 
abort_with_message("you can't pass a null FILE*");` to every stdio function 
because too many developers don't check fopen() succeeded, and think libc 
is broken if fgets() dereferences a null pointer.


Oh, come on, really? What's next? Adding checks in the str* functions
for NULL pointers? I am sure that catering to bad programming is not the
way to improve programming practices.

killing a process with a 
"you called syscall %d and you're not allowed to" message is pretty much 
exactly what you'd expect from me :-) 


Yeah, alex didn't get that.

> bash-5.2$ set +p
>
> [Process completed (signal 31) - press Enter]
>

Bash catches SIGSYS in interactive shells so it can clean up the terminal
pgrps and save the history, so the user's not going to see that message
from bash (assuming you print it from the default signal `handler'). Bash
will kill itself with SIGSYS after restoring the default signal handler,
so it's up to the parent to catch that exit status and report it, which
termux did. It's not that unusual. You can't count on the user seeing it.
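
The catch, clean up, restore, re-raise pattern described here can be sketched in shell. This is an analogy rather than bash's actual code: it uses SIGTERM as a stand-in for SIGSYS, and run_child and cleanup_and_reraise are hypothetical names. The child's own message goes to stderr, and the parent learns about the signal only through the exit status.

```shell
run_child() {
    sh -c '
        cleanup_and_reraise() {
            echo "cleaning up" >&2   # stand-in for terminal/history cleanup
            trap - TERM              # restore the default disposition
            kill -TERM $$            # die from the signal for real this time
        }
        trap cleanup_and_reraise TERM
        kill -TERM $$                # simulate the fatal signal arriving
        echo "not reached"
    '
}
run_child
echo "parent saw status $?"
```

Most shells report a signal death as 128 plus the signal number, so it is up to the parent to decode that status and tell the user.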



it might not seem like it from the outsiders (because you only see the 
stuff that _does_ break), but i'm sure as the maintainer of something old 
and widely-used you're well aware that "not breaking everyone's stuff [even 
when it's terrible]" is a large 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-09 Thread Rob Landley
On 6/18/23 16:28, Rob Landley wrote:
> On 6/12/23 19:40, Chet Ramey wrote:
>> I wish you were not so reluctant. Look at how many things you've discovered
>> that I decided were bugs based on our discussions.
>
> But since you asked, today's new question I wrestled with was

What happens when you return from a subshell:

$ x() { (echo hello; return); echo two; }; x; echo three
hello
two
three
$ x() { (echo hello; return); echo two; return;}; x; echo three
hello
two
three
$ x() { (echo hello; return; echo next); echo two; return;}; x; echo three
hello
two
three
$ (echo one; return;); echo two
one
bash: return: can only `return' from a function or sourced script
two

So in a function, return ends a parenthetical subshell (without error) and
continues on within the function after the parenthetical (and you can return a
second time from the same function because the first didn't count), but return
will NOT end a parenthetical (without error) outside of a function call.

Rob


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-08 Thread Rob Landley
On 7/6/23 20:09, Chet Ramey wrote:
> On 7/5/23 3:29 AM, Rob Landley wrote:
> It's really a useless concept, by the way.

 It's not that simple: kill has to be built-in or it can't interface with job
 control...
>>>
>>> That's not what a special builtin is. `kill' is a `regular builtin' anyway.
>> 
>> I started down the "rereading that mess" path and it's turning into "reading all
>> the posix shell stuff" which is not getting bugs fixed. And once again, this is
>> a BAD STANDARD. Or at least badly organized. There's three groups here:
> 
> OK. This is a decision that was made, what, 45 years ago? These are the
> Bourne shell special builtins -- at least as of SVR4. Korn added a couple,
> but since the Bourne shell didn't have them, they were not added to the
> list.
> 
> Special builtins will exit a non-interactive shell on an error, assignments
> preceding them persist, and they're found before shell functions in the
> command search order. That's pretty much it. It's not that the builtins
> have to be implemented internally, but that these have other properties.
> 
> They're a POSIX concept, so bash conforms when in posix mode. In default
> mode, every builtin is treated the same.

Blah. bash -p is privileged, not posix. And most of bash's command line options
aren't in the option list at the start of the man page, they're in the set
builtin (and you don't search for ^builtin you search for "^shell builtin" which
I never remember and is one of those "I can't look up how to spell something I
don't know how to spell" things...)

Right. Thanks for the explanation: exit, assignments persist, higher priority
than shell functions.
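
The "assignments persist" property is the easiest of the three to see. A minimal sketch, assuming bash is installed and using its --posix switch (variable names are arbitrary):

```shell
snippet_special='x=before; x=after :;    echo "$x"'   # `:` is a special builtin
snippet_regular='x=before; x=after true; echo "$x"'   # `true` is a regular builtin

bash --posix -c "$snippet_special"   # posix mode: the assignment persists
bash --posix -c "$snippet_regular"   # regular builtin: the assignment is temporary
bash         -c "$snippet_special"   # default mode: `:` is treated like any builtin
```

With a current bash this prints after, before, before, which matches the description above: the special treatment only applies in posix mode.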

>> Why group 1 doesn't include "wait" I dunno.
> 
> It's not a Bourne shell special builtin: errors in it don't exit the shell.
...
>   Distinguishing : from true seems deeply silly
> 
> true wasn't a special builtin in the Bourne shell.

It isn't because it wasn't. Historical reasons, no other pattern or logic.

> (especially when [ and
>> test aren't)
> 
> Not part of the Bourne shell, only came in in System III, never a special
> builtin.
> 
>> and "times" is job control 
> 
> It's not. It's a straightforward interface to the `times' library function
> (originally system call in 7th edition).

Ah, I thought it would list times for children individually. (I've never
used it nor seen anything use it.) So "child processes" here is not restricted
to jobs then...

  $ disown sleep 5; times
  bash: disown: sleep: no such job
  bash: disown: 5: no such job
  0m0.029s 0m0.019s
  0m1.050s 0m0.159s

Sigh. Just... sigh. (TODO: test if this means immediate children or
grandchildren too. TODO: implement disown -c "command". TODO: figure out what
label each of those 4 different times would have... ah the man page of the
system call it's wrapping has the 4 labeled, I'm guessing it's outputting them
in order. I note that "help times" does not actually explain that if you don't
already know... And it explains the waited for grandchildren but presumably not
the ignore sigchld or reparent-to-init children.)


 I
 remember I did make "continue&" work, but don't remember why...)
>>>
>>> Why would that not work? It's just a no-op; no semantic meaning.
>> 
>> Not _quite_ a NOP:
> 
> I mean, it creates a child process which immediately exits, but it has
> no effect on the shell other than to note that we created an asynchronous
> child process (which sets $!) that exited. It certainly doesn't affect
> flow control.

The & terminates the statement as usual but the command does not run in a child
process. Especially with continue [n] being able to take an argument, that took
a little special casing in my code.

>>$ for i in one two three; do echo a=$i; continue& b=$i; done
>>a=one
>>[1] 30698
>>a=two
>>[2] 30699
>>a=three
>>[3] 30700
>>[1]   Done                    continue
>>[2]-  Done                    continue
>> 
>> Notice the child processes and lack of b= lines.
> 
> Why would you expect a b= line?

If it were actually a NOP because the continue ran in a subshell.

> Even if the `continue&' were not there,
> the `;' after the first echo command makes the b= line a separate simple
> command. Who's going to echo `b=$i' and why would they? Maybe if you had
> an `echo' in there instead.

Blah, that's what I meant. :P

>> As far as I can tell, it's NOT more than \$ \\ and \ that get special
>> treatment in this context?
> 
> Plus double quote (in double quotes, but not here-documents) and
> backquote.

Aha! I forgot backquote.

>> And it's the short-circuit logic again:
>> 
>>$ echo $((1?2:(1/0)))
>>2
>>$ echo $((1&&(1/0)))
>>bash: 1&&(1/0): division by 0 (error token is "0)")
>>$ echo $((1||(1/0)))
>>1
> 
> That's not the same thing; arithmetic expression evaluation follows the
> C rules for suppressing evaluation.

Which is the short-circuit logic.
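
For completeness, a sketch of the symmetric case: a false left operand suppresses the right side of && the same way a true one suppresses the right side of ||. The snippets run through bash explicitly, since other shells may evaluate arithmetic differently.

```shell
bash -c 'echo $((0 && (1/0)))'   # right side never evaluated
bash -c 'echo $((1 || (1/0)))'   # right side never evaluated
# Here the right side must be evaluated, so bash reports division by 0.
bash -c 'echo $((1 && (1/0)))' || echo "third expression failed"
```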

>> I hadn't put an "echo" in there, but I'd 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-07 Thread enh via Toybox
On Fri, Jul 7, 2023 at 1:43 PM Chet Ramey wrote:

> On 7/7/23 3:29 PM, enh wrote:
>
> > cd is a weird one. The v7 Bourne shell exited the shell if the directory
> > argument didn't exist, and that didn't change until SVR4.2,
> >
> >
> > and people complain unix isn't user-friendly... :-)
>
> Ha. Sometimes they write books on the subject.
>
> >  > I know there's the "this
> >  > may syntax error and exit the shell" distinction but don't ask me how set or
> >  > true are supposed to do that.
> >
> > set exits the shell on an invalid option and was special in the Bourne
> > shell; true isn't a special builtin.
> >
> > (I _think_ they added set here because set -u can
> >  > cause a shell error later? Maybe? But then why unset?
> >
> > Well, unset didn't exist in the 7th edition shell, but it's special
> > in the SVR4 shell. It can only fail if asked to unset a readonly
> > variable or one of the shell's non-unsettable variables. It takes no
> > options, does no argument checking for invalid identifiers, and unsetting
> > a variable that's not set isn't an error, but when asked to unset a
> > variable the shell says you can't, the shell exits.
> >
> > I think POSIX made unset a special builtin because the SVR4 sh did and
> > so it would be found in the command search before a function. That gets
> > important when you're trying to write a secure script,
> >
> >
> > did the v7 bourne shell just not know whether it was interactive or not?
>
> Of course it did.
>
> > (because, yeah, this kind of thing makes a lot more sense as an early
> > `set -e`. but i can't imagine using this interactively!)
>
> The 7th edition shell doesn't exit if it's interactive and a special
> builtin fails, that wouldn't make any sense at all. Interactive shells
> just abort the command and jump back to the main command loop. It's the
> same in POSIX.
>
> The 7th edition shell did have `set -e'; Bourne put it in for `make'.
>
> > Yeah, there's a ways to go.
> >
> > https://lists.gnu.org/archive/html/help-bash/2023-06/msg00117.html
> > 
> >
> > They mess up the simple stuff.
> >
> >
> > define "mess up"...
>
> Maybe "mess up" was too strong. I think it's rude to send a fatal signal if
> you have the function. Either don't provide it or make it return -1/ENOSYS.
> Android is by no means the only place this is a problem; it happens with
> docker all the time:
>
> https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html


yeah, if it's any consolation we argued about this choice a lot. (which is
probably also why the choice exists in the first place!)

iirc we even _tried_ -1/ENOSYS to test the predictions that (a) most c
programmers don't check for failures (b) even those that do don't log
enough to be able to debug these problems and (c) even in cases where
errors are checked and logged, the ability of folks -- including _end
users_ -- to triage such problems is limited. i'm not a _fan_ of SIGSYS,
but i do still think it was the lesser of the available evils. (but of
course it's the folks who _do_ check for failures who're most likely to
disagree; in an ideal world, it would have been good for "sophisticated"
code to be able to opt out. though that kind of thing always falls apart at
library boundaries; not all code linked together is of the same quality.)


>
> > Android deliberately has strict seccomp filters for
> > apps, and the syscalls mentioned in that post are on the "no" list. Android
> > gives each _app_ a different uid, so there's typically nothing useful you
> > can do here anyway. (things are a bit different if you're actually part of
> > the OS, but bash being GPL makes that unlikely :-( )
>
> Sure. But if you have the function, bash assumes it works as documented in
> the rest of the Linux world.
>

and so it does, if you're not an app.

the tricky case here has always been the conflict between the folks who
want to check at build time versus those who want to check at runtime ...
if we hide things in the headers, people complain "i wasn't going to call
it unless i know i can anyway". (plus in addition to the general rule that
"most configure scripts don't handle cross-compilation correctly", you'd be
surprised how many screw up the simple "is this function available?" test
--- they don't compile a small program that #includes the right header and
tries to call the function; they grep the header or nm the library or
whatever.)


> > (yes, i agree that it's mildly unfortunate that there's no special case for
> > "i don't actually want to change anything", which i think is the case
> > they're talking about in that post, and i've wondered about adding that to
> > libc once or twice, but my feeling is that it wouldn't be particularly
> > useful in practice because _that_ kind of code probably needs a rethink
> > anyway when porting 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-07 Thread Chet Ramey

On 7/7/23 3:29 PM, enh wrote:


cd is a weird one. The v7 Bourne shell exited the shell if the directory
argument didn't exist, and that didn't change until SVR4.2,


and people complain unix isn't user-friendly... :-)


Ha. Sometimes they write books on the subject.


 > I know there's the "this
 > may syntax error and exit the shell" distinction but don't ask me
how set or
 > true are supposed to do that.

set exits the shell on an invalid option and was special in the Bourne
shell; true isn't a special builtin.

(I _think_ they added set here because set -u can
 > cause a shell error later? Maybe? But then why unset?

Well, unset didn't exist in the 7th edition shell, but it's special
in the SVR4 shell. It can only fail if asked to unset a readonly
variable or one of the shell's non-unsettable variables. It takes no
options, does no argument checking for invalid identifiers, and unsetting
a variable that's not set isn't an error, but when asked to unset a
variable the shell says you can't, the shell exits.

I think POSIX made unset a special builtin because the SVR4 sh did and
so it would be found in the command search before a function. That gets
important when you're trying to write a secure script,


did the v7 bourne shell just not know whether it was interactive or not? 


Of course it did.

(because, yeah, this kind of thing makes a lot more sense as an early `set 
-e`. but i can't imagine using this interactively!)


The 7th edition shell doesn't exit if it's interactive and a special
builtin fails, that wouldn't make any sense at all. Interactive shells
just abort the command and jump back to the main command loop. It's the
same in POSIX.

The 7th edition shell did have `set -e'; Bourne put it in for `make'.


Yeah, there's a ways to go.

https://lists.gnu.org/archive/html/help-bash/2023-06/msg00117.html


They mess up the simple stuff.


define "mess up"... 


Maybe "mess up" was too strong. I think it's rude to send a fatal signal if
you have the function. Either don't provide it or make it return -1/ENOSYS.
Android is by no means the only place this is a problem; it happens with
docker all the time:

https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html

Android deliberately has strict seccomp filters for 
apps, and the syscalls mentioned in that post are on the "no" list. Android 
gives each _app_ a different uid, so there's typically nothing useful you 
can do here anyway. (things are a bit different if you're actually part of 
the OS, but bash being GPL makes that unlikely :-( )


Sure. But if you have the function, bash assumes it works as documented in
the rest of the Linux world.

(yes, i agree that it's mildly unfortunate that there's no special case for 
"i don't actually want to change anything", which i think is the case 
they're talking about in that post, and i've wondered about adding that to 
libc once or twice, but my feeling is that it wouldn't be particularly 
useful in practice because _that_ kind of code probably needs a rethink 
anyway when porting to Android.)


I just added code to bash that doesn't try to call setresuid/setresgid if
nothing is actually changing, which will save a couple calls everywhere.
But killing the process is rude.

but, yeah, "security" and "self-hosting" aren't exactly friends ... the 1% 
bad guys being the reason we can't have nice things, as usual. (executing 
code off a writable filesystem being frowned upon.)


Oh, I get it.



Thorsten would be happy for android to keep using mksh, I'm sure.


(and i'm going to have a lot of fun dealing with compatibility issues 
if/when we move /bin/sh over...)


You know you will, it's best to start looking forward to it now.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-07 Thread enh via Toybox
On Thu, Jul 6, 2023 at 6:09 PM Chet Ramey wrote:

> On 7/5/23 3:29 AM, Rob Landley wrote:
>
>
> >>>> It's really a useless concept, by the way.
> >>>
> >>> It's not that simple: kill has to be built-in or it can't interface with job
> >>> control...
> >>
> >> That's not what a special builtin is. `kill' is a `regular builtin' anyway.
> >
> > I started down the "rereading that mess" path and it's turning into "reading all
> > the posix shell stuff" which is not getting bugs fixed. And once again, this is
> > a BAD STANDARD. Or at least badly organized. There's three groups here:
>
> OK. This is a decision that was made, what, 45 years ago? These are the
> Bourne shell special builtins -- at least as of SVR4. Korn added a couple,
> but since the Bourne shell didn't have them, they were not added to the
> list.
>
> Special builtins will exit a non-interactive shell on an error, assignments
> preceding them persist, and they're found before shell functions in the
> command search order. That's pretty much it. It's not that the builtins
> have to be implemented internally, but that these have other properties.
>
> They're a POSIX concept, so bash conforms when in posix mode. In default
> mode, every builtin is treated the same.
>
> > 1) flow control commands: break, continue, dot, eval, exec, exit, trap, return.
> >
> > 2) variable manipulation commands: export, readonly, set, shift, unset.
> >
> > 3) random crap: colon, times.
> >
> > Why group 1 doesn't include "wait" I dunno.
>
> It's not a Bourne shell special builtin: errors in it don't exit the shell.
>
>
>   Why group 2 has set but not read or
> > alias/unalias in it I couldn't tell you,
>
> read isn't a Bourne shell special builtin; errors in it don't exit the
> shell. The SVR4 shell doesn't have aliases (and aliases were originally
> optional in POSIX, part of the UPE).
>
> and for that matter cd is defined to
> > set $PWD.
>
> cd is a weird one. The v7 Bourne shell exited the shell if the directory
> argument didn't exist, and that didn't change until SVR4.2,


and people complain unix isn't user-friendly... :-)


> but POSIX
> declined to make it a special builtin.
>
>   Distinguishing : from true seems deeply silly
>
> true wasn't a special builtin in the Bourne shell.
>
> (especially when [ and
> > test aren't)
>
> Not part of the Bourne shell, only came in in System III, never a special
> builtin.
>
> > and "times" is job control
>
> It's not. It's a straightforward interface to the `times' library function
> (originally system call in 7th edition).
>
> (it smells like a jobs flag, but
> > they're not including bg/fg here either which are basically flow control group 1
> > above).
>
> Job control wasn't included until the SVR4.2 sh, and it was optional in
> POSIX for a long time.
>
> >
> > And having "command" _not_ be special is just silly:
> >
> >$ command() { echo hello; }
> >$ command ls -l
> >hello
>
> It really can't be; one of the uses for command is to suppress the effects
> of special builtins, so they won't exit the shell on error.
>
> >
> > There's only a few more commands like hash that CAN'T be implemented as child
> > processes, but they don't bother to distinguish them.
>
> It's not the difference between special builtins and external commands,
> it's the difference between regular builtins and special builtins.
>
> > I know there's the "this
> > may syntax error and exit the shell" distinction but don't ask me how set or
> > true are supposed to do that.
>
> set exits the shell on an invalid option and was special in the Bourne
> shell; true isn't a special builtin.
>
> (I _think_ they added set here because set -u can
> > cause a shell error later? Maybe? But then why unset?
>
> Well, unset didn't exist in the 7th edition shell, but it's special
> in the SVR4 shell. It can only fail if asked to unset a readonly
> variable or one of the shell's non-unsettable variables. It takes no
> options, does no argument checking for invalid identifiers, and unsetting
> a variable that's not set isn't an error, but when asked to unset a
> variable the shell says you can't, the shell exits.
>
> I think POSIX made unset a special builtin because the SVR4 sh did and
> so it would be found in the command search before a function. That gets
> important when you're trying to write a secure script,


did the v7 bourne shell just not know whether it was interactive or not?
(because, yeah, this kind of thing makes a lot more sense as an early `set
-e`. but i can't imagine using this interactively!)


> especially when you
> can inherit functions from the environment (bash) or run a startup file for
> non-interactive shells.
>
> It doesn't seem to affect
> > flow control:
> >
> >$ readonly potato=x; for i in one two three; do echo $i; unset potato; done
> >one
> >bash: unset: potato: cannot unset: readonly variable
> >two
> >bash: unset: potato: cannot unset: readonly variable
> >three
> >

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-06 Thread Chet Ramey

On 7/5/23 3:29 AM, Rob Landley wrote:



It's really a useless concept, by the way.


It's not that simple: kill has to be built-in or it can't interface with job
control...


That's not what a special builtin is. `kill' is a `regular builtin' anyway.


I started down the "rereading that mess" path and it's turning into "reading all
the posix shell stuff" which is not getting bugs fixed. And once again, this is
a BAD STANDARD. Or at least badly organized. There's three groups here:


OK. This is a decision that was made, what, 45 years ago? These are the
Bourne shell special builtins -- at least as of SVR4. Korn added a couple,
but since the Bourne shell didn't have them, they were not added to the
list.

Special builtins will exit a non-interactive shell on an error, assignments
preceding them persist, and they're found before shell functions in the
command search order. That's pretty much it. It's not that the builtins
have to be implemented internally, but that these have other properties.

They're a POSIX concept, so bash conforms when in posix mode. In default
mode, every builtin is treated the same.


1) flow control commands: break, continue, dot, eval, exec, exit, trap, return.

2) variable manipulation commands: export, readonly, set, shift, unset.

3) random crap: colon, times.

Why group 1 doesn't include "wait" I dunno.


It's not a Bourne shell special builtin: errors in it don't exit the shell.


 Why group 2 has set but not read or
alias/unalias in it I couldn't tell you, 


read isn't a Bourne shell special builtin; errors in it don't exit the
shell. The SVR4 shell doesn't have aliases (and aliases were originally
optional in POSIX, part of the UPE).

and for that matter cd is defined to
set $PWD.


cd is a weird one. The v7 Bourne shell exited the shell if the directory
argument didn't exist, and that didn't change until SVR4.2, but POSIX
declined to make it a special builtin.

 Distinguishing : from true seems deeply silly

true wasn't a special builtin in the Bourne shell.

(especially when [ and
test aren't)


Not part of the Bourne shell, only came in in System III, never a special
builtin.

and "times" is job control 


It's not. It's a straightforward interface to the `times' library function
(originally system call in 7th edition).

(it smells like a jobs flag, but
they're not including bg/fg here either which are basically flow control group 1
above).


Job control wasn't included until the SVR4.2 sh, and it was optional in
POSIX for a long time.



And having "command" _not_ be special is just silly:

   $ command() { echo hello; }
   $ command ls -l
   hello


It really can't be; one of the uses for command is to suppress the effects
of special builtins, so they won't exit the shell on error.



There's only a few more commands like hash that CAN'T be implemented as child
processes, but they don't bother to distinguish them. 


It's not the difference between special builtins and external commands,
it's the difference between regular builtins and special builtins.


I know there's the "this
may syntax error and exit the shell" distinction but don't ask me how set or
true are supposed to do that. 


set exits the shell on an invalid option and was special in the Bourne
shell; true isn't a special builtin.

(I _think_ they added set here because set -u can
cause a shell error later? Maybe? But then why unset? 


Well, unset didn't exist in the 7th edition shell, but it's special
in the SVR4 shell. It can only fail if asked to unset a readonly
variable or one of the shell's non-unsettable variables. It takes no
options, does no argument checking for invalid identifiers, and unsetting
a variable that's not set isn't an error, but when asked to unset a
variable the shell says you can't, the shell exits.

I think POSIX made unset a special builtin because the SVR4 sh did and
so it would be found in the command search before a function. That gets
important when you're trying to write a secure script, especially when you
can inherit functions from the environment (bash) or run a startup file for
non-interactive shells.

It doesn't seem to affect
flow control:

   $ readonly potato=x; for i in one two three; do echo $i; unset potato; done
   one
   bash: unset: potato: cannot unset: readonly variable
   two
   bash: unset: potato: cannot unset: readonly variable
   three
   bash: unset: potato: cannot unset: readonly variable


If you were in posix mode it would exit the shell.


I guess it's just the sh -c 'a=b set d=e; echo $a' nonsense which only dash
seems to bother with, which is a good reason _not_ to do it if you ask me...


Everyone does it. Bash does it in posix mode.



In general, this whole "can exit on error" thing doesn't seem hugely honored
even when posix says (implies) you can:

   $ declare -i potato=1/0
   bash: declare: 1/0: division by 0 (error token is "0")
   $ declare -i potato
   $ set potato=1/0
   $ echo $potato



I guess I don't understand these 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-05 Thread David Seikel
On 2023-07-05 02:29:39, Rob Landley wrote:
> >> still matching the behavior in my devuan install. (Still devuan bronchitis,
> >> haven't updated to devuan cholera yet. Um, the web page says devuan B matches
> >> debian "buster" and devuan C matches "bullseye", if that helps.)
> > 
> > Not at all. But charming version names.
> 
> Devuan is "debian without systemd" so 99% just a wrapper around the existing
> debian repositories intercepting the small number of packages that need to be
> diddled, and once they decided they were stable they started doing alphabetical
> release names like xubuntu did, and I never remember what the names are but
> remember the letters. (The debian ones I look up because they don't have letters.)

I'm what we call "the package mirror herder" for Devuan, top level admin
contact for the package mirrors that do that "serve Devuan stuff or
redirect to Debian" thing.  Though some of them are running a Debian
mirror as well, so skip the redirect step.

> I'm still using the B (ahem, "Beowulf") release. The C ("Chimaera") release came
> out October 2021 but I haven't upgraded yet because B (from June 2020) is still
> supported.

I keep complaining that we need to pick names that are easier to spell,
they keep picking the hard names.  I mentioned in IRC that I now have
decided that being hard to spell is a feature, so I spell them wrong
deliberately.  Though now I'm thinking just using the first letter is a
better option.  Thanks for that idea.

Now that the latest Debian has been released, Daedalus is next up for
Devuan, it's close, and quite stable.  Just gotta sort out some things
with the web site and installer.

A ("ASCII") got archived earlier this year, B is still supported until
after the next major Debian release.

> Back when I did busybox, I was replacing the Linux From Scratch packages with
> busybox, and I eventually got a system built from just 7 packages (linux,
> busybox, uclibc, gcc, binutils, make, bash) to rebuild itself under itself from
> source code, and build Linux From Scratch and chunks of Beyond Linux From
> Scratch under the result. (And it cross compiled for a dozen architectures and
> ran the build under QEMU, and could call out to the cross compiler running on
> the host via distcc through the emulated network to move the heavy lifting of
> compilation outside of the emulator to speed things up...)
> 
> https://landley.net/aboriginal/about.html#design
> 
> That worked, meaning I know what success _used_ to look like. Pity it's a moving
> target...

I contributed the x486 support to Aboriginal Linux, my client was using
that at the time for his device.  We managed to get it approved by the
government auditors back then.  Now he's talking about updating it to
some Pi type thing.  Easiest thing for me to do I suspect is to stick
with the version of Aboriginal Linux we used back then, but I'm open to
being persuaded otherwise.  On the other hand, the less changes the
better for the auditors. "Same code you looked at last time, we just
added some more buttons and changed the graphics."

I even put together a Debian based build environment for the auditors, so
they could rebuild it and check if it was reproducible.

-- 
A big old stinking pile of genius that no one wants
coz there are too many silver coated monkeys in the world.


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-05 Thread Rob Landley
I have a window open with a half-finished reply in it, and if I've already
replied to this email I apologize...

On 6/19/23 18:32, Chet Ramey wrote:
> On 6/17/23 7:23 PM, Rob Landley wrote:
>> On 6/12/23 19:40, Chet Ramey wrote:
>>>> and they have a list of "special built-in utilities" that does NOT include cd
>>>> (that's listed in normal utilities: how would one go about implementing that
>>>> outside of the shell, do you think?)
>>>
>>> That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
>>> and wait are all regular builtins, and they can't be implemented outside
>>> the shell either.
>>>
>>> Special builtins are defined that way because of their effect:
>>>
>>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14
>>>
>>> It's really a useless concept, by the way.
>> 
>> It's not that simple: kill has to be built-in or it can't interface with job
>> control...
> 
> That's not what a special builtin is. `kill' is a `regular builtin' anyway.

I started down the "rereading that mess" path and it's turning into "reading all
the posix shell stuff" which is not getting bugs fixed. And once again, this is
a BAD STANDARD. Or at least badly organized. There's three groups here:

1) flow control commands: break, continue, dot, eval, exec, exit, trap, return.

2) variable manipulation commands: export, readonly, set, shift, unset.

3) random crap: colon, times.

Why group 1 doesn't include "wait" I dunno. Why group 2 has set but not read or
alias/unalias in it I couldn't tell you, and for that matter cd is defined to
set $PWD. Distinguishing : from true seems deeply silly (especially when [ and
test aren't) and "times" is job control (it smells like a jobs flag, but
they're not including bg/fg here either which are basically flow control group 1
above).

And having "command" _not_ be special is just silly:

  $ command() { echo hello; }
  $ command ls -l
  hello

There's only a few more commands like hash that CAN'T be implemented as child
processes, but they don't bother to distinguish them. I know there's the "this
may syntax error and exit the shell" distinction but don't ask me how set or
true are supposed to do that. (I _think_ they added set here because set -u can
cause a shell error later? Maybe? But then why unset? It doesn't seem to affect
flow control:

  $ readonly potato=x; for i in one two three; do echo $i; unset potato; done
  one
  bash: unset: potato: cannot unset: readonly variable
  two
  bash: unset: potato: cannot unset: readonly variable
  three
  bash: unset: potato: cannot unset: readonly variable

I guess it's just the sh -c 'a=b set d=e; echo $a' nonsense which only dash
seems to bother with, which is a good reason _not_ to do it if you ask me...

In general, this whole "can exit on error" thing doesn't seem hugely honored
even when posix says (implies) you can:

  $ declare -i potato=1/0
  bash: declare: 1/0: division by 0 (error token is "0")
  $ declare -i potato
  $ set potato=1/0
  $ echo $potato

  $
  $ (set -x; echo hello ) 2>/dev/full
  hello
  $

Oh, by the way, I remember setting LINENO read only made the shell quite chatty,
but when I tested it just now it was ignored instead?

  $ readonly LINENO
  $ echo $LINENO
  2
  $ echo $LINENO
  3
  $ declare -p LINENO
  declare -ir LINENO="4"
  $ echo $LINENO
  5

Hmmm, maybe it was...

  $ source <(<<<$'readonly LINENO\necho$LINENO\necho $LINENO')
  $ source <(echo $'readonly LINENO\necho $LINENO\necho $LINENO')
  2
  3

Nope, either there's version skew or I need to dig into my notes again. (Sigh, I
need to build current bash and test against that. If I'm going to experience
version skew from distro version upgrades _anyway_, I might as well treat it
like the kernel and try to notice changes early. Alright, bump up the Linux From
Scratch test environment todo list item...)

>> (A prefix assignment... on continue? I can't
>> even do a prefix assignment on "if", and I have _use_cases_ for that. I had that
>> implemented and then backed it out again because it's an error in bash.
> 
> `if' is not a builtin.

Sigh. I know:

  $ abc=123 { echo $abc; }
  bash: syntax error near unexpected token `}'

I keep writing scripts like that and having to fix it...

>> I
>> remember I did make "continue&" work, but don't remember why...)
> 
> Why would that not work? It's just a no-op; no semantic meaning.

Not _quite_ a NOP:

  $ for i in one two three; do echo a=$i; continue& b=$i; done
  a=one
  [1] 30698
  a=two
  [2] 30699
  a=three
  [3] 30700
  [1]   Done                    continue
  [2]-  Done                    continue

Notice the child processes and lack of b= lines.

No, if you want a NOP, put a flow control statement in a pipe:

  $ for i in one two three; do echo a=$i; continue | echo b=$i; echo c=$i; done
  a=one
  b=one
  c=one
  a=two
  b=two
  c=two
  a=three
  b=three
  c=three

>>> 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-07-01 Thread Chet Ramey

On 6/30/23 8:57 PM, Rob Landley wrote:

On 6/12/23 19:40, Chet Ramey wrote:

I wish you were not so reluctant. Look at how many things you've discovered
that I decided were bugs based on our discussions.


But since you asked, today's new question I wrestled with was


Why does eval "" clear $? when a normal newline doesn't, and it's passed into
and out of eval otherwise?


"If there are no arguments, or only null arguments, eval shall return a 
zero exit status"


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_19
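
The quoted rule can be checked directly. The following is a minimal sketch, assuming a POSIX `sh` on the PATH; the function names are made up for illustration and each case runs in its own child shell so the statuses cannot interfere:

```shell
# eval with only null arguments returns 0, discarding the earlier failure.
status_after_empty_eval() { sh -c 'false; eval ""; echo "$?"'; }

# A non-empty eval sees the incoming $? and passes its own status back out.
status_seen_inside_eval() { sh -c 'false; eval '\''echo "$?"'\'''; }
status_after_eval_false() { sh -c 'eval "false"; echo "$?"'; }

# A command that expands to nothing (NONE is unset) also completes with status 0,
# which is why the "if" branch is taken.
empty_expansion_is_true() { sh -c 'unset NONE; if $NONE; then echo hello; fi'; }

status_after_empty_eval      # prints 0
status_seen_inside_eval      # prints 1
status_after_eval_false      # prints 1
empty_expansion_is_true      # prints hello
```

The last case is the likely connection to `if $NONE`: a simple command with no command name and no command substitution completes with a zero status, much like `eval` with nothing to evaluate.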

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-30 Thread Rob Landley
> On 6/12/23 19:40, Chet Ramey wrote:
>> I wish you were not so reluctant. Look at how many things you've discovered
>> that I decided were bugs based on our discussions.
>
> But since you asked, today's new question I wrestled with was

Why does eval "" clear $? when a normal newline doesn't, and it's passed into
and out of eval otherwise?

  $ false
  $ eval ""
  $ echo $?
  0
  $ false
  $
  $ echo $?
  1
  $ false; eval 'echo $?'
  1
  $ eval 'false'; echo $?
  1
  $ if ; then echo hello; fi
  bash: syntax error near unexpected token `;'
  $ if
  > then echo hello; fi
  bash: syntax error near unexpected token `then'

What's the logic here? Probably related to:

  $ if $NONE; then echo hello; fi
  hello

Rob


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-21 Thread Chet Ramey

On 6/18/23 5:28 PM, Rob Landley wrote:


Where did the FF come from?

$ bash -c $'cat<<0\n\\' | hd
bash: line 1: warning: here-document at line 0 delimited by end-of-file (wanted `0')
00000000  5c ff 0a                                          |\..|
00000003


If I had to guess, I'd say the EOF (-1) sentinel leaked into the string and
came out as an unsigned char (0xff). That's pretty clearly a bug.
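
The arithmetic behind that guess is easy to reproduce. This is only an illustration of how -1 turns into 0xff when truncated to one byte, not a claim about where in the bash source it happens; the function name is hypothetical:

```shell
# EOF is conventionally -1; keeping only the low 8 bits (mask 255) leaves 255,
# which is the 0xff byte seen in the hexdump.
eof_as_unsigned_byte() {
    printf '%x\n' "$(( -1 & 255 ))"
}

eof_as_unsigned_byte    # prints ff
```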

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-19 Thread Chet Ramey

On 6/17/23 7:23 PM, Rob Landley wrote:

On 6/12/23 19:40, Chet Ramey wrote:

and they have a list of "special built-in utilities" that does NOT include cd
(that's listed in normal utilities: how would one go about implementing that
outside of the shell, do you think?)


That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
and wait are all regular builtins, and they can't be implemented outside
the shell either.

Special builtins are defined that way because of their effect:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14

It's really a useless concept, by the way.


It's not that simple: kill has to be built-in or it can't interface with job
control...


That's not what a special builtin is. `kill' is a `regular builtin' anyway.



Wait, assignments before these magic utilities are NOT prefix assignments
limited to the duration of the command?


How many times.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14


   $ abc=123 true
   $ echo $abc
   $ abc=123 :
   $ echo $abc
   $ abc=123 eval 'echo $abc'
   123
   $ echo $abc
   $

Nope, even bash doesn't do that. 


You should have tried it in posix mode. I said it was a useless concept,
there's no way bash is going to do that in default mode.
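
For comparison, here is the same experiment in both modes. It is a minimal sketch assuming `bash` is on the PATH; the function names are invented for the example, and each one starts a fresh shell so the modes stay separate:

```shell
# Default mode: the prefix assignment lasts only for the duration of ":".
default_mode_colon() { bash -c 'unset abc; abc=123 :; echo "[$abc]"'; }

# Posix mode: ":" is a special builtin, so the assignment persists afterwards.
posix_mode_colon() { bash --posix -c 'unset abc; abc=123 :; echo "[$abc]"'; }

# Posix mode: "true" is a regular builtin, so the assignment is still temporary.
posix_mode_true() { bash --posix -c 'unset abc; abc=123 true; echo "[$abc]"'; }

default_mode_colon   # prints []
posix_mode_colon     # prints [123]
posix_mode_true      # prints []
```

The `:` versus `true` pair is the visible difference between a special and a regular builtin here.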

(A prefix assignment... on continue? I can't
even do a prefix assignment on "if", and I have _use_cases_ for that. I had that
implemented and then backed it out again because it's an error in bash.


`if' is not a builtin.


I
remember I did make "continue&" work, but don't remember why...)


Why would that not work? It's just a no-op; no semantic meaning.


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_03


And I need parsing that eats \$ and \\n but leaves \x alone, great. (Which is a
new status flag for expand_arg(), can't be handled in a preparsing pass nor is
NO_QUOTE gonna do it right...)


More characters than those two.
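
A small sketch of which backslash sequences a here-document body consumes, assuming any POSIX shell; the function names are hypothetical. With an unquoted delimiter the body is treated much like double-quoted text, and with a quoted delimiter it is left alone:

```shell
# Unquoted delimiter: \$ and \\ lose their backslash, \x and \" keep it.
heredoc_unquoted() {
cat <<EOF
\$HOME \\ \x \"
EOF
}

# Quoted delimiter: no processing at all, the body comes out byte for byte.
heredoc_quoted() {
cat <<'EOF'
\$HOME \\ \x \"
EOF
}

heredoc_unquoted   # prints: $HOME \ \x \"
heredoc_quoted     # prints: \$HOME \\ \x \"
```

A backquote and a backslash-newline pair are also handled in the unquoted case; they are omitted here only to keep the body readable.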



Why is only the second of these an error in bash?

   $ unset ABC; echo ${ABC::'1+2'}
   $ ABC=abcdef; echo ${ABC::'1+2'}
   bash: ABC: '1+2': syntax error: operand expected (error token is "'1+2'")


Because if there's nothing to operate on, bash doesn't try to process the
rest of the word expansion (and if your first command is real, echo will
output a single newline).

This is consistent with POSIX:

"If word is not needed, it shall not be expanded."

even though the substring word expansion isn't POSIX.
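
The same "not needed, not expanded" rule can be observed with a standard POSIX operator. This is a sketch using `${x:-word}` rather than the bash substring expansion above, with a side effect in the word so that expansion is visible; the function name and the variables `n` and `x` are illustrative:

```shell
# Runs in a subshell (parenthesised body) so n and x do not leak into the caller.
count_expansions() (
    n=0
    x=set
    : "${x:-$(( n = n + 1 ))}"    # x is set: the fallback word is never expanded
    echo "set: n=$n"
    unset x
    : "${x:-$(( n = n + 1 ))}"    # x is unset: the word is expanded, n becomes 1
    echo "unset: n=$n"
)

count_expansions
# prints:
#   set: n=0
#   unset: n=1
```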


I think when the EOF is quoted the HERE body has no processing, and when it's
not quoted then $VARS \$ and \ are the only special... Nope, \\ is too.


Yes, since the body is treated like it's in double quotes, and, as quoted
earlier, \ is one of the characters for which backslash retains its
behavior as a special character. The double quote is the only exception;
look at what these do:

cat <
   https://github.com/landley/toybox/commit/32b3587af261


Ugh.



When you create a new local variable it does so in the most recent named
function context (or the root context if it reaches it), skipping unnamed
function contexts. When you resolve or modify an existing variable (or unset it,
which creates a whiteout entry) it iterates back through all existing function
contexts to find a matching entry (then puts one in the root context if you were
assigning without declaring it local).

So "local blah" won't bind to an anonymous function context, and errors out if
it reaches the root context. I _think_ it works...
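
The lookup described above is toybox internals, but the shell-visible behavior it has to reproduce can be seen in bash itself. A minimal sketch, assuming `bash` is on the PATH, with made-up function names:

```shell
# Assignment without "local" modifies the nearest existing variable, which here
# is the caller's local, and leaves the global untouched.
scope_demo() {
    bash -c '
        g() { v=changed; }
        f() { local v=inner; g; echo "f sees $v"; }
        v=global
        f
        echo "global is $v"
    '
}

# "local" outside any function is an error in bash.
local_outside_function() {
    bash -c 'local x=1' 2>/dev/null
    echo "status=$?"
}

scope_demo
# prints:
#   f sees changed
#   global is global
local_outside_function   # prints status=1
```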


OK.


The real question is what value LINENO should have when using -c command,
even though it's only defined for a script or function.


We've gone over that one before. You decided you were going to initialize it to
1 instead of 0,


Yes.


still matching the behavior in my devuan install. (Still devuan bronchitis,
haven't updated to devuan cholera yet. Um, the web page says devuan B matches
debian "buster" and devuan C matches "bullseye", if that helps.)


Not at all. But charming version names.


I was naive enough to write the variable resolution logic with the design
assumption that unbalanced quoting contexts had already been caught before the
data was passed to us. Kinda biting me now, although I think I'm most of the way
through it.


It was a pain to get that stuff right.


It doesn't handle nested logical contexts, and "case" logic has unmatched
ending parentheses that can end the $() span prematurely...)


Ha. I had ad-hoc parsing that parsed $(...) for years, and it got more and
more complex. I finally gave up on it for bash-5.2 and call the parser
recursively to find the closing right paren (and replaced hundreds of lines
of code with dozens). That's really the only way to do it correctly, but I
was stuck with some compatibility issues because of how bash had not done
it correctly in the past.




Mostly I'm reading the bash man page, pondering many years of
writing and editing bash scripts, and doing LOTS of tests...


And pointing 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-18 Thread Rob Landley
> On 6/12/23 19:40, Chet Ramey wrote:
>> I wish you were not so reluctant. Look at how many things you've discovered
>> that I decided were bugs based on our discussions.
>
> But since you asked, today's new question I wrestled with was

Where did the FF come from?

$ bash -c $'cat<<0\n\\' | hd
bash: line 1: warning: here-document at line 0 delimited by end-of-file (wanted 
`0')
00000000  5c ff 0a  |\..|
00000003

Rob


Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-17 Thread Rob Landley
On 6/12/23 19:40, Chet Ramey wrote:
>> and they have a list of "special built-in utilities" that does NOT include cd
>> (that's listed in normal utilities: how would one go about implementing that
>> outside of the shell, do you think?)
> 
> That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
> and wait are all regular builtins, and they can't be implemented outside
> the shell either.
> 
> Special builtins are defined that way because of their effect:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14
> 
> It's really a useless concept, by the way.

It's not that simple: kill has to be built-in or it can't interface with job
control...

Wait, assignments before these magic utilities are NOT prefix assignments
limited to the duration of the command?

  $ abc=123 true
  $ echo $abc
  $ abc=123 :
  $ echo $abc
  $ abc=123 eval 'echo $abc'
  123
  $ echo $abc
  $

Nope, even bash doesn't do that. (A prefix assignment... on continue? I can't
even do a prefix assignment on "if", and I have _use_cases_ for that. I had that
implemented and then backed it out again because it's an error in bash. I
remember I did make "continue&" work, but don't remember why...)
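For comparison, the rule POSIX attaches to special builtins can be seen by running the same command line in bash's default mode and in its POSIX mode. This is a sketch that assumes a bash binary on PATH; the variable names are made up for illustration.

```shell
unset abc

# Default bash mode: the prefix assignment is temporary, even though ":"
# is a special builtin, matching the transcript above.
default_mode=$(bash -c 'abc=123 :; echo "[$abc]"')

# POSIX mode: an assignment preceding a special builtin persists after
# the builtin completes.
posix_mode=$(bash --posix -c 'abc=123 :; echo "[$abc]"')

echo "default: $default_mode  posix: $posix_mode"
```

"true" is a regular builtin, so the assignment in front of it is temporary in both modes.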

>> Anyway, I found the third "shall retain" in V3_chap02, and... it's wrong?
> 
> No.
> 
>> 
>>> (see Escape Character (Backslash)) only when followed by one of the
>>> following characters when considered special:
>>>
>>>   $   `   "   \   <newline>"
>>>
>>> So the backslash-newline gets removed, but, say, a \" only has the
>>> backslash removed.
>> 
>> Because when you put a backslash in front of another char:
>> 
>>$ echo \x
>>x
>>$ basename \x
>>x
> 
> The text I quoted was from the Double Quotes section. The additional
> reference to (Escape Character...) gives it away.
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_03

And I need parsing that eats \$ and \\n but leaves \x alone, great. (Which is a
new status flag for expand_arg(), can't be handled in a preparsing pass nor is
NO_QUOTE gonna do it right...)
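The double-quote rule being discussed can be checked one character at a time. A minimal sketch (variable names are made up); printf with a %s format is used instead of echo so that no second layer of backslash processing gets involved.

```shell
# Inside double quotes, backslash is an escape only before $ ` " \ and newline.
dollar=$(printf '%s' "\$")   # backslash removed
slash=$(printf '%s' "\\")    # backslash removed, one backslash left
other=$(printf '%s' "\x")    # backslash kept: x is not special here
joined=$(printf '%s' "ab\
cd")
# The backslash-newline pair above is removed entirely, gluing the lines.

printf '%s\n' "$dollar" "$slash" "$other" "$joined"
```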

Why is only the second of these an error in bash?

  $ unset ABC; echo ${ABC::'1+2'}
  $ ABC=abcdef; echo ${ABC::'1+2'}
  bash: ABC: '1+2': syntax error: operand expected (error token is "'1+2'")

>> And my approach of handling HERE document lines one at a time probably came 
>> from
>> "...the lines in the here-document are not expanded. If word is unquoted, all
>> lines of the here-document are subjected to parameter expansion, command
>> substitution, and arithmetic expansion, the character sequence \<newline> is
>> ignored, and \ must be used to quote the characters \, $, and `."
>> 
>> Except... \<newline> is ignored when the EOF _is_ quoted? It glues lines
>> together when it's not quoted? (It's late and I'm not sure I'm reading this
>> clearly. Need test cases...)
> 
> When the EOF is not quoted, the here-document body is essentially double-
> quoted.

Working on it...

> "In this case, the <backslash> in the input behaves as the <backslash>
> inside double-quotes (see Double-Quotes)." (POSIX again.)
> 
> The backslash-newline gets removed just like it does in double quotes.
> 
> When the EOF is quoted, the here-document body is essentially single-quoted
> (not exactly, but you get the idea). The backslash-newline gets preserved.

I think when the EOF is quoted the HERE body has no processing, and when it's
not quoted then $VARS \$ and \<newline> are the only special... Nope, \\ is too.

 $ bash -c $'cat<<"";echo X\n\necho Z'
 X
 Z
>>>
>>> This is dodgy behavior to rely on: a null delimiter is matched by the next
>>> blank line, since that's technically "a line containing only the delimiter
>>> and a <newline>, with no <blank> characters in between."
>> 
>> I'm trying to match what bash does, which means figuring _out_ what bash 
>> does. I
>> respect posix, but I expect to diverge from it a lot because so much of what 
>> I'm
>> trying to be compatible with already does. :(
> 
> I understand, but what I wrote explains what bash currently does. It's just
> not a good idea for anyone to rely on the behavior of a null here-document
> delimiter. I can't imagine anyone does.

If I can think of a corner case, I'm trying to make it match bash, because stuff
depends on bugs ALL THE TIME:

  https://github.com/landley/toybox/commit/32b3587af261

>> I'm basically abusing function contexts, because that's what I attach local
>> variables to, and $LINENO resets but persists in the same way as local vars:
> 
> I don't think that makes any sense. You can use `return' in a `.' script as
> a special case, but dot doesn't make local variables work. LINENO gets
> reset because you're using a new input source, and reverts to its previous
> value when you go back to the previous input source. LINENO's not good for
> much more than error messages, and it's good to have the current input
> source and the current line number match up. It's not a local variable, per 
> se.

When you create 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-12 Thread Chet Ramey

On 6/12/23 5:23 PM, Rob Landley wrote:

On 6/9/23 15:23, Chet Ramey wrote:

On 6/8/23 10:31 PM, Rob Landley wrote:

On 6/5/23 18:08, Chet Ramey wrote:

You got me. You're right; I had it backwards.


I'm not trying to gotcha anybody, I'm just trying to understand what the right
thing to implement is. I find this entire area surprisingly confusing...


No gotcha here, I was wrong and acknowledge it.




"The <backslash> shall retain its special meaning as an escape character


The word pair "shall retain" is not in the bash man page so I'm guessing...
Posix? 


The man page says "retains." I don't do the standard-speak "shall" stuff.



and they have a list of "special built-in utilities" that does NOT include cd
(that's listed in normal utilities: how would one go about implementing that
outside of the shell, do you think?)


That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
and wait are all regular builtins, and they can't be implemented outside
the shell either.

Special builtins are defined that way because of their effect:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14

It's really a useless concept, by the way.


Anyway, I found the third "shall retain" in V3_chap02, and... it's wrong?


No.




(see Escape Character (Backslash)) only when followed by one of the
following characters when considered special:

  $   `   "   \   <newline>"

So the backslash-newline gets removed, but, say, a \" only has the
backslash removed.


Because when you put a backslash in front of another char:

   $ echo \x
   x
   $ basename \x
   x


The text I quoted was from the Double Quotes section. The additional
reference to (Escape Character...) gives it away.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_03



And my approach of handling HERE document lines one at a time probably came from
"...the lines in the here-document are not expanded. If word is unquoted, all
lines of the here-document are subjected to parameter expansion, command
substitution, and arithmetic expansion, the character sequence \<newline> is
ignored, and \ must be used to quote the characters \, $, and `."

Except... \<newline> is ignored when the EOF _is_ quoted? It glues lines
together when it's not quoted? (It's late and I'm not sure I'm reading this
clearly. Need test cases...)


When the EOF is not quoted, the here-document body is essentially double-
quoted.

"In this case, the <backslash> in the input behaves as the <backslash>
inside double-quotes (see Double-Quotes)." (POSIX again.)


The backslash-newline gets removed just like it does in double quotes.

When the EOF is quoted, the here-document body is essentially single-quoted
(not exactly, but you get the idea). The backslash-newline gets preserved.
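A side-by-side sketch of the two cases just described, feeding the same body through an unquoted and a quoted delimiter (the variable names are made up for illustration):

```shell
# Unquoted delimiter: the body behaves as if double-quoted, so the
# backslash-newline is removed, \$ loses its backslash, and \x keeps it.
unquoted=$(cat <<EOF
ab\
c \$HOME \x
EOF
)

# Quoted delimiter: the body is passed through untouched.
quoted=$(cat <<'EOF'
ab\
c \$HOME \x
EOF
)

printf '%s\n' "$unquoted" "---" "$quoted"
```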




The next POSIX version goes into a lot more detail on how here-documents
are read and processed.


Here's hoping spending more words to explain it will wind up being an
improvement...


It's not bad, actually. Too much to cut and paste here.



What does `lasts' mean? How the body is delimited, or something else?


Things like continuing past the end of a "source" file and so on. (Data can come
from -c, from stdin, from source, from eval, through $() or <()...)

The colon was an attempt to indicate that examples of what I tried were 
forthcoming.


OK.





$ bash -c $'cat<<0;echo hello\nabc\n0'
abc
hello


POSIX specifies that "the end of a command_string operand (see sh) shall be
treated as a <newline> character."


Which says the trailing \ should vanish for -c, but the bug report this all
started with was that it hadn't, and that broke somebody's thing.


No. The text I quoted is from the section on here-documents, since we're
talking about here-documents. That text is actually from the updated
current draft.



$ bash -c $'cat<<"";echo X\n\necho Z'
X
Z


This is dodgy behavior to rely on: a null delimiter is matched by the next
blank line, since that's technically "a line containing only the delimiter
and a <newline>, with no <blank> characters in between."


I'm trying to match what bash does, which means figuring _out_ what bash does. I
respect posix, but I expect to diverge from it a lot because so much of what I'm
trying to be compatible with already does. :(


I understand, but what I wrote explains what bash currently does. It's just
not a good idea for anyone to rely on the behavior of a null here-document
delimiter. I can't imagine anyone does.
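To see the blank-line match in action without relying on it anywhere real, here is a sketch that rebuilds the -c string from the thread with printf and adds a variant with one line of body. It assumes bash is on PATH; the variable names are made up.

```shell
# Null delimiter, blank line immediately after: the body is empty.
empty_body=$(bash -c "$(printf 'cat<<"";echo X\n\necho Z')")

# Null delimiter with one body line before the blank line that ends it.
one_line_body=$(bash -c "$(printf 'cat<<"";echo X\nbody\n\necho Z')")

printf '%s\n' "$empty_body" "---" "$one_line_body"
```

In both runs the "echo Z" after the blank line executes as an ordinary command, which shows the blank line really did terminate the here-document.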




$ echo -n 'cat<<EOF' > one
$ echo -n $'potato\nEOF' > two
$ bash -c '. one;. two'
one: line 1: warning: here-document at line 1 delimited by end-of-file 
(wanted
`EOF')
two: line 1: potato: command not found
two: line 2: EOF: command not found


I don't think it's reasonable to expect a word, which is what the here-
document body is, to persist across `.' boundaries, since the contents of a
`.' script are (depending on how you parse them) either a `program' or a
`compound_list'.


I'm basically abusing function 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-12 Thread Rob Landley
On 6/9/23 15:23, Chet Ramey wrote:
> On 6/8/23 10:31 PM, Rob Landley wrote:
>> On 6/5/23 18:08, Chet Ramey wrote:
> You got me. You're right; I had it backwards.

I'm not trying to gotcha anybody, I'm just trying to understand what the right
thing to implement is. I find this entire area surprisingly confusing...

> "The <backslash> shall retain its special meaning as an escape character

The word pair "shall retain" is not in the bash man page so I'm guessing...
Posix? (Sigh. Part of my complaint about using posix as a shell source is it's
scattered all over the place in utilities/sh.html and utilities/V3_Chap02.html
and they have a list of "special built-in utilities" that does NOT include cd
(that's listed in normal utilities: how would one go about implementing that
outside of the shell, do you think?)

Anyway, I found the third "shall retain" in V3_chap02, and... it's wrong?

> (see Escape Character (Backslash)) only when followed by one of the 
> following characters when considered special:
> 
>  $   `   "   \   <newline>"
> 
> So the backslash-newline gets removed, but, say, a \" only has the
> backslash removed.

Because when you put a backslash in front of another char:

  $ echo \x
  x
  $ basename \x
  x

The backslash gets removed anyway, so I don't know what it means by "special"
here. (There's probably a corner case I'm not seeing because it's been too long
since I last read the entire thing from cover to cover and tried to piece
together the choose your own adventure plotlines)

>> In here documents, double quote does NOT remove it:
> 
> Quoting the here-document delimiter has the expected effect. The body is
> considered to be in double quotes if the delimiter is *not* quoted, and
> basically in single quotes if it is ("the here-document lines are not
> expanded").

Nevermind, it was there in the bash man page when I went back and had another
look. (There's a reason I'm using _that_ as my spec when I can...) I was
confused by expecting "" and '' and \ to work consistently, but I remember now
that they _explicitly_ don't:

  $ cat<<EOF
  > $PATH
  > EOF
  /home/landley/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
  $ cat<<\EOF
  > $PATH
  > EOF
  $PATH
  $ cat< $PATH
  > EOF
  $PATH

If the EOF has any removable characters anywhere in it, you don't expand
variables. (I thought I had that working at one point... ah, my strcspn doesn't
have \\ in the string, just \" and '.)

Which does at least make:

  $ cat<<\\
  > $PATH
  > \
  $PATH

Terminable. :)
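The "any removable character anywhere in the delimiter" rule can be sketched with a delimiter that is only partly quoted. The variable names below are made up for illustration.

```shell
v=value

# No quoting in the delimiter: $v is expanded.
plain=$(cat <<EOF
$v
EOF
)

# Quotes around just one letter of the delimiter: expansion is off, and
# the terminator is the delimiter after quote removal (EOF).
partial=$(cat <<E"O"F
$v
EOF
)

# A leading backslash counts as quoting too.
escaped=$(cat <<\EOF
$v
EOF
)

printf '%s\n' "$plain" "$partial" "$escaped"
```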

And my approach of handling HERE document lines one at a time probably came from
"...the lines in the here-document are not expanded. If word is unquoted, all
lines of the here-document are subjected to parameter expansion, command
substitution, and arithmetic expansion, the character sequence \<newline> is
ignored, and \ must be used to quote the characters \, $, and `."

Except... \<newline> is ignored when the EOF _is_ quoted? It glues lines
together when it's not quoted? (It's late and I'm not sure I'm reading this
clearly. Need test cases...)

> The next POSIX version goes into a lot more detail on how here-documents
> are read and processed.

Here's hoping spending more words to explain it will wind up being an
improvement...

>>$ cat<<"EOF"
>>> ab\
>>> c
>>> EOF
>>ab\
>>c
> 
>> I also tried to ask questions about how long a HERE document lasts, and:
> 
> What does `lasts' mean? How the body is delimited, or something else?

Things like continuing past the end of a "source" file and so on. (Data can come
from -c, from stdin, from source, from eval, through $() or <()...)

The colon was an attempt to indicate that examples of what I tried were 
forthcoming.

>> 
>>$ bash -c $'cat<<0;echo hello\nabc\n0'
>>abc
>>hello
>
> POSIX specifies that "the end of a command_string operand (see sh) shall be
> treated as a <newline> character."

Which says the trailing \ should vanish for -c, but the bug report this all
started with was that it hadn't, and that broke somebody's thing.

>>$ bash -c $'cat<<"";echo X\n\necho Z'
>>X
>>Z
> 
> This is dodgy behavior to rely on: a null delimiter is matched by the next
> blank line, since that's technically "a line containing only the delimiter
> and a <newline>, with no <blank> characters in between."

I'm trying to match what bash does, which means figuring _out_ what bash does. I
respect posix, but I expect to diverge from it a lot because so much of what I'm
trying to be compatible with already does. :(

>>$ echo -n 'cat<<EOF' > one
>>$ echo -n $'potato\nEOF' > two
>>$ bash -c '. one;. two'
>>one: line 1: warning: here-document at line 1 delimited by end-of-file 
>> (wanted
>> `EOF')
>>two: line 1: potato: command not found
>>two: line 2: EOF: command not found
> 
> I don't think it's reasonable to expect a word, which is what the here-
> document body is, to persist across `.' boundaries, since the contents of a
> `.' script are (depending on how you parse them) either a `program' or 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-09 Thread Chet Ramey

On 6/8/23 10:31 PM, Rob Landley wrote:

On 6/5/23 18:08, Chet Ramey wrote:

But escaping a _newline_ is funny in that it glues lines together instead of
creating a command line argument out of the result, which means it has to be
special cased and obviously I'm special casing it wrong, but the special case
has multiple nonobvious features.


I guess. There are two cases: in double quotes, when the backslash-newline
is preserved,


$ printf "abc\

def\n"

abcdef
$ basename "abc\

def" | hd

00000000  61 62 63 64 65 66 0a  |abcdef.|
00000007

Define "preserved".


You got me. You're right; I had it backwards.

"The <backslash> shall retain its special meaning as an escape character
(see Escape Character (Backslash)) only when followed by one of the
following characters when considered special:


$   `   "   \   <newline>"

So the backslash-newline gets removed, but, say, a \" only has the
backslash removed.



In here documents, double quote does NOT remove it:


Quoting the here-document delimiter has the expected effect. The body is
considered to be in double quotes if the delimiter is *not* quoted, and
basically in single quotes if it is ("the here-document lines are not
expanded").

The next POSIX version goes into a lot more detail on how here-documents
are read and processed.



   $ cat<<"EOF"
   > ab\
   > c
   > EOF
   ab\
   c




I also tried to ask questions about how long a HERE document lasts, and:


What does `lasts' mean? How the body is delimited, or something else?



   $ bash -c $'cat<<0;echo hello\nabc\n0'
   abc
   hello

POSIX specifies that "the end of a command_string operand (see sh) shall be
treated as a <newline> character."


   $ bash -c $'cat<<"";echo X\n\necho Z'
   X
   Z


This is dodgy behavior to rely on: a null delimiter is matched by the next
blank line, since that's technically "a line containing only the delimiter
and a <newline>, with no <blank> characters in between."


   $ echo -n 'cat<<EOF' > one
   $ echo -n $'potato\nEOF' > two
   $ bash -c '. one;. two'
   one: line 1: warning: here-document at line 1 delimited by end-of-file 
(wanted
`EOF')
   two: line 1: potato: command not found
   two: line 2: EOF: command not found


I don't think it's reasonable to expect a word, which is what the here-
document body is, to persist across `.' boundaries, since the contents of a
`.' script are (depending on how you parse them) either a `program' or a
`compound_list'.


I'm also vaguely curious how one WOULD terminate this one:

   $ bash -c $'cat<<\'\n\''
   bash: line 1: warning: here-document at line 1 delimited by end-of-file 
(wanted `
   ')


You can't. A newline here-document delimiter can never be matched, and
only EOF will terminate the here-document. Some shells (e.g., yash) treat
this as a fatal syntax error, but most treat it like bash does. I
considered printing a warning for a delimiter containing a newline, but
decided not to.



Also, -s doesn't work as advertised in the man page?

-s  If the -s option is present, or if no arguments remain after
    option processing, then commands are read from the standard
    input. This option allows the positional parameters to be
    set when invoking an interactive shell or when reading input
    through a pipe.

   $ echo echo also | bash -s -c 'echo hello'
   hello
   $ echo echo also | bash -c -s 'echo hello'
   hello
   $ echo echo also | bash -c -s -s -s -s 'echo hello'
   hello


-c has higher priority than -s, and you can only use one. It's unspecified
behavior; POSIX doesn't allow those options to be used together. Some ash-
based shells (e.g., dash) execute the command string and then start an
interactive shell, but I don't think that's a great idea.


Yes, but does a backslash newline count as quoted whitespace?


No. In places where the backslash acts as escape character, the backslash-
newline pair is removed from the input stream.

 Backslash

ordinarily quotes, and there's "" which is a quoted nothing but creates an
argument. So this is a new category: a quoted nothing that does NOT create an
argument. 


It's removed from the input stream before tokenization. It doesn't even
delimit a token.
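A small sketch of the distinction: the backslash-newline disappears even in the middle of a word, while "" is a quoted nothing that still produces an argument. count_args is a hypothetical helper defined only for this example.

```shell
# The pair is removed before tokenization, so this is the single word "echo".
mid_word=$(ec\
ho joined)

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# "" creates an empty argument.
with_empty=$(count_args a "" b)

# A backslash-newline between words creates nothing at all.
with_continuation=$(count_args a \
b)

echo "$mid_word $with_empty $with_continuation"
```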


POSIX says you do them in separate steps.


Good to know.


It's always said this. (And bash has always performed steps 3 and 4 in
reverse order, but ...)



Alas, posix says a lot of things, it would be nice if more of them were current
and relevant. I printed it all out and read the whole thing on a series of bus
rides into work when I first sat down to write a new shell for busybox back in
2006. I've had a vague todo to read the new one whenever Issue 8 finally comes
out, but it's been "real soon now" for... how long? (Posix-2008 came out 15
years ago.)


The current edition is from 2018. The next one is in its third draft, then
it has to go through the whole IEEE process, but it may get through
balloting by the end of the year. The standard is 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-08 Thread Rob Landley
On 6/5/23 18:08, Chet Ramey wrote:
>> But escaping a _newline_ is funny in that it glues lines together instead of
>> creating a command line argument out of the result, which means it has to be
>> special cased and obviously I'm special casing it wrong, but the special case
>> has multiple nonobvious features.
> 
> I guess. There are two cases: in double quotes, when the backslash-newline
> is preserved,

$ printf "abc\
> def\n"
abcdef
$ basename "abc\
> def" | hd
00000000  61 62 63 64 65 66 0a  |abcdef.|
00000007

Define "preserved".

> and unquoted, where it's removed. Single quotes obviously
> preserve and aren't worth mentioning.

In arguments double quote removes it:

  $ echo abc\
  > def
  abcdef
  $ echo "abc\
  > def"
  abcdef
  $ echo 'abc\
  > def'
  abc\
  def

In here documents, double quote does NOT remove it:

  $ cat<<EOF
  > ab\
  > c
  > EOF
  abc
  $ cat<<"EOF"
  > ab\
  > c
  > EOF
  ab\
  c
  $ cat<<'EOF'
  > ab\
  > c
  > EOF
  ab\
  c

Confirmed I'm testing bash:

  $ ls -l /proc/$$/exe
  lrwxrwxrwx 1 landley landley 0 Jun  8 18:33 /proc/25568/exe -> /bin/bash
  $ bash --version
  GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

I also tried to ask questions about how long a HERE document lasts, and:

  $ bash -c $'cat<<0;echo hello\nabc\n0'
  abc
  hello
  $ bash -c $'cat<<"";echo X\n\necho Z'
  X
  Z
  $ echo -n 'cat<<EOF' > one
  $ echo -n $'potato\nEOF' > two
  $ bash -c '. one;. two'
  one: line 1: warning: here-document at line 1 delimited by end-of-file (wanted
`EOF')
  two: line 1: potato: command not found
  two: line 2: EOF: command not found

(Trying to get "matching EOF vs newline" test cases in both directions turns out
to be difficult...)

I'm also vaguely curious how one WOULD terminate this one:

  $ bash -c $'cat<<\'\n\''
  bash: line 1: warning: here-document at line 1 delimited by end-of-file 
(wanted `
  ')
  $ bash -c $'cat<<\'\n\'\n\n\n'
  bash: line 3: warning: here-document at line 1 delimited by end-of-file 
(wanted `
  ')



  $

Also, -s doesn't work as advertised in the man page?

   -s  If the -s option is present, or if no arguments remain after
       option processing, then commands are read from the standard
       input. This option allows the positional parameters to be
       set when invoking an interactive shell or when reading input
       through a pipe.

  $ echo echo also | bash -s -c 'echo hello'
  hello
  $ echo echo also | bash -c -s 'echo hello'
  hello
  $ echo echo also | bash -c -s -s -s -s 'echo hello'
  hello

But I may have already asked about that one a while back. (I need to reread my
notes...)

>> I think part of it is that my tokenizer removes whitespace between tokens, 
>> and
>> you're not doing that until later? 
> 
> No, the tokenizer produces a stream of tokens. Unquoted whitespace doesn't
> matter.

Yes, but does a backslash newline count as quoted whitespace? Backslash
ordinarily quotes, and there's "" which is a quoted nothing but creates an
argument. So this is a new category: a quoted nothing that does NOT create an
argument. (I think I'm handling it properly now, but it was a new thing in my
if/else staircase.)

> (You're doing more passes over the data than
>> I am, my code tries to do all the work each pass can do so it's not repeating
>> itself. I had a problem that variable expansion and redirect are the same 
>> pass
>> in my code, and different passes in yours, which leads to me being unable to
>> produce quite the same error messages you do in a couple places...)
> 
> POSIX says you do them in separate steps.

Good to know.

Alas, posix says a lot of things, it would be nice if more of them were current
and relevant. I printed it all out and read the whole thing on a series of bus
rides into work when I first sat down to write a new shell for busybox back in
2006. I've had a vague todo to read the new one whenever Issue 8 finally comes
out, but it's been "real soon now" for... how long? (Posix-2008 came out 15
years ago.) The Linux Standard Base got eaten by the Linux Foundation, which is
the same kind of 501c6 as the Tobacco Institute and Microsoft's "don't copy that
floppy" sock puppet were, so of course it's long dead and the "linux device
list" from http://lanana.org/ is 404 and has been for many years. Michael
Kerrisk retired over the pandemic and handed off to a new guy (Alejandro
Colomar) who doesn't even maintain a web copy of the man pages. I yearn for
meaningful standards that aren't swiss-cheese and what _is_ there is "bypassed
like a christmas tree captain, don't give me too many bumps"...)

Mostly I'm collecting test cases I need to pass. I know where I am with a test
case...

>> In general, line continuation priority isn't always obvious to me until I've
>> determined it experimentally:
> 
> You go off and collect here-document bodies as soon as you get a newline
> token after seeing the operator-delimiter pair.

It 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-05 Thread Chet Ramey

On 6/5/23 1:04 AM, Rob Landley wrote:

On 6/1/23 10:20, Chet Ramey wrote:

On 5/29/23 12:39 PM, Rob Landley wrote:


But I'm still left with this divergence:

$ ./sh -c 'echo abc\'
abc
$ bash -c 'echo abc\'
abc\


The backslash doesn't escape anything, EOF delimits the token and command,
and the backslash remains in place for echo to process (or not).


To me this is all part of line continuation logic. My tokenizer is returning
"needs another line to continue" as part of quote processing, and backslash is
basically a single character quote, which yours is doing too:

   $ echo \  | wc -c
   2
   $ echo | wc -c
   1

But escaping a _newline_ is funny in that it glues lines together instead of
creating a command line argument out of the result, which means it has to be
special cased and obviously I'm special casing it wrong, but the special case
has multiple nonobvious features.


I guess. There are two cases: in double quotes, when the backslash-newline
is preserved, and unquoted, where it's removed. Single quotes obviously
preserve and aren't worth mentioning.



I think part of it is that my tokenizer removes whitespace between tokens, and
you're not doing that until later? 


No, the tokenizer produces a stream of tokens. Unquoted whitespace doesn't
matter.

(You're doing more passes over the data than

I am, my code tries to do all the work each pass can do so it's not repeating
itself. I had a problem that variable expansion and redirect are the same pass
in my code, and different passes in yours, which leads to me being unable to
produce quite the same error messages you do in a couple places...)


POSIX says you do them in separate steps.


In general, line continuation priority isn't always obvious to me until I've
determined it experimentally:


You go off and collect here-document bodies as soon as you get a newline
token after seeing the operator-delimiter pair. We had a pretty good
argument about this on the austin-group list.



I'm trying to have tests for everything, but there are a number of corner 
cases...


Which is annoyingly magic because:

$ bash << 'EOF'
> echo abc\
> EOF
abc


So think about this in two pieces: what the here-document does to generate
the input to the shell, and what the shell does with it.


The way I'd done it is the HERE document doesn't generate input, the funky
redirect _requests_ additional input, which is all basically the line
continuation logic where it can't proceed to the "can we actually run this now"
logic because it hasn't yet got a complete thought. I keep calling
parse_line() with the next line of input until it returns zero, at which point
it can call run_line() on the accumulated data structure it got parsed into.


There are two parts: reading the body of the here-document, and processing
it as part of performing redirections.

Reading the body is simple. You read lines, until you get a line that
consists solely of the here-document delimiter. You do backslash-newline
processing (or not) during this phase. It's a completely lexical operation,
since the entire here-document is a single word, but it's weird because
you have to save the operator and delimiter until you get a newline and can
go off and collect the body.

You still have to expand it (or not) and pass it to the command on standard
input or the designated file descriptor. That's where you have to do the
`generating' part.
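The "reading the body" half can be sketched as a loop that consumes lines until one consists solely of the delimiter. collect_body is a hypothetical helper written for this illustration, not code from bash or toybox, and it deliberately leaves out the backslash-newline processing and the later expansion step.

```shell
# Hypothetical helper: print here-document body lines read from stdin,
# stopping at the first line that is exactly the delimiter ($1).
collect_body() {
  delim=$1
  # "|| [ -n "$line" ]" also handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ "$line" = "$delim" ] && return 0
    printf '%s\n' "$line"
  done
  return 1   # input ended before the delimiter was seen
}

# " EOF" has a leading blank, so it is body text, not the delimiter.
body=$(printf 'abc\n EOF\nEOF\nnot part of body\n' | collect_body EOF)

# Delimiter on the last line without a trailing newline still matches.
short=$(printf 'abc\n0' | collect_body 0)

printf '%s\n' "$body" "---" "$short"
```

The nonzero return corresponds to the situation where bash prints its "delimited by end-of-file" warning.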



So the shell is supplied input on file descriptor 0 that consists of a
single line (which ends with a newline):

echo abc\


That was the intent, yes.


which the shell reads. Since nothing is quoted, the backslash-newline gets
removed, the shell reads EOF and delimits the token and command, and echo
gets "abc" as its argument.


I thought that "there's a newline at the end of the line, which the \ is
escaping" was relevant, but apparently that's only true for -c.


I'm saying that the behavior should be consistent whether the shell is
processing -c command or not. I think we agree on that.

That behavior should be: if there is an unquoted backslash-newline pair,
it should be removed. If there isn't, a trailing backslash before EOF
should be preserved. Different shells have different behaviors, and
different versions of echo have different bugs with backslash processing,
but I think this is correct.
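The two outcomes described here can be compared directly. This is a sketch assuming bash is on PATH; printf with %s stands in for echo so that differences between echo implementations do not affect the result, and the variable names are made up.

```shell
# The -c string ends in a bare backslash: nothing follows it to escape,
# so it stays in the word handed to printf.
no_newline=$(bash -c 'printf "%s\n" abc\')

# Same command with a newline after the backslash: the unquoted
# backslash-newline pair is removed before the word is delimited.
with_newline=$(bash -c 'printf "%s\n" abc\
')

printf '%s\n' "$no_newline" "$with_newline"
```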




And also:

$ echo 'echo abc\' > blah
$ cat blah
echo abc\
$ bash ./blah
abc


Same thing, the file ends with a backslash-newline that gets removed, EOF
delimits the token and command, echo gets "abc" and does the expected
thing.


File input and stdin were behaving the same, but -c wasn't. Hence me going "is
it the newline?" later on...


So... do I special case -c here or what?


What's the special case? EOF (or EOS, really) always delimits tokens when
you're using -c command. Just the same as if you had a file that didn't
end with a newline.


Except when I have a file that doesn't end with a newline, a 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-04 Thread Rob Landley
On 6/1/23 10:20, Chet Ramey wrote:
> On 5/29/23 12:39 PM, Rob Landley wrote:
> 
>> But I'm still left with this divergence:
>> 
>>$ ./sh -c 'echo abc\'
>>abc
>>$ bash -c 'echo abc\'
>>abc\
> 
> The backslash doesn't escape anything, EOF delimits the token and command,
> and the backslash remains in place for echo to process (or not).

To me this is all part of line continuation logic. My tokenizer is returning
"needs another line to continue" as part of quote processing, and backslash is
basically a single character quote, which yours is doing too:

  $ echo \  | wc -c
  2
  $ echo | wc -c
  1

But escaping a _newline_ is funny in that it glues lines together instead of
creating a command line argument out of the result, which means it has to be
special cased and obviously I'm special casing it wrong, but the special case
has multiple nonobvious features.
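
The difference between escaping a space and escaping a newline can be sketched as follows. This assumes any POSIX-style shell; count_args is a hypothetical helper, not an existing command.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

count_args a b      # two words, prints 2
count_args a\ b     # backslash quotes the space: one word "a b", prints 1

# Backslash-newline is different: it is deleted outright, so the two
# lines below are parsed as the single command "count_args ab".
count_args a\
b
```

The escaped space ends up inside an argument, while the escaped newline leaves nothing behind at all.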

I think part of it is that my tokenizer removes whitespace between tokens, and
you're not doing that until later? (You're doing more passes over the data than
I am, my code tries to do all the work each pass can do so it's not repeating
itself. I had a problem that variable expansion and redirect are the same pass
in my code, and different passes in yours, which leads to me being unable to
produce quite the same error messages you do in a couple places...)

In general, line continuation priority isn't always obvious to me until I've
determined it experimentally:

  $ cat << EOF; if true
  > hello
  > EOF
  > then echo also; fi
  hello
  also
  $ if cat << EOF
  > hello
  > EOF
  > then echo true; fi
  hello
  true
  $ if true; then cat << EOF
  > hello
  > EOF
  > echo next
  > fi
  hello
  next

I'm trying to have tests for everything, but there are a number of corner
cases...

>> Which is annoyingly magic because:
>> 
>>$ bash << 'EOF'
>>> echo abc\
>>> EOF
>>abc
> 
> So think about this in two pieces: what the here-document does to generate
> the input to the shell, and what the shell does with it.

The way I'd done it is the HERE document doesn't generate input, the funky
redirect _requests_ additional input, which is all basically the line
continuation logic where it can't proceed to the "can we actually run this now"
logic because it hasn't yet got a complete thought. I keep calling
parse_line() with the next line of input until it returns zero, at which point
it can call run_line() on the accumulated data structure it got parsed into.

> Since the here-document delimiter is quoted, the `outer' shell doesn't do
> anything special with the backslash-newline. If it were not quoted, the
> backslash-newline would be removed, and the EOF would not delimit the
> here-document.

Indeed. I need to make sure I have a test for that in tests/sh.test...
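
One way such a check could look, as a sketch only: the function names are hypothetical and this is not taken from tests/sh.test. It assumes a POSIX-style shell and uses cat so the here-document body can be inspected directly.

```shell
# Quoted delimiter: the body is taken literally, so the backslash and the
# newline after it both survive and cat prints two lines.
quoted_heredoc() {
cat << 'EOF'
abc\
def
EOF
}

# Unquoted delimiter: backslash-newline is removed while the body is read,
# so cat prints the single line "abcdef".
unquoted_heredoc() {
cat << EOF
abc\
def
EOF
}

quoted_heredoc
unquoted_heredoc
```

With an unquoted delimiter, a backslash at the end of the last body line would likewise join that line to the delimiter line, which is why the delimiter then no longer ends the here-document.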

> So the shell is supplied input on file descriptor 0 that consists of a
> single line (which ends with a newline):
> 
> echo abc\

That was the intent, yes.

> which the shell reads. Since nothing is quoted, the backslash-newline gets
> removed, the shell reads EOF and delimits the token and command, and echo
> gets "abc" as its argument.

I thought that "there's a newline at the end of the line, which the \ is
escaping" was relevant, but apparently that's only true for -c.

>> And also:
>> 
>>$ echo 'echo abc\' > blah
>>$ cat blah
>>echo abc\
>>$ bash ./blah
>>abc
> 
> Same thing, the file ends with a backslash-newline that gets removed, EOF
> delimits the token and command, echo gets "abc" and does the expected
> thing.

File input and stdin were behaving the same, but -c wasn't. Hence me going "is
it the newline?" later on...

>> So... do I special case -c here or what?
> 
> What's the special case? EOF (or EOS, really) always delimits tokens when
> you're using -c command. Just the same as if you had a file that didn't
> end with a newline.

Except when I have a file that doesn't end with a newline, a trailing \ on the
last line is removed. That was one of the later tests.

>> 
>> Aha!
>> 
>>$ bash -c $'echo abc\\'
>>abc\
> 
> There's no difference between this and 'echo abc\'.

Indeed, but it's phrased that way for comparison with the next call. This one
has no newline at the end of the -c input, but is otherwise identical. (Given
how the shell gratuitously strips trailing newlines from "$BLAH" and such, $''
is almost unique in NOT having them stripped...)
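
As a small illustration of the newline stripping mentioned above: command substitution is what drops trailing newlines, while a quoted expansion of a variable keeps whatever the variable holds. This is a sketch for a POSIX-style shell, and the variable names are illustrative.

```shell
# A variable holding one literal newline.
nl='
'

# Command substitution drops every trailing newline from the captured output.
captured=$(printf 'abc\n\n')
echo "${#captured}"      # prints 3

# A newline stored in a variable survives a plain quoted expansion.
literal="abc${nl}"
echo "${#literal}"       # prints 4
```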

Anyway, I'd previously thought -c input wasn't special, in that you can feed
multiple lines into -c and they get parsed as multiple lines:

  $ bash -c $'echo one\necho two'
  one
  two
  $ bash -c $'cat << EOF\nhello\nEOF'
  hello

Which is why in my implementation I'm just feeding them all into
int do_source(char *name, FILE *ff) with calls to fdopen() or fmemopen() when I
want to feed it various types of input.

>>$ bash -c $'echo abc\\\n'
>>abc
> 
> The backslash-newline gets removed. That always happens, regardless of
> where the input is coming from.

Yup, which is what 

Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-06-01 Thread Chet Ramey

On 5/29/23 12:39 PM, Rob Landley wrote:


But I'm still left with this divergence:

   $ ./sh -c 'echo abc\'
   abc
   $ bash -c 'echo abc\'
   abc\


The backslash doesn't escape anything, EOF delimits the token and command,
and the backslash remains in place for echo to process (or not).


Which is annoyingly magic because:

   $ bash << 'EOF'
   > echo abc\
   > EOF
   abc


So think about this in two pieces: what the here-document does to generate
the input to the shell, and what the shell does with it.

Since the here-document delimiter is quoted, the `outer' shell doesn't do
anything special with the backslash-newline. If it were not quoted, the
backslash-newline would be removed, and the EOF would not delimit the
here-document.

So the shell is supplied input on file descriptor 0 that consists of a
single line (which ends with a newline):

echo abc\

which the shell reads. Since nothing is quoted, the backslash-newline gets
removed, the shell reads EOF and delimits the token and command, and echo
gets "abc" as its argument.



And also:

   $ echo 'echo abc\' > blah
   $ cat blah
   echo abc\
   $ bash ./blah
   abc


Same thing, the file ends with a backslash-newline that gets removed, EOF
delimits the token and command, echo gets "abc" and does the expected
thing.



So... do I special case -c here or what?


What's the special case? EOF (or EOS, really) always delimits tokens when
you're using -c command. Just the same as if you had a file that didn't
end with a newline.



Aha!

   $ bash -c $'echo abc\\'
   abc\


There's no difference between this and 'echo abc\'.


   $ bash -c $'echo abc\\\n'
   abc


The backslash-newline gets removed. That always happens, regardless of
where the input is coming from.



So...

   $ echo -n 'echo abc\' | bash
   abc
   $ echo -n 'echo abc\' > blah
   $ bash ./blah
   abc


This looks inconsistent at first glance, I'll take a look.



Nope, that's not it either, -c is still magic even when the file input hasn't
got a newline.


What is `magic' about it?

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Toybox] [PATCH] sh: pass "\" to the later app

2023-05-29 Thread Rob Landley
On 5/29/23 02:19, Mingliang HU 胡明亮 wrote:
> Here is a failed case for “echo -e "a\n\b\nc\td"”.
> 
> The expected result is:
> 
> a
> 
> c   d

You're missing a "b" there, but I get the general idea.

> The current result is:
> 
> anbnctd

Yeah, that's wrong.

> The patch lets sh pass “\” to “echo”. Then it works fine.

I did a slightly more intrusive version than your patch (changing some of the
code around it, commit 23fc1ecab1b4), and now your test works:

  $ ./sh -c 'echo -e "a\nb\nc\td"'
  a
  b
  c d

And also:

  $ echo ./sh -c "echo 'abc\'"
  ./sh -c echo 'abc\'
  $ ./sh -c "echo 'abc\'"
  abc\
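
The quoting rule behind both results can be sketched without involving echo at all. This assumes a POSIX-style shell; show_args is a hypothetical helper that prints each argument exactly as the command receives it.

```shell
# Hypothetical helper: print every argument in brackets, with no escape
# processing applied to the argument text itself.
show_args() { for arg in "$@"; do printf '[%s]\n' "$arg"; done; }

show_args "a\nb\tc"     # prints [a\nb\tc]: the backslashes are untouched
show_args 'abc\'        # prints [abc\]
show_args "a\\b"        # prints [a\b]: \\ collapses to one backslash
```

Inside double quotes a backslash is only special before $, `, ", \ or a newline, so sequences such as \n and \t reach the command unchanged and it is up to echo -e to interpret them.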

But I'm still left with this divergence:

  $ ./sh -c 'echo abc\'
  abc
  $ bash -c 'echo abc\'
  abc\

Which is annoyingly magic because:

  $ bash << 'EOF'
  > echo abc\
  > EOF
  abc

And also:

  $ echo 'echo abc\' > blah
  $ cat blah
  echo abc\
  $ bash ./blah
  abc

So... do I special case -c here or what?

Aha!

  $ bash -c $'echo abc\\'
  abc\
  $ bash -c $'echo abc\\\n'
  abc

So...

  $ echo -n 'echo abc\' | bash
  abc
  $ echo -n 'echo abc\' > blah
  $ bash ./blah
  abc

Nope, that's not it either, -c is still magic even when the file input hasn't
got a newline.

Chet! What's up with this? I do not understand...

> Mingliang

Thanks,

Rob

P.S. yes there's a trailing space on the "abc " in toybox's last output there,
which is because my "append a line, oops there was no line" logic in the call to
parse_line() in do_source() (currently line 4084) delivers a " " instead of ""
to flush stuff, because completely blank lines get discarded too early in
parse_line(). I need to go through and fix that, here's a test for what else
THAT breaks:

  $ echo $'echo abc\\\n\necho def\n' | bash
  abc
  def
  $ echo $'echo abc\\\n\necho def\n' | ./sh
  abcecho def

It's on the todo list...
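
A self-checking form of that last test, as a sketch: the function name is hypothetical, the shell to test is passed as the first argument, and printf is used to build the input so that no $'...' quoting is required.

```shell
# Hypothetical check: pipe a script into the given shell on stdin. The script
# has a backslash-newline followed by a blank line, then a second command.
check_blank_after_continuation() {
  out=$(printf 'echo abc\\\n\necho def\n' | "${1:-sh}")
  # Expect two separate lines of output, "abc" then "def".
  [ "$out" = "abc
def" ]
}

if check_blank_after_continuation sh; then echo PASS; else echo FAIL; fi
```

Calling it as check_blank_after_continuation ./sh would run the same input through a locally built shell.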