On 6/17/23 7:23 PM, Rob Landley wrote:
On 6/12/23 19:40, Chet Ramey wrote:
and they have a list of "special built-in utilities" that does NOT include cd
(that's listed in normal utilities: how would one go about implementing that
outside of the shell, do you think?)

That's not what a special builtin means. alias, fg/bg/jobs, getopts, read,
and wait are all regular builtins, and they can't be implemented outside
the shell either.

Special builtins are defined that way because of their effect:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14

It's really a useless concept, by the way.

It's not that simple: kill has to be built-in or it can't interface with job
control...

That's not what a special builtin is. `kill' is a `regular builtin' anyway.


Wait, assignments before these magic utilities are NOT prefix assignments
limited to the duration of the command?

How many times.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14

   $ abc=123 true
   $ echo $abc
   $ abc=123 :
   $ echo $abc
   $ abc=123 eval 'echo $abc'
   123
   $ echo $abc
   $

Nope, even bash doesn't do that.

You should have tried it in posix mode. I said it was a useless concept,
there's no way bash is going to do that in default mode.
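
The difference between the two modes can be checked directly. This is a minimal sketch, assuming a `bash` binary is on PATH; the names posix_special, default_special, and posix_regular are only illustrative:

```shell
# ':' is a POSIX special builtin; 'true' is a regular builtin.
posix_special=$(bash --posix -c 'unset abc; abc=123 :; echo "${abc-unset}"')
default_special=$(bash -c 'unset abc; abc=123 :; echo "${abc-unset}"')
posix_regular=$(bash --posix -c 'unset abc; abc=123 true; echo "${abc-unset}"')

echo "posix mode,   abc=123 :    -> $posix_special"
echo "default mode, abc=123 :    -> $default_special"
echo "posix mode,   abc=123 true -> $posix_regular"
```

Only the first case should report 123: the assignment persists after a special builtin in posix mode, and nowhere else.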

(A prefix assignment... on continue? I can't
even do a prefix assignment on "if", and I have _use_cases_ for that. I had that
implemented and then backed it out again because it's an error in bash.

`if' is not a builtin.

I
remember I did make "continue&" work, but don't remember why...)

Why would that not work? It's just a no-op; no semantic meaning.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_03

And I need parsing that eats \$ and \\n but leaves \x alone, great. (Which is a
new status flag for expand_arg(), can't be handled in a preparsing pass nor is
NO_QUOTE gonna do it right...)

More characters than those two.


Why is only the second of these an error in bash?

   $ unset ABC; echo ${ABC::'1+2'}
   $ ABC=abcdef; echo ${ABC::'1+2'}
   bash: ABC: '1+2': syntax error: operand expected (error token is "'1+2'")

Because if there's nothing to operate on, bash doesn't try to process the
rest of the word expansion (and if your first command is real, echo will
output a single newline).

This is consistent with POSIX:

"If word is not needed, it shall not be expanded."

even though the substring word expansion isn't POSIX.
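
The rule is easiest to observe with a word that has a side effect. This is a sketch using only POSIX constructs; the counter n and the variable names are made up for the illustration:

```shell
n=0
set_var=value
unset unset_var

# set_var is set, so the default word is not needed and $((n += 1)) never runs.
: "${set_var:-$((n += 1))}"
n_after_set=$n
echo "after set_var:   n=$n_after_set"

# unset_var is unset, so the word is expanded and n is incremented.
: "${unset_var:-$((n += 1))}"
echo "after unset_var: n=$n"
```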

I think when the EOF is quoted the HERE body has no processing, and when it's
not quoted then $VARS \$ and \<newline> are the only special... Nope, \\ is too.

Yes, since the body is treated like it's in double quotes, and, as quoted
earlier, \ is one of the characters for which backslash retains its
behavior as a special character. The double quote is the only exception;
look at what these do:

cat <<EOF
echo "
EOF

cat <<EOF
echo \"
EOF

   https://github.com/landley/toybox/commit/32b3587af261

Ugh.


When you create a new local variable it does so in the most recent named
function context (or the root context if it reaches it), skipping unnamed
function contexts. When you resolve or modify an existing variable (or unset it,
which creates a whiteout entry) it iterates back through all existing function
contexts to find a matching entry (then puts one in the root context if you were
assigning without declaring it local).

So "local blah" won't bind to an anonymous function context, and errors out if
it reaches the root context. I _think_ it works...

OK.

The real question is what value LINENO should have when using -c command,
even though it's only defined for a script or function.

We've gone over that one before. You decided you were going to initialize it to
1 instead of 0,

Yes.

still matching the behavior in my devuan install. (Still devuan bronchitis,
haven't updated to devuan cholera yet. Um, the web page says devuan B matches
debian "buster" and devuan C matches "bullseye", if that helps.)

Not at all. But charming version names.

I was naive enough to write the variable resolution logic with the design
assumption that unbalanced quoting contexts had already been caught before the
data was passed to us. Kinda biting me now, although I think I'm most of the way
through it.

It was a pain to get that stuff right.

It doesn't handle nested logical contexts, and "case" logic has unmatched
ending parentheses that can end the $() span prematurely...)

Ha. I had ad-hoc parsing that parsed $(...) for years, and it got more and
more complex. I finally gave up on it for bash-5.2 and call the parser
recursively to find the closing right paren (and replaced hundreds of lines
of code with dozens). That's really the only way to do it correctly, but I
was stuck with some compatibility issues because of how bash had not done
it correctly in the past.
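
A small test case for the problem, as a sketch: it assumes a shell that parses the substitution body with its real grammar (a scanner that simply counts parentheses would stop at the first right paren):

```shell
# The ')' after the pattern 'x' belongs to the case clause, not to $( ... ).
out=$(case x in x) echo matched ;; esac)
echo "$out"
```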


Mostly I'm reading the bash man page, pondering many years of
writing and editing bash scripts, and doing LOTS of tests...

And pointing out places where the man page isn't clear or doesn't describe
the shell's behavior, which I appreciate.

Happy to help. At the same time, trying not to spam you too badly...

It hasn't been a problem so far.


The current edition is from 2018.

Except they said 2008 was the last feature release and everything since is
bugfix-only, and nothing is supposed to introduce, deprecate, or significantly
change anything's semantics.

When, by the way?


That's why it's still "Issue 7". The new stuff is
all queued up for Issue 8, which has been coming soon now since the early Obama
administration.

Oh, I was there.

I was lurking on the posix list since... 2006 I think?

So you know that test now has `<' and `>' binary string operators that use
the current locale, right? That's an example of what I'm talking about.

The project isn't dead, but those are defined as bugfix releases. Adding new
libc functions or command line options, or "can you please put cpio and tar back
in the command list", are out of scope for them.

So wait for issue 8, I guess? It's going to start balloting this year.


Ken or Dennis having a reason means a
lot to me because those guys were really smart. The Programmers Workbench guys,
not so much. "Bill Joy decided" is a coin flip at best...

They all had different, even competing, requirements and goals. Mashey and
the PWB guys were developing for a completely different user base than the
original room 127 group, and Joy and the BSD guys had different hardware
*and* users, and then the ARPA community for 4.2 BSD.

Maybe things would be slightly different if Reiser's VM system (the one Rob
Pike raves about) had been in 32/V and then eventually made it back to
Research in time for 8th edition, but that's not the way it worked out.

Working on it. (Well in busybox somebody else had already written an awk, I just
sent them bug reports and made puppy eyes. This time I have to learn how to use
"awk". And I have to write a "make". And a shell, which is in progress... :)

Seems daunting.

I wish you were not so reluctant. Look at how many things you've discovered
that I decided were bugs based on our discussions.

But I'm taking up your valuable time.

I get to make that decision, don't I? I'm not shy -- I'll tell you if you
send something dumb. Don't gatekeep yourself.

But since you asked, today's new question I wrestled with was "what's the error
logic for slice fields"?

Let's assume `one' is unset.


   $ echo ${one:!}
   bash: !}: event not found

History expansion, nothing to do with the question.

   $ echo ${one:+}

This isn't what you think it is: `:+' is a completely different word
expansion, with different behavior. Since `one' is unset, this expands to
the null string. Even if it were set, the expansion would be null since
nothing follows the `+'.

   $ echo ${one:+:}

See above; bash doesn't do work it doesn't have to.

   $ echo ${one:]} two
   two

Again.

   $ echo ${one:0/0}

And again.

   $ echo ${PATH::1+2}
   /ho

OK, you have a set variable, no mystery here. `offset' and `length' are
arithmetic expressions; a null arithmetic expression evaluates to 0, as
with

echo $(( ))
or
echo $(( $unsetvar ))

and described in ARITHMETIC EVALUATION. So you have three characters
starting at offset 0, the beginning of the string.

   $ echo ${PATH::0/0}
   bash: PATH: 0/0: division by 0 (error token is "0")

When it has to perform the arithmetic evaluation it will, and evaluation
errors get reported as expansion errors.
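
The cases above can be put side by side. This is a sketch assuming `bash` is on PATH; the variables s, one, lazy, and status are illustrative:

```shell
# A null arithmetic expression is 0, 1+2 is evaluated as the length, and
# 0/0 is never evaluated because 'one' is unset.
lazy=$(bash -c 's=abcdef; unset one; echo "empty=$(( )) slice=${s::1+2} unset=[${one:0/0}]"')
echo "$lazy"

# With a set variable the length has to be evaluated, so the division by
# zero is reported on stderr and the command fails.
bash -c 's=abcdef; echo "${s::0/0}"' 2>/dev/null
status=$?
echo "status=$status"
```

The exact nonzero status is not the point; what matters is that the error only shows up when there is a value to slice.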


It's doing math, but only _sometimes_ even reporting division by zero as an 
error?

See above.

Single quotes: preserved. Double quotes: removed when special. For
instance, the double quotes around a command substitution don't make the
characters in the command substitution quoted.

Quotes around $() retain whitespace that would otherwise get IFS'd.

Correct, but that's behavior that affects how the output of the command
substitution is treated, not how the substitution itself is parsed or
executed.

They're the same thing for me: my parsing produces a result.

All parsing produces a result: either a valid command tree, in whatever
data structure you want to use to represent it, or an error. But surely
you make a distinction between the $(...) expansion and what it expands to.

The question is whether $VAR is quoted in

echo "$( for f in $VAR; do echo $f; done )"

If you treat this like $( for f in "$VAR"; do echo $f; done ), you're going
to have problems.
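
The two readings give visibly different results. This is a sketch assuming the default IFS; the names split and unsplit are illustrative:

```shell
VAR="a b c"

# $VAR is unquoted inside the substitution, so it is split into three words
# even though the whole $( ... ) sits inside double quotes.
split="$( for f in $VAR; do echo "[$f]"; done )"

# Quoting VAR inside the substitution gives a single loop iteration.
unsplit="$( for f in "$VAR"; do echo "[$f]"; done )"

printf '%s\n' "$split" "---" "$unsplit"
```

The outer quotes only keep the newlines in the output from being split; they do not change how the loop inside sees $VAR.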


$ echo ${PATH//":"/xxx}
/home/landley/binxxx/usr/local/binxxx/usr/binxxx/binxxx/usr/local/gamesxxx/usr/games
$ echo "${PATH//':'/xxx}"
/home/landley/binxxx/usr/local/binxxx/usr/binxxx/binxxx/usr/local/gamesxxx/usr/games
$ echo "${PATH//"/"/xxx}"
xxxhomexxxlandleyxxxbin:xxxusrxxxlocalxxxbin:xxxusrxxxbin:xxxbin:xxxusrxxxlocalxxxgames:xxxusrxxxgames

Quoting contexts nest...

Well, the non-POSIX expansions get to do what they want, but yes, the inner
quotes are allowed and can quote special pattern characters.
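
The effect on a pattern character is easier to see with `*` than with `:`. This is a sketch assuming `bash` is on PATH; the string and variable names are illustrative:

```shell
# Inner quotes make '*' a literal character in the pattern.
quoted=$(bash -c 's="a*b*c"; echo "${s//"*"/-}"')

# Without them, '*' is a glob that matches the whole string.
unquoted=$(bash -c 's="a*b*c"; echo "${s//*/-}"')

echo "quoted pattern:   $quoted"
echo "unquoted pattern: $unquoted"
```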


(And "$@" is kind of array variable-ish already...)

Kind of, but it's not sparse. Support for very large sparse arrays is one
thing that informs your implementation.

Oh goddess. (Adds note to sh.tests, which is my text file of cut and paste
snippets to look at later. Yes, my todo lists nest.) Is sparse array a new type
or are all arrays sparse?

All indexed arrays are sparse (the question is meaningless for associative
arrays). Indices that are set are set; indices that are not are unset.

declare -a intarray
intarray[12]=twelve

doesn't automatically set intarray[0..11] to anything.
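
A quick way to confirm this, as a sketch assuming `bash` is on PATH:

```shell
sparse=$(bash -c '
declare -a intarray
intarray[12]=twelve
# One element, one index, and index 0 is still unset.
echo "count=${#intarray[@]} indices=${!intarray[@]} zero=${intarray[0]-unset}"
')
echo "$sparse"
```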


The variable types I've currently got are:

// Assign one variable from malloced key=val string, returns var struct
// TODO implement remaining types
#define VAR_NOFREE    (1<<10)
#define VAR_WHITEOUT  (1<<9)
#define VAR_DICT      (1<<8)
#define VAR_ARRAY     (1<<7)
#define VAR_INT       (1<<6)
#define VAR_TOLOWER   (1<<5)
#define VAR_TOUPPER   (1<<4)
#define VAR_NAMEREF   (1<<3)
#define VAR_EXPORT    (1<<2)
#define VAR_READONLY  (1<<1)
#define VAR_MAGIC     (1<<0)

WHITEOUT is when you unset a local variable so the
enclosing scope may have an unchanged definition but variable resolution needs
to stop there and get the ${x:=} vs ${x=} part right),

You don't need that one, really. You can use the same value and logic you
do when you have something like

declare -i foo
or
export foo

(unless you use WHITEOUT for this case as well).

`foo' exists as an unset variable, but when you assign a value to foo it
gets exported since the attribute was already there. You just have to be
really disciplined about how you treat this `exists but unset' state.
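
The `exists but unset' state can be observed from a script. This is a sketch using only POSIX constructs; the names before and child are illustrative:

```shell
unset foo
export foo                  # attribute set, but foo still has no value
before=${foo-unset}

foo=bar                     # plain assignment, no 'export' keyword
child=$(sh -c 'echo "${foo-unset}"')

echo "before=$before child=$child"
```

The child shell sees the value because the attribute was attached while the variable was still unset.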


Anyway, that leaves VAR_ARRAY, and VAR_DICT (for associative arrays). I take it
a sparse array is NOT a dict? (Are all VAR_ARRAY sparse...?)

The implementation doesn't matter. You have indexed arrays, where the
subscript is an arithmetic expression, and associative arrays, where the
subscript is an arbitrary string. You can make them all hash tables, if
you want, or linked lists, or whatever. You can even make them C arrays,
but that will really kill your associative array lookup time.

Asking whether an associative array is sparse doesn't make much sense;
what would the definition of `sparseness' be? For indexed arrays, where the
integer subscript imposes a bounded ordering, it makes sense.


Glancing at my notes for any obvious array todo bits, it's just things like "WHY
does unsetting elements of BASH_ALIASES not remove the corresponding alias, does
this require two representations of the same information?

There's no good reason, I just haven't ever made that work.


Spite: it keeps you going.)

Misanthropy works.


I remember being deeply confused by ${X@Q} when I was first trying to implement
it, but it seems to have switched to a much cleaner $'' syntax since?

The @Q transformation has preferred $'...' since I introduced the
parameter transformations in bash-4.4. I'm not sure when you were looking
at it?
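
For reference, a sketch of what the transformation produces, assuming a `bash` on PATH that is at least version 4.4; the variable names are illustrative:

```shell
# A value without control characters comes back in plain single quotes.
plain=$(bash -c 'x="a b"; echo "${x@Q}"')

# A value containing a newline comes back in the $'...' form.
control=$(bash -c 'x=$(printf "a\nb"); echo "${x@Q}"')

echo "plain:   $plain"
echo "control: $control"
```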

I stuck with the last GPLv2 release for longer than Apple did:

   https://news.ycombinator.com/item?id=18852887

But that version doesn't have parameter transformations, so that part is
moot.

They're not options, per se, according to POSIX. It handles -n as an
initial operand that results in implementation-defined behavior. The next
edition extends that treatment to -e/-E.

An "initial operand", not an argument.

That's the same thing. There are no options to POSIX echo. Everything is
an operand. If a command has options, POSIX specifies them as options, and
it doesn't do that for echo.

Hence the side-eye. In general use, echo has arguments. But posix insists it
does not have arguments. To save face, they've created an "argument that
isn't an argument", and they want us to pretend that's not what they did.

Because the historical echo implementations were all incompatible -- and
worse, irreconcilable. The POSIX folks did the least worst thing. They all
exist just to make the behavior implementation-defined anyway.


"All options must come before non-option arguments" is a common use pattern,
echo isn't special in this regard. "Unrecognized options are passed through" is
another common pattern.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_02

The latter is possible, but not encouraged.

Heck, you want funky: "kill -stop" vs "kill -s top". It's passing through
unrecognized arguments to a later processing pass, and retroactively declaring
-s as unrecognized because -t isn't a thing.

Those are called out as special cases in the description of `kill' and
dependent on the system supporting the `XSI' option. I agree it's a special
case.

Right. So they're going from "wrong" to "wrong" then:

    $ echo -n -e 'hey\nthere'
    hey
    there$

Yeah, echo is a lost cause. Too many incompatible implementations, too much
existing code. That's why everything non-trivial (like the above) is
implementation-defined. POSIX recommends that everyone use printf.

   $ printf abc\n
   abcn$

Oh yeah, that'll happen.

What did you think would happen to the unquoted backslash?
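
The two spellings differ only in who gets to see the backslash. This is a sketch using POSIX printf; the trailing `echo .` is there so that command substitution does not strip the newline being tested:

```shell
# Unquoted: the shell removes the backslash, so printf receives "abcn".
unquoted=$(printf abc\n; echo .)

# Quoted: printf receives backslash-n and turns it into a newline.
quoted=$(printf 'abc\n'; echo .)

printf '%s\n' "unquoted: $unquoted" "quoted: $quoted"
```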


Maybe posix should eventually break down and admit this is a thing? "ls . -l"
has to work, but "ssh user@server -t ls -l" really really REALLY needs that
second -l going to ls not ssh.

Why do you think they don't acknowledge this today?

   https://landley.net/notes-2016.html#11-03-2016

I don't understand how the two connect? Jorg was truly abrasive, and
didn't endear himself to many people, but I don't see the connection to
argument ordering here.


(Yes, I'm aware of recent changes. That's why I re-engaged with Posix, felt I
owed it to them since the condition under which I said I'd come back
unexpectedly happened. But having already written them off, my heart really
wasn't in it. I _should_, but I'm juggling too many other balls...)

Options only exist as
such if they come before the first non-option argument.

   $ cat <(echo hello) -E
   hello$

Yeah, looks like a bug in cat to me:

$ cat <(echo hello) -E
hello
cat: -E: No such file or directory

The GNU utilities do all sorts of argument reordering, but that doesn't
mean you're going to get that in POSIX.



Options have to
begin with `-'.

   tar tvzf blah.tgz
   ps ax
   ar t /usr/lib/libsupp.a

POSIX doesn't have `tar'.


You can chain "ssh xargs strace nice unshare setsid timeout prlimit chroot"
arbitrarily deep, and each command has its own arguments and then a command line
it execs, which can itself contain arguments. That's usually WHY a command cares
about argument position.

That's not inconsistent with the requirement that ssh options appear before
other arguments.


If you really want to go
hardcore, require that the application (user) supply a `--' before the
remote command and its arguments if you want to use it in this way.

But what's already there works, and has for decades.

A good standards body should document, not legislate.

Where do you think the utility syntax guidelines came from?



And then I submitted a feature request to coreutils:

   https://lists.gnu.org/archive/html/coreutils/2022-01/msg00004.html

Which resulted in a lot of discussion, and an eventual decision to include it,
and some patches were discussed:

   https://lists.gnu.org/archive/html/coreutils/2022-01/msg00048.html

And then when it wasn't in the next release, they said it was "still in
development":

   https://lists.gnu.org/archive/html/coreutils/2022-04/msg00010.html

And then a year later it was still on their todo list:

   https://lists.gnu.org/archive/html/coreutils/2023-02/msg00012.html

This sort of thing consumes my "engaging with bureaucracy" meter.

You can't force volunteers to do anything. They're volunteers! It's not
bureaucracy, they just don't work for you!


http://www.opengroup.org/testing/downloads.html says there's a no-fee license.
Maybe closer to the 1.0 release I'll jump through the hoops to help me document
my deviations?

Think carefully about doing that. It takes a lot of time, and I only did
the shell and builtins tests.


This is completely unspecified behavior.

The standard is not complete, yes.

A different interpretation. There's plenty of unspecified and
implementation-defined behavior.

Bash is an implementation, defining behavior. There may be version skew, but it
does something specific. I just have to think of what questions to ask.

That's not the same thing. More useful for your purposes, maybe, but still
different.

A thing I have done from time to time, but... an expensive thing, as far as
spoons go:

   https://en.wikipedia.org/wiki/Spoon_theory

Yes, everyone has limited resources.




Currently. Posix didn't always exist, the Linux Standard Base was making good
progress until the Linux Foundation's accretion disk swallowed it, man7.org was
decent until Michael Kerrisk retired and handed off to a guy who doesn't
maintain a current web version...

If all you're interested in is Linux, then sure.


For years the de-facto spreadsheet standard was Microsoft Excel and the word
processing file format standard was Microsoft Word. They SUCKED, but had vastly
dominant market share. And every weird corner case of their behavior was part
of that standard.

Then Star Division cloned compatible versions that could read and write those
files in Star Office,

Yes, I used Star Office when I ran FreeBSD on my desktop for a while.


The point is, once you have two independent implementations, the subset both
support becomes a lot more standard-shaped. This was the IETF way back in the
day, "rough consensus and running code". The bake-offs were to get multiple
interoperable implementations. You NEED two, or this doesn't work. :)

Sure. But when you get beyond two, that intersecting subset becomes a lot
smaller, and the number of parties with skin in the game gets a lot larger.
That's why you have so much implementation-defined behavior in the
standard. If you want to walk the road you did, and say "this
implementation is the standard one for me," then that's fine, but you're
not going to be successful getting other implementations to walk that same
road a lot of the time.



What are you using now?

$ bash --version
GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

Jesus, your distro can't even be bothered to apply all the patches for a
single version?

$ ../bash-5.0-patched/bash --version
GNU bash, version 5.0.18(10)-release (x86_64-apple-darwin18.2.0)

This is what makes getting bug reports difficult.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
