Re: Request: Standard hashmaps in sh

2023-12-27 Thread David A. Wheeler via austin-group-l at The Open Group



> On Dec 27, 2023, at 2:03 PM, Chet Ramey via austin-group-l at The Open Group 
>  wrote:
> 
> On 12/27/23 11:26 AM, Andrew Pennebaker via austin-group-l at The Open Group 
> wrote:
>> Many programs depend on hashmaps in order to work.
>> awk is not an answer.
>> The lack of hashmaps forces people to use less efficient algorithms, such as 
>> linear search.
>> The bash family implements it. Simply acknowledging bash associative array 
>> syntax would instantly improve the scalability of sh scripts.
> 
> That's not the intent of the standard. The standard is supposed to give
> users an idea about what they can rely on for portable scripts (and, to a
> lesser extent, interactive use). While bash and ksh93 implement associative
> arrays, that's not enough for a standard.

zsh also has associative arrays, so that's at least 3 shells with them.
The dash developers typically don't add functionality until it's been added
to the standard first, so it's pointless to wait until dash adds it. That's
the wrong order.

I'm supportive (in principle) of adding hashmaps / associative arrays to shell.
Rationale:
* Implementation effort in shells is relatively small.
* It's a useful general-purpose capability for many cases.
* Many people write simple shell scripts; switching languages just to do this
  makes little sense.
* It's already implemented in at least 3 widely-used shells (see the sketch
  below). I don't know how different the bash / ksh93 / zsh implementations
  are in this regard; someone would have to do that research.
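
For illustration, here is a minimal sketch of the syntax involved. This is
bash syntax; ksh93 and zsh use "typeset -A" instead of "declare -A", and zsh
has its own key-iteration idiom, which is part of the research mentioned above:

    declare -A ages              # create an associative array
    ages[alice]=30               # assign by string key
    ages["bob smith"]=25         # keys may contain spaces
    echo "${ages[alice]}"        # look up one value
    for k in "${!ages[@]}"; do   # iterate over keys (bash syntax)
        printf '%s is %s\n' "$k" "${ages[$k]}"
    done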

However, someone has to do the work of analyzing the existing implementations,
creating the proposal, etc. That's no guarantee of acceptance, of course :-).

--- David A. Wheeler




Re: [1003.1(2016/18)/Issue7+TC2 0001228]: allow shells to exclude "." and ".." from pathname expansions

2023-08-10 Thread David A. Wheeler via austin-group-l at The Open Group


> On Aug 10, 2023, at 11:11 AM, Austin Group Bug Tracker via austin-group-l at 
> The Open Group  wrote:
> 
> User Reference:
> https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html 
> Section: Pathname Expansion 
> Page Number: 2384 
> Line Number: 76271 
> Interp Status: --- 
> Final Accepted Text: https://austingroupbugs.net/view.php?id=1228#c5889

I'm all for this change; in fact, requiring systems to *not* include . and ..
as glob results (someday) would be great.

However, could this be misinterpreted as allowing a tool to respond to
"./*.pdf" by *removing* the "./" when returning a match? That would be a
disaster, because prepending "./" is a widely-recommended way to avoid
security problems from globs (e.g., to counter filenames that start with
"-"). For example, I would recommend using "./*.pdf" and never "*.pdf". Some
programs support "--" but not all do (correctly); prepending "./" is
universally supported.
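
The filenames here are hypothetical, but a minimal sketch of the hazard:

    touch -- '-r.pdf'    # a hostile filename that begins with "-"
    ls *.pdf             # expands to: ls -r.pdf   -- parsed as options, not a file
    ls ./*.pdf           # expands to: ls ./-r.pdf -- always parsed as a filename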

--- David A. Wheeler




Re: [Issue 8 drafts 0001471]: Add an orthogonal interface for immediate macro expansion definitions to make

2021-09-09 Thread David A. Wheeler via austin-group-l at The Open Group
Allow me to *try* to bring this back to the original topic :-).

I think it's vital that "::=", as (provisionally) accepted *8* years ago, be
in the final version. The underlying semantics of this (GNU make's :=) are
widely used.

I don't know if adding the :::= and +:= operators is that vital. But if
adding them (along with ::=) will yield a unified standard for "make" that
enables more makefiles to be portable, I'm fine with it. It's more work to
add to implementations & documentation, and I'd like to see commitment from
the various make implementations to implement all of these operators. But if
there's such commitment, great! But Unicode did similar things; e.g., it
added Greek Alpha as well as Latin A as separate characters to simplify
transition from previous systems. Ideally standards are minimal, but it's
more important to have standards with the necessary capabilities than minimal
standards that lack key features.

Scott Lurndal:
> I've never found Miller's treatise on Recursive make compelling
> enough to forgo the use of recursive makefiles :-).

I think it's somewhat situation-dependent. If the directories are truly
independent, recursive makefiles often forgo some parallelism but are
otherwise fine. Once there are interdependencies, my experience mirrors
Miller's. In any case, it's clear that a number of users of "make" depend on
immediate evaluation, so it is reasonable to standardize it.

--- David A. Wheeler




Re: [Issue 8 drafts 0001471]: Add an orthogonal interface for immediate macro expansion definitions to make

2021-09-08 Thread David A. Wheeler via austin-group-l at The Open Group


> On Sep 8, 2021, at 2:03 PM, Joerg Schilling via austin-group-l at The Open 
> Group  wrote:
> 
> "David A. Wheeler via austin-group-l at The Open Group" 
>  wrote:
> 
>> 
>>> On Sep 8, 2021, at 1:06 PM, Joerg Schilling via austin-group-l at The Open 
>>> Group  wrote:
>>> Hasn't it been explained many times that the non-orthogonal behavior of 
>>> gmake 
>>> for the += operator for macros created with the gmake := operator is a 
>>> source 
>>> of unpredictable behavior, in special if large layered (via include) 
>>> makefile
>>> systems are used and you cannot easily see how a macro was initially 
>>> created?
>> 
>> It has been claimed in https://www.austingroupbugs.net/view.php?id=1471,
>> but not proven. I believe the number of users of GNU make dwarfs all
>> other make implementations combined, and this hasn't been a problem for
>> users of GNU make. I've never seen that claim by actual users of GNU make.
>> Nor has Paul Smith, maintainer of GNU make.
> 
> Users of smake and BSD make write large and structured makefiles that use 
> plenty of include statements. This does not apply to gmake users and may be 
> the reason why gmake users do not complain.

This is easily disproven using the Linux kernel & LibreOffice as examples.

The Linux kernel is large & uses GNU make.
Here's the Linux main tree's GitHub mirror: https://github.com/torvalds/linux
Main Makefile: https://github.com/torvalds/linux/blob/master/Makefile
And see also its scripts/Makefile.*
It uses immediate-evaluation ":=" & includes.

LibreOffice uses Makefile.in (automake) with GNU make extensions. See:
https://github.com/LibreOffice/core/blob/master/Makefile.in
Again, many immediate-evaluation ":=" and include statements.

They don't do it exactly the way BSD make users would, but that's not
relevant. They clearly *do* use immediate evaluation AND include statements,
on relatively large projects. Both also use other GNU make capabilities, such
as $(shell ...) and $(foreach ...); a sketch follows below. POSIX's lack of
such capabilities is a big reason why many people have abandoned portable
makefiles and generally use GNU make instead. POSIX make is too impoverished
for many use cases.
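
As a sketch of the kind of GNU-make-only constructs such projects rely on
(the variable names here are illustrative, not taken from either project):

    GIT_REV := $(shell git rev-parse --short HEAD)   # run a command at parse time
    DIRS    := src lib tools
    OBJDIRS := $(foreach d,$(DIRS),build/$(d))       # -> build/src build/lib build/tools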


> Please note that the reason why gmake has many users is not caused by its 
> features but by the fact that there are OS distributions that install gmake 
> under the name "make".

There are also OS distributions that install BSD make as "make" & GNU make as
"gmake". That has not prevented GNU make from gaining many users. In the end
it doesn't matter why a particular implementation has many users; it still
has many users.

--- David A. Wheeler



Re: [Issue 8 drafts 0001471]: Add an orthogonal interface for immediate macro expansion definitions to make

2021-09-08 Thread David A. Wheeler via austin-group-l at The Open Group

> On Sep 8, 2021, at 1:06 PM, Joerg Schilling via austin-group-l at The Open 
> Group  wrote:
> Hasn't it been explained many times that the non-orthogonal behavior of gmake 
> for the += operator for macros created with the gmake := operator is a source 
> of unpredictable behavior, in special if large layered (via include) makefile
> systems are used and you cannot easily see how a macro was initially created?

It has been claimed in https://www.austingroupbugs.net/view.php?id=1471,
but not proven. I believe the number of users of GNU make dwarfs all
other make implementations combined, and this hasn't been a problem for users
of GNU make. I've never seen that claim by actual users of GNU make. Nor has
Paul Smith, maintainer of GNU make.
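
(For readers following along, here is a minimal sketch of the GNU make
behavior being debated, namely that += takes on the "flavor" of the variable
it appends to:)

    A := x       # simple ("immediately expanded") variable
    A += $(B)    # $(B) is expanded right now, because A is simple
    C  = x       # recursive ("deferred") variable
    C += $(B)    # $(B) is expanded at each use, because C is recursive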

> The :::= operator fixes this and allows to predict how other operators behave.

I'm skeptical of *that* reason.

If the goal is to enable easy portability of existing makefiles that use ":="
to a POSIX standard operator, that's a plausible reason to add ":::=".
But that adds yet another operator, and it's not clear that people need it or
that they would use it if it were added.

--- David A. Wheeler



Re: [Issue 8 drafts 0001471]: Add an orthogonal interface for immediate macro expansion definitions to make

2021-09-08 Thread David A. Wheeler via austin-group-l at The Open Group


> On Sep 8, 2021, at 11:13 AM, Joerg Schilling via austin-group-l at The Open 
> Group  wrote:
> 
> "David A. Wheeler via austin-group-l at The Open Group" 
>  wrote:
> 
>> I agree with Paul Smith. This was agreed on 8 years ago, and the widely-used
>> GNU make has supported ::= as immediate expansion since 2013. That's strong
>> precedent. See the discussion here:
>> https://www.austingroupbugs.net/view.php?id=330
>> 
>> The "::=" operator was selected because ":=" had *incompatible* semantics 
>> between BSD make and GNU make.
> 
> That was introduced by accident, because I did not realize at that time that
> gmake used an incompatible implementation that differs from smake and BSD make.

That's an unfortunate bug, but easily fixed. It *is* specifically noted in the
Rationale :-).

A person named "joerg" noted this semantic difference in this bug report on
2011-11-17, during our discussion of this topic:
https://www.austingroupbugs.net/view.php?id=330
I take it you're not the same person? Or maybe you knew this at one time &
forgot it later? If you forgot it later, no big deal; forgetting happens to
all of us :-).


> 
>> The article "Recursive Make Considered Harmful" by Peter Miller
>> (http://miller.emu.id.au/pmiller/books/rmch/ [^] and
> 
> This is an article from a person that does not know make(1) in depth (in 
> special not the features from SunPro Make). The problems mentioned there are
> all solved by automatic dependency handling via .KEEP_STATE: from SunPro Make 
> (introduced in January 1986) and the automatic library dependency handling 
> from SunPro Make via .KEEP_STATE: since approx. 1992.

Peter Miller (now deceased) was quite expert in make, specifically GNU make:
https://accu.org/journals/overload/14/71/miller_2004/
I realize you (Joerg) are partial to SunPro make, but many people are *never*
going to use SunPro make and simply don't care about it.
GNU make is required for building many software systems,
including GCC (since version 3.4), the Linux kernel, LibreOffice, and Mozilla
Firefox. For many people, "make" *is* "GNU make".

The *reason* that "GNU make" is "make" to many people is partly because GNU
make is a good implementation, and that's fine. However, it's also because
*practical* use of make often requires features that are not standardized in
POSIX. The POSIX standard for make is dreadfully impoverished today. Adding
at least one *standard* way to implement immediate expansion is a step
towards having a more powerful *standard* for make. That not only helps
portability, but it also encourages use of these more powerful mechanisms
(*because* they are standardized).


> 
>> http://aegis.sourceforge.net/auug97.pdf) [^] notes this as a key problem 
>> when creating makefiles,
>> and strongly recommends using := instead."
> 
> Page not available - server down.

Understood. I think that's a side-effect of his death.
Here's a working copy:
https://accu.org/journals/overload/14/71/miller_2004/

--- David A. Wheeler



Re: [Issue 8 drafts 0001471]: Add an orthogonal interface for immediate macro expansion definitions to make

2021-09-08 Thread David A. Wheeler via austin-group-l at The Open Group
On Sep 8, 2021, at 9:53 AM, Paul Smith via austin-group-l at The Open Group 
 wrote:
> No, that's not right.  In issue 7 there is no way to have any sort of
> immediate expansion in standard make.  That's clearly something that
> users wanted (for the record note that I was not the one who wanted
> this standardized: I didn't propose it or push for it in any way; it
> was users who wanted this).
> 
> The ::= operator added immediate expansion.  That's certainly a useful
> addition, and worthy of creating a new operator for.
> 
> The only legitimate (IMO) reason to add ANOTHER operator :::= is, as
> I've been trying to understand, there's some characteristic of that
> behavior that people would find useful enough to change their makefiles
> to use it, that they can't get by changing their makefiles to use ::=
> which is already accepted.
> 

On Wed, 2021-09-08 at 09:29 +0100, Geoff Clare via austin-group-l:

>> Just because the proposal for ::= was applied to an earlier Issue 8
>> draft than :::= doesn't mean it has any claim to be treated
>> differently as part of the overall changes from Issue 7 to Issue 8.
> 
> If we wanted to have that discussion we should have had it back before
> ::= was accepted.
> 
> At this point, ::= DOES have a claim to be treated differently because
> there IS ample implementation precedent for it: as a result of the
> previous decision back in 2011, GNU make has been providing the ::=
> operator now for almost 8 years (released in GNU make 4.0 in October
> 2013).  It can't be changed now.
> 
> We could remove ::= from the standard and instead add :::= but that
> seems useless to me: there are real makefiles out there using ::= which
> would be made NON-PORTABLE by removing ::= from the standard.

I agree with Paul Smith. This was agreed on 8 years ago, and the widely-used
GNU make has supported ::= as immediate expansion since 2013. That's strong
precedent. See the discussion here:
https://www.austingroupbugs.net/view.php?id=330

The "::=" operator was selected because ":=" had *incompatible* semantics
between BSD make and GNU make.

I'm one of the people who pushed for ::=, BTW, so you can blame me in part :-).

Immediate expansion is *necessary* to practically support makefiles for large
systems. As I explained in the original description:
"Traditional make macros (defined with "=") cause exponential growth in
execution time, inhibiting scalability as systems get bigger and more
complex. That's because instead of being expanded when they are defined, they
are expanded on use. A widely-used alternative is ":=", which creates an
'immediate-expansion' macro (expanded immediately) instead of a traditional
'delayed-expansion' macro. The article "Recursive Make Considered Harmful" by
Peter Miller (http://miller.emu.id.au/pmiller/books/rmch/ and
http://aegis.sourceforge.net/auug97.pdf) notes this as a key problem when
creating makefiles, and strongly recommends using := instead."

And yes, "::=" is in use.
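
Here is a minimal sketch of the difference (GNU make / Issue 8 draft syntax;
the variable names are illustrative):

    # Immediate expansion (::=): each value is computed once, at definition.
    C ::= hello
    B ::= $(C) $(C)
    A ::= $(B) $(B)   # A's value is fixed here: "hello hello hello hello"

    # Deferred expansion (=): every reference to $(A) re-expands B, and thus
    # C, all over again; with more layers the re-expansion work grows
    # exponentially:
    #   C = hello
    #   B = $(C) $(C)
    #   A = $(B) $(B)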

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 12, 2021, at 1:51 PM, Oğuz  wrote:
> Taking "always double-quote your dollar variables", "eval is evil, avoid it", 
> etc. as "the rule" is cargo cult programming. Average programmer's 
> incompetence doesn't make the shell broken or unsafe or anything like that 
> and doesn't justify parroting nonsensical advice like those.

Double-quoting is VERY VERY good advice, which is why it’s so widely 
recommended & often required. For another example, Google requires it for their 
code <https://google.github.io/styleguide/shellguide.html> and Googlers are not 
stupid. Half of all programmers are BELOW average, and if your code lives over 
time, your code is likely to be maintained by them. In addition, even top 
software developers make mistakes. Assuming that “I cannot ever make a mistake” 
borders on arrogance; everyone has a bad day.

“Cargo cult programming” means you do something without understanding the 
reasons for it. But in this case, we know EXACTLY why it’s done, and there are 
good reasons for it, so no cargo cult is present. You may think you can’t ever 
make mistakes, so double-quoting is not needed, but I frankly don’t believe you.

It is wise to write code in a way that *assumes* that humans make mistakes, and 
reduce (1) the likelihood of mistakes and (2) consequences of those mistakes. 
If it doesn’t matter if your code is correct, then sure, don’t bother. If it 
*matters* that the code is correct, then take steps to increase that likelihood.

BUT: this seems far afield of what a standards body (especially this group)
normally does, so I'll get back to the "command -v and friends" discussion.

There's already "command -v COMMAND", which is in POSIX and returns true if
it can find an executable COMMAND (as described in the spec). It may not have
*exactly* the semantics the requestor wanted, but in *practice* I think it
works very well for typical use cases. Why would something more exotic need
to be standardized? I haven't seen *why* it would matter.
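
For instance, a minimal sketch of the typical portable use:

    if command -v git >/dev/null 2>&1; then
        echo "git is available"
    else
        echo "git not found" >&2
    fi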

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 12, 2021, at 10:57 AM, Oğuz  wrote:
> On Monday, April 12, 2021, David A. Wheeler via austin-group-l at The Open
> Group wrote:
> If you want a robust shell script, I recommend that you try out the tool 
> “shellcheck”.
> That checks a shell script against a set of recommended practices (e.g., use 
> “$variable” not $variable).
> 
> If it makes that suggestion no matter what context `$variable' is used in, I 
> don't see how it'll help make a shell script "robust".

It's very common advice to double-quote variable expansions in shell scripts
unless you have a *good* reason to do otherwise, because quoting prevents
word splitting on a variable reference, and in most cases you do NOT want
word splitting. Examples:
* The "Advanced Bash-Scripting Guide" says,
  "When referencing a variable, it is generally advisable to enclose its name
  in double quotes."
  https://tldp.org/LDP/abs/html/quotingvar.html
* The Wooledge wiki's "Quotes" page says:
  "When in doubt, double-quote every expansion in your shell commands."
  https://mywiki.wooledge.org/Quotes#I.27m_Too_Lazy_to_Read.2C_Just_Tell_Me_What_to_Do
* https://www.tecmint.com/useful-tips-for-writing-bash-scripts-in-linux/
* https://levelup.gitconnected.com/9-tips-for-writing-safer-shell-scripts-b0c185da9bae#729e

Yes, there are rare cases where you *do* want word-splitting. In those cases,
omit the double-quotes; shellcheck lets you disable checks at specific places
where you actually *did* want this.

Following the rule "always double-quote unless you have special reasons"
means that you don't have to do program-wide analysis to reason about
word-splitting at most variable references. As a result, it tends to make
creating *reliable* scripts easier.
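
A minimal sketch of the hazard being prevented (hypothetical filename):

    f='my file.pdf'
    ls $f      # word-splits into TWO arguments:  ls my file.pdf
    ls "$f"    # one argument, as intended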

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 10, 2021, at 5:54 AM, Jan Hafer via austin-group-l at The Open Group 
>  wrote:
> ...
> 2. In an ideal scenario the semantic of a word can be made constant, so no 
> other script or shell invocation running afterwards can change it (this would 
> compare to best practices in compiled languages and can be cheaply analyzed 
> before execution of a script).

That sounds like a nightmare, not an ideal.

...
> Does POSIX have any opinion or recommendation how to make SHELL scripting 
> robust?

POSIX, as a specification, generally just states requirements.

If you want a robust shell script, I recommend that you try out the tool
"shellcheck". It checks a shell script against a set of recommended practices
(e.g., use "$variable", not $variable). Sadly, shellcheck does not provide a
way to directly analyze the shell scripts within Makefiles.
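
For anyone who wants to try it, a sketch of typical use (assuming shellcheck
is installed; the script name is a placeholder):

    shellcheck myscript.sh         # report violations of its recommended practices
    shellcheck -s sh myscript.sh   # check against POSIX sh rather than bash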

--- David A. Wheeler




Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-31 Thread David A. Wheeler
On Fri, 31 Jul 2020 16:51:56 + (UTC), shwaresyst  wrote:
> Please look at my former message.  It stands that \Uu is ISO
> 10646, and that does not represent characters but codepoints,
> multiple of which may be necessary to represent one real
> character, which then may be a valid character in the locale
> encoding.

I think in $'...' the \u and \U should be either:
(1) omitted, or
(2) only required to support UTF-8. Do NOT require translations of \u and \U
to other locales. Also, if \u and \U are included, they should simply emit
the code points as requested; that's it. If a user specifies nonsense, like a
combining character with no character to combine with, it's not the shell's
job to disbelieve (maybe it will be concatenated later!).

Also, I believe \nnn octal, which was in my original proposal,
SHOULD NOT be included.

Details below.

--- David A. Wheeler

=== DETAILS ===

Supporting internationalization is important, but the shell
should be implementable in a small(ish) size.
The original proposal for $'...' did NOT include \u or \U at all,
as you can see here: https://austingroupbugs.net/view.php?id=249

I don't think \u or \U are really *necessary* for international use.
If you want to include characters of an arbitrary language in an encoding
that is currently in use... just USE them.
The \u... format is NOT as clear as using the actual characters, because
if you "just use the characters" then editors (etc.)
can display them as the actual characters.
What's way more important is supporting things like \n
(so you can finally set values terminating in newline) and
\xHH hex values (to generate escape sequences & other byte sequences).
Those are *not* easily seen in shell source code.
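
For example (a sketch; all of these already work in bash, ksh93, and zsh):

    nl=$'\n'        # a value ending in a newline; var=$(...) can't produce this,
                    # because command substitution strips trailing newlines
    esc=$'\x1b'     # the ESC byte, for building terminal escape sequences
    printf '%s\n' "${esc}[1mbold${esc}[0m"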

Also, an aside: I now think \nnn octal, which was in my original proposal,
SHOULD NOT be included. The \nnn syntax is incompatible with bash's \0nnn
syntax, and requiring \0nnn syntax would be mysteriously different from C's
\nnn syntax. Generally people use hex (not octal) nowadays for identifying
characters & bytes, so let's just standardize \xHH. It's really the better
thing to use anyway.

I don't *object* to \u and \U as long as their implementation requirements
are not complicated. Obviously some shells support \u and \U, e.g., bash.

HOWEVER: I think that *requiring* all shells to be able to translate between
encodings is excessive and unnecessary.
I don't think it's what current shells do.
Instead, simply require that shells at *least* support UTF-8.
Trying to support *all* encodings in a shell is a potentially big
complication, especially in small-memory systems like TVs.
E.g., if someone is using Latin-1 (ISO-8859-1) as their locale,
I think it is *NOT* reasonable to expect the shell to convert
Unicode to that locale. Instead, call on a specialty program to do that!

I did a quick test with bash, and it does *not* appear to
try to do any locale conversions. Instead, it appears to assume that
if you ask for \u you want the UTF-8 byte sequence.
E.g., small y with acute accent (ý)
in Latin-1 is decimal 253 and hex 0xfd (HTML entity &#253;).
Unsurprisingly, it's Unicode code point U+00fd. But when I set the
locale to ISO-8859-1, bash generates its *UTF-8* encoding:
(LANG=ISO-8859-1 echo $'\u00fd' | od -c )

What *is* reasonable to require, if \u and \U are supported?
I think POSIX should require at least support for UTF-8;
this is very widely used, and would provide a "minimum and useful floor"
without a complex implementation. Here are some options:
1. The standard could require that \u and \U *always* generate UTF-8.
All shells could easily implement/support that, and they could then
pass the UTF-8 to some other program if it needs to be converted to
another locale. That would make supporting other locales easy,
as long as you're willing to call out to another program.
2. The standard could say \u and \U must be supported if UTF-8 is the
encoding, and say nothing otherwise. The problem is that \u and \U would
then only be sure to work in that case.
3. Like #2, but also generate UTF-8 if the locale is C or POSIX.
I like this #3 option; it's a useful compromise for common cases.
Then shells don't have to implement the universe of weird special cases.

To see what a current shell implementation does, I looked at bash's docs here:
https://www.gnu.org/software/bash/manual/bash.html
which say:
\uHHHH - the Unicode (ISO/IEC 10646) character whose value is the hexadecimal
value HHHH (one to four hex digits)
\UHHHHHHHH - the Unicode (ISO/IEC 10646) character whose value is the
hexadecimal value HHHHHHHH (one to eight hex digits)
That text says "character", but they don't mean stand-alone characters.
For example, U+032A is a combining character, and this works fine:
echo $'y\u032a'
(with LANG=en_US.UTF-8).

The shell should not try to verify that some sequence is
a valid sequence of characters in some particular encoding;
there's no reason to believe they would stay invalid, since they may be
combined with more text later.

Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread David A. Wheeler
Steffen Nurpmeso  wrote:
> > And for that it would be tremendous if $'' would be defined so
> > that it can be used as the sole quoting mechanism, and that would
> > then also include expansion of $VAR (i use \$VAR or \${VAR} in my
> > mailer).  But to know exactly how problematic splitting of quotes
> > is for many languages of the world, including right-to-left
> > direction and shift state changes etc., and changing of meaning as
> > such if the sentence cannot be interpreted as a unity, a real
> > expert had to be asked.  Anyhow, the Unicode effort mandates
> > processing of entire strings and denotes isolated treatment as
> > a complete error.

I think eliminating old quoting mechanisms would be a mistake.

On Thu, 30 Jul 2020 16:09:56 +0200, Joerg Schilling 
 wrote:
> Even if it would become part of the standard today, you still would need
> to wait some years until all implementations take it up.

That's true for almost all standards changes.
However, many shells *already* implement $'...'.
It's also relatively trivial to implement, and it provides
very useful capabilities (such as the ability to easily assign values ending
in newlines).

I'd still like to see the addition of $'...'.

--- David A. Wheeler