Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Kerin Millar
On Sun, 02 Apr 2023 09:32:20 +0700
Robert Elz  wrote:

> Date: Sat, 1 Apr 2023 19:44:10 -0400
> From: Saint Michael
> 
> 
>   | The compelling reason is: I may not know how many values are stored in the
>   | comma-separated list.
> 
> Others have told you you're wrong, but this is not any kind of compelling
> reason - you simply give one more variable name than you expected to need
> (than you would have used otherwise) and then all the extra fields that
> you wanted the shell to ignore will be assigned to it - which you are free
> to ignore if you like, or you can test to see if anything is there, and
> issue an error message (or something) if more fields were given than you
> were expecting.   Much better behaviour than the shell simply ignoring
> data (silently).

I would add to this that bash affords one the luxury of using read -a before 
proceeding to check the size of the resulting array.
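
A minimal sketch of that approach, assuming bash (neither read -a nor
here-strings are POSIX). The names line, fields, count and expected, and
the limit of 2, are made up purely for illustration:

```shell
#!/usr/bin/env bash
line='a,b,c,d'
expected=2

# Split the whole line into an array instead of a fixed list of names.
IFS=, read -r -a fields <<< "$line"
count=${#fields[@]}

# Nothing is silently dropped: the caller decides what to do with extras.
if (( count > expected )); then
    printf 'expected at most %d fields, got %d\n' "$expected" "$count" >&2
fi
printf 'first field: %s\n' "${fields[0]}"
```

The IFS assignment is scoped to the read command, so the rest of the script
keeps its default splitting.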

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Robert Elz
Date: Sat, 1 Apr 2023 18:49:56 -0600
From: Felipe Contreras


  | Fortunately kre did listen.

Not really.   I agree that what POSIX currently says is not correct,
which is why the defect report got filed (you may have noticed that there
was no new wording proposed there - and still isn't - which is because
this is very hard to get correct, other than possibly by simply giving
the code that should be executed (no, that won't happen)).

But the others are correct, POSIX (in general) standardises what shells
actually do - you can see this if you read a few pages, all kinds of
things lead to unspecified (or worse, undefined) behaviour.   That's
because different implementations do different things in those cases.
Not because some specific behaviour could not be required, not even
that doing so might not be better all around.   But implementations
don't do the same thing in those cases, and so users cannot rely upon
anything particular happening (sometimes behaviour is unspecified,
but only between a limited number of choices).

The standard has two purposes - one is to allow application writers
(users) to work out what they can expect to work, and what they should not
do if they expect code to be portable.   The other is so implementors
of new implementations (of the shell, or anything else included)
know what to implement (and where they can do things differently).

You're right, when the standard uses "shall" it is being prescriptive,
and implementations must do that if they want to claim to conform.
But the standard only does that when the existing (at least major, and
intending to conform) implementations, at the time the standard is
written, actually do what is proposed to be required by a "shall".

There are odd occasions (such as the read errors in scripts) where something
that (almost all) implementations do is so obviously the wrong thing to
do, that the standard requires implementations to change, but if you
looked at that issue, and I believe you did, that was only done after
checking with implementors to see if they were willing to make the
change.

In this case, the standard will certainly end up saying that IFS
characters (both white space and others - there are differences in
how they work, but not in this regard) terminate fields, and
if there is nothing after the final IFS character (or characters,
in the case of IFS whitespace), then there is no additional field,
and if there is something there, then that makes an additional field,
even if there is no IFS terminator following it.   That's because that's
what all (or essentially all) shells do, and always (for almost 45
years now) have done so.

That is, if we have "IFS=," then both a,b,c and a,b,c,
produce 3 fields "a" "b" and "c".
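
A small sketch for checking this in a given shell, written in plain POSIX sh.
The helper name count_fields and the variable names are hypothetical; the
thread reports that zsh behaves differently, so the result there may not
match:

```shell
# Count the fields produced by unquoted expansion with IFS set to a comma.
# The subshell body keeps the IFS change and "set -f" from leaking out.
count_fields() (
    IFS=,
    set -f            # disable globbing so only field splitting happens
    set -- $1         # unquoted on purpose: this is where splitting occurs
    echo "$#"
)

n_plain=$(count_fields 'a,b,c')
n_trailing=$(count_fields 'a,b,c,')
printf 'a,b,c  -> %s fields\na,b,c, -> %s fields\n' "$n_plain" "$n_trailing"
```

With bash or dash both counts should come out as 3, since the trailing comma
terminates "c" rather than starting a new empty field.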

On the other hand, the standard is likely to say that whether
characters other than space/tab/newline which are white space according
to the definition of that term in the standard, can be IFS white
space, is unspecified - because shell implementations are split
about that (about 60/40 for "no" - even though the standard currently
seems to say "yes").   That is unless shell implementers can be persuaded
to change their implementations, which in this case is probably unlikely
(as no-one can be sure that there aren't scripts around which rely
upon their current behaviour - no-one wants to break backward compat).
The effect will probably be that using any white space char in IFS, other
than the blessed 3, will make a script non-portable (might work with one
shell, and not another).

kre




Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Robert Elz
Date: Sat, 1 Apr 2023 19:44:10 -0400
From: Saint Michael


  | The compelling reason is: I may not know how many values are stored in the
  | comma-separated list.

Others have told you you're wrong, but this is not any kind of compelling
reason - you simply give one more variable name than you expected to need
(than you would have used otherwise) and then all the extra fields that
you wanted the shell to ignore will be assigned to it - which you are free
to ignore if you like, or you can test to see if anything is there, and
issue an error message (or something) if more fields were given than you
were expecting.   Much better behaviour than the shell simply ignoring
data (silently).
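
As a rough sketch of the "one more variable" idea in POSIX sh (the names
first and extra are illustrative only, and a here-document is used instead
of the bash-only <<< form):

```shell
line='a,b,c,d'

# One name more than needed: everything after the first field lands in "extra".
IFS=, read -r first extra <<EOF
$line
EOF

if [ -n "$extra" ]; then
    # The caller chooses: ignore it, warn, or treat it as an error.
    printf 'first=%s (unexpected extra data: %s)\n' "$first" "$extra"
else
    printf 'first=%s\n' "$first"
fi
```

Here first receives "a" and extra receives "b,c,d", delimiters included, so
nothing is lost unless the script chooses to drop it.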

  | GNU AWK, for instance, acts responsibly in the same exact situation:
  | line="a,b,c,d";awk -F, '{print $1}' <<< $line
  | a

awk is a different language with different rules, used in a different way.
Further, it isn't all that different really - you're only using $1, but
awk doesn't simply discard the other fields, they're there, called $2 $3 ...
There's even NF which tells you how many are there.
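
For illustration, a short sketch showing that awk keeps every field around
even when the program only prints some of them. The variable names line and
summary are assumptions, and printf piped into awk stands in for the
bash-only <<< form:

```shell
line='a,b,c,d'

# NF is the number of fields; $NF is the last one. Nothing has been discarded.
summary=$(printf '%s\n' "$line" | awk -F, '{ print NF ": " $1 " ... " $NF }')
printf '%s\n' "$summary"
```

The script reports four fields and can still reach the last one, which is the
same information read hands over through the final variable name.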

kre




Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Lawrence Velázquez
On Sat, Apr 1, 2023, at 9:27 PM, Kerin Millar wrote:
> On Sat, 1 Apr 2023 19:44:10 -0400
> Saint Michael  wrote:
>
>> There is an additional problem with IFS and the command read
>> 
>> Suppose I have variable  $line with a string "a,b,c,d"
>> IFS=',' read -r x1 <<< $line
>> Bash will assign the whole line to x1
>>  echo $x1
>> line="a,b,c,d";IFS=',' read -r x1 <<< $line;echo $x1;
>> a,b,c,d
>> but if I use two variables
>> line="a,b,c,d";IFS=',' read -r x1 x2 <<< $line;echo "$x1 ---> $x2";
>> a ---> b,c,d
>> this is incorrect. If IFS=",", then a read -r statement must assign the
>
> No it isn't.
>
>> first value to the single variable, and disregard the rest.
>
> No it mustn't. Read
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html 
> and pay particular attention to the definition of what must happen 
> where there are fewer vars (names) than fields encountered.

Also, observe the behavior of other shells:

% cat foo.sh
echo a,b,c,d | {
IFS=, read x1
printf '%s\n' "$x1"
}

echo a,b,c,d | {
IFS=, read x1 x2
printf '%s ---> %s\n' "$x1" "$x2"
}

% bash foo.sh
a,b,c,d
a ---> b,c,d
% dash foo.sh
a,b,c,d
a ---> b,c,d
% ksh foo.sh
a,b,c,d
a ---> b,c,d
% mksh foo.sh
a,b,c,d
a ---> b,c,d
% yash foo.sh
a,b,c,d
a ---> b,c,d
% zsh foo.sh
a,b,c,d
a ---> b,c,d

And the Heirloom Bourne shell:

b# echo a,b,c,d | { IFS=, read x1; printf '%s\n' "$x1"; }; echo a,b,c,d | { IFS=, read x1 x2; printf '%s ---> %s\n' "$x1" "$x2"; }
 a,b,c,d
 a ---> b,c,d

-- 
vq



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Kerin Millar
On Sat, 1 Apr 2023 19:44:10 -0400
Saint Michael  wrote:

> There is an additional problem with IFS and the command read
> 
> Suppose I have variable  $line with a string "a,b,c,d"
> IFS=',' read -r x1 <<< $line
> Bash will assign the whole line to x1
>  echo $x1
> line="a,b,c,d";IFS=',' read -r x1 <<< $line;echo $x1;
> a,b,c,d
> but if I use two variables
> line="a,b,c,d";IFS=',' read -r x1 x2 <<< $line;echo "$x1 ---> $x2";
> a ---> b,c,d
> this is incorrect. If IFS=",", then a read -r statement must assign the

No it isn't.

> first value to the single variable, and disregard the rest.

No it mustn't. Read
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html and pay 
particular attention to the definition of what must happen where there are 
fewer vars (names) than fields encountered.

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Lawrence Velázquez
> On Apr 1, 2023, at 8:49 PM, Felipe Contreras wrote:
> 
> On Sat, Apr 1, 2023 at 6:35 PM Lawrence Velázquez  wrote:
>> 
>> On Sat, Apr 1, 2023, at 8:02 PM, Felipe Contreras wrote:
>>> In that example they are discussing whether or not to make that
>>> behavior a *requirement*. That is prescriptive.
>> 
>> You're so busy pretending this is debate club that you're completely
>> missing everyone's point, which is that the Austin Group by and
>> large aims to standardize existing behavior.
> 
> I did not miss your point, you are missing mine


Begin forwarded message:

> From: Emanuele Torre 
> Subject: Re: IFS field splitting doesn't conform with POSIX
> Date: March 30, 2023 at 1:48:54 PM EDT
> To: Felipe Contreras 
> Cc: bug-bash@gnu.org
> 
> On Thu, Mar 30, 2023 at 11:35:08AM -0600, Felipe Contreras wrote:
>>> How can you say that the current implementation that bash, dash, etc.
>>> use is not compliant to the POSIX specification?
>> 
>> I have never said that.
> 
> The title of this thread is "IFS field splitting doesn't conform with
> POSIX".


-- 
vq


Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Felipe Contreras
On Sat, Apr 1, 2023 at 6:35 PM Lawrence Velázquez  wrote:
>
> On Sat, Apr 1, 2023, at 8:02 PM, Felipe Contreras wrote:
> > In that example they are discussing whether or not to make that
> > behavior a *requirement*. That is prescriptive.
>
> You're so busy pretending this is debate club that you're completely
> missing everyone's point, which is that the Austin Group by and
> large aims to standardize existing behavior.

I did not miss your point, you are missing mine, which you are clearly
not interested in listening to.

Fortunately kre did listen.

Cheers.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Lawrence Velázquez
On Sat, Apr 1, 2023, at 8:02 PM, Felipe Contreras wrote:
> In that example they are discussing whether or not to make that
> behavior a *requirement*. That is prescriptive.

You're so busy pretending this is debate club that you're completely
missing everyone's point, which is that the Austin Group by and
large aims to standardize existing behavior.  If a situation arises
where many implementations are not conformant, then the standard
is flawed, and it is wrongheaded to blame the implementations.

-- 
vq



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 1:20 PM Lawrence Velázquez  wrote:
> On Thu, Mar 30, 2023, at 2:25 PM, Felipe Contreras wrote:

> > The challenge is in deciding what they *should* do, which is not
> > descriptive, but prescriptive.
>
> The Austin Group does not see its role as prescriptive, although
> during discussions implementers are often open to modifying their
> implementations to achieve consensus.  If many implementers agree
> to make a change, the result may appear prescriptive.  (A recent
> example is .)

In that example they are discussing whether or not to make that
behavior a *requirement*. That is prescriptive.

> >> If what it says differs from what the majority of shells do, then it's
> >> POSIX that is wrong.
> >
> > Then there is no point in looking at the standard, since we know what
> > it should say
>
> The standard is a reference that describes a set of broadly common
> behaviors.  Not everyone is interested in researching and testing
> an assortment of implementations whenever they want to determine
> whether a behavior is portable.
>
> (Also: bash, dash, ksh, and zsh are not the only shells that exist.)

Precisely because they are not the only shells that exist, an
agreement between current implementers--which they themselves might
see as descriptive of their implementations--results in text that says
"the shell shall", which is prescriptive.

If I write a new shell (which I am seriously considering) which aims
to be called POSIX-compatible, that "shall" is 100% prescriptive.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Greg Wooledge
On Sat, Apr 01, 2023 at 07:44:10PM -0400, Saint Michael wrote:
> There is an additional problem with IFS and the command read
> 
> Suppose I have variable  $line with a string "a,b,c,d"
> IFS=',' read -r x1 <<< $line
[...]

https://mywiki.wooledge.org/BashPitfalls#pf47



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Saint Michael
There is an additional problem with IFS and the command read

Suppose I have variable  $line with a string "a,b,c,d"
IFS=',' read -r x1 <<< $line
Bash will assign the whole line to x1
 echo $x1
line="a,b,c,d";IFS=',' read -r x1 <<< $line;echo $x1;
a,b,c,d
but if I use two variables
line="a,b,c,d";IFS=',' read -r x1 x2 <<< $line;echo "$x1 ---> $x2";
a ---> b,c,d
this is incorrect. If IFS=",", then a read -r statement must assign the
first value to the single variable, and disregard the rest.
and so on, with (n) variables.
The compelling reason is: I may not know how many values are stored in the
comma-separated list.
GNU AWK, for instance, acts responsibly in the same exact situation:
line="a,b,c,d";awk -F, '{print $1}' <<< $line
a
We need to fix this





On Sat, Apr 1, 2023, 6:11 PM Mike Jonkmans  wrote:

> On Sat, Apr 01, 2023 at 03:27:47PM -0400, Lawrence Velázquez wrote:
> > On Fri, Mar 31, 2023, at 2:10 PM, Chet Ramey wrote:
> > > kre filed an interpretation request to get the language cleaned up.
> >
> > For those who might be interested:
> >
> > https://austingroupbugs.net/view.php?id=1649
>
> Thanks for the link.
>
> And well done, kre!
>
> --
> Regards, Mike Jonkmans
>
>


Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Mike Jonkmans
On Sat, Apr 01, 2023 at 03:27:47PM -0400, Lawrence Velázquez wrote:
> On Fri, Mar 31, 2023, at 2:10 PM, Chet Ramey wrote:
> > kre filed an interpretation request to get the language cleaned up.
> 
> For those who might be interested:
> 
> https://austingroupbugs.net/view.php?id=1649

Thanks for the link.

And well done, kre!

-- 
Regards, Mike Jonkmans



Re: IFS field splitting doesn't conform with POSIX

2023-04-01 Thread Lawrence Velázquez
On Fri, Mar 31, 2023, at 2:10 PM, Chet Ramey wrote:
> kre filed an interpretation request to get the language cleaned up.

For those who might be interested:

https://austingroupbugs.net/view.php?id=1649

-- 
vq



Re: IFS field splitting doesn't conform with POSIX

2023-03-31 Thread Chet Ramey

On 3/30/23 3:18 PM, Lawrence Velázquez wrote:

>> In my view if POSIX was merely descriptive, then the Austin Group
>> would have no need to discuss much, as it's fairly easy to describe
>> what current shells do.
>
> Composing technical specifications that describe implementations'
> shared behaviors while allowing for their idiosyncrasies is more
> involved than you seem to think.

This is well put.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: IFS field splitting doesn't conform with POSIX

2023-03-31 Thread Chet Ramey

On 3/30/23 12:51 PM, Felipe Contreras wrote:

> It could very well mean that all shells are implementing POSIX wrong.
> Except zsh.


No, interpretations have confirmed that not generating a final empty field
is correct. It's just not clear enough in the text.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: IFS field splitting doesn't conform with POSIX

2023-03-31 Thread Chet Ramey

On 3/30/23 7:12 AM, Felipe Contreras wrote:

> Hi,
>
> Consider this example:
>
>  IFS=,
>  str='foo,bar,,roo,'
>  printf '"%s"\n' $str
>
> There is a discrepancy between how this is interpreted between bash
> and zsh: in bash the last comma doesn't generate a field and is
> ignored, in zsh a last empty field is generated. Initially I was going
> to report the bug in zsh, until I read what the POSIX specification
> says about field splitting [1].


Yes, the current wording is sort of a mess. It's one of those places in
the standard where everyone "knows" what is supposed to happen, even if
it's not spelled out explicitly.

This issue came up to the POSIX group three times: 1995, 1998, and most
recently in 2005.

The 1995 interpretation confirms that a trailing IFS character does not
delimit an empty field:

https://www.open-std.org/jtc1/sc22/WG15/docs/rr/9945-2/9945-2-98.html

The 2005 discussion accepted that a trailing IFS character does not
generate a final empty field, since that reflected existing practice back
to the Bourne shell, and primarily concentrated on how word splitting and
`read' interact.
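
A hedged sketch of how the example string from the original report splits in
shells that follow this practice (bash and dash, for instance). The variable
names count and third are illustrative:

```shell
str='foo,bar,,roo,'

# Split at top level so the resulting positional parameters stay available,
# then restore default globbing and splitting.
IFS=,
set -f
set -- $str
set +f
unset IFS

count=$#
third=$3
printf '%s fields\n' "$count"
printf '"%s"\n' "$@"
```

The empty field between the two adjacent commas survives, because each comma
terminates a field of its own; only the comma at the very end fails to
produce one, giving four fields in total.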

My guess is the intent is that the text about "non-zero-length IFS
whitespace shall delimit a field" is supposed to cover this case, but it
requires a creative reading of the text to get there.

kre filed an interpretation request to get the language cleaned up.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Andreas Schwab
On Mar 30 2023, Felipe Contreras wrote:

> On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
>  wrote:
>>
>> On 3/30/23 2:12 PM, Felipe Contreras wrote:
>> >  IFS=,
>> >  str='foo,bar,,roo,'
>> >  printf '"%s"\n' $str
>> zsh is the only shell that generates an empty last field, no other shell
>> exhibits this behavior.
>
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.
>
> It could very well mean that all shells are implementing POSIX wrong.
> Except zsh.

Note that zsh by default is not a POSIX shell, and even in sh
compatibility mode it doesn't strive to be POSIX compliant.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Lawrence Velázquez
On Thu, Mar 30, 2023, at 2:25 PM, Felipe Contreras wrote:
> On Thu, Mar 30, 2023 at 11:48 AM Oğuz İsmail Uysal
>  wrote:
>>
>> On 3/30/23 7:51 PM, Felipe Contreras wrote:
>> > So? This is argumentum ad populum. The fact that most shells do X
>> > doesn't imply that POSIX says X.
>
>> POSIX documents existing practice.
>
> Your definition of what a standard is and mine are very different then.

The Austin Group itself largely disagrees with your position.


> In my view if POSIX was merely descriptive, then the Austin Group
> would have no need to discuss much, as it's fairly easy to describe
> what current shells do.

Composing technical specifications that describe implementations'
shared behaviors while allowing for their idiosyncrasies is more
involved than you seem to think.


> The challenge is in deciding what they *should* do, which is not
> descriptive, but prescriptive.

The Austin Group does not see its role as prescriptive, although
during discussions implementers are often open to modifying their
implementations to achieve consensus.  If many implementers agree
to make a change, the result may appear prescriptive.  (A recent
example is .)


>> If what it says differs from what the majority of shells do, then it's
>> POSIX that is wrong.
>
> Then there is no point in looking at the standard, since we know what
> it should say

The standard is a reference that describes a set of broadly common
behaviors.  Not everyone is interested in researching and testing
an assortment of implementations whenever they want to determine
whether a behavior is portable.

(Also: bash, dash, ksh, and zsh are not the only shells that exist.)


> and there's no point in discussing about what it does actually say.

You miss every shot you don't take.

https://www.opengroup.org/austin/lists.html


-- 
vq



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Kerin Millar
On Thu, 30 Mar 2023 11:52:06 -0600
Felipe Contreras  wrote:

> Chet wrote:
> > Alternately, you can think of the NUL at the end of the string as an
> > additional field terminator,
> 
> Except if you do that, then 'a,' has two fields since the end of the
> string is an additional field terminator, as I explained.
> 
> > but one that follows the adjacency rules and doesn't create any empty
> > fields.
> 
> So it's a *very special* field terminator that is mentioned nowhere in
> the POSIX specification.

I can only suggest issuing a formal request for clarification. Clearly, there
exists a prevailing consensus across implementations (bash included). For the
matter not to be broached by the specification - at least, not by my reading -
seems irregular.

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 11:48 AM Oğuz İsmail Uysal
 wrote:
>
> On 3/30/23 7:51 PM, Felipe Contreras wrote:
> > So? This is argumentum ad populum. The fact that most shells do X
> > doesn't imply that POSIX says X.

> POSIX documents existing practice.

Your definition of what a standard is and mine are very different then.

In my view if POSIX was merely descriptive, then the Austin Group
would have no need to discuss much, as it's fairly easy to describe
what current shells do.

The challenge is in deciding what they *should* do, which is not
descriptive, but prescriptive. That requires much more consideration.

> If what it says differs from what the majority of shells do, then it's
> POSIX that is wrong.

Then there is no point in looking at the standard, since we know what
it should say, and there's no point in discussing about what it does
actually say.

> > Yes. 'foo,bar,' has two terminators, and therefore two fields.
> > 'foo,bar,roo' has two terminators and therefore two fields, plus
> > garbage. You want to interpret 'foo' as a field, even though it does
> > not have an explicit terminator. But that's not specified anywhere
> > in POSIX. POSIX doesn't say what should be done with the text after
> > the last terminator. You could throw it away and still be conforming
> > to POSIX.

> I don't think *to SPLIT using delimiters as field terminators* involves
> leaving any part out.

The purpose of field terminators is to demarcate the termination of a
field, as in end or close, which is why they are not used to split a
string; they are used to join fields in a way that ensures they are
complete.

If you see data like "Name:Peter;Age:35;Balance:30" you don't go and
conclude the last field ended in 30, especially if you are Peter.

If you don't care about the termination of a field, then there's no
point in using field terminators.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Greg Wooledge
On Thu, Mar 30, 2023 at 11:52:06AM -0600, Felipe Contreras wrote:
> Not to mention the small detail that the Internal Field Separator is
> not a *separator*, but a terminator (with certain exceptions).

POSIX itself admits that the name is confusing.  From sh(1posix):

RATIONALE
   [...]
   The name IFS was originally an abbreviation of ``Input Field Separators'';
   however, this name is misleading as the IFS characters are actually used
   as field terminators.
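
One way to see the terminator reading in practice, as a sketch in POSIX sh:
split a string, then write each field back out followed by a comma. The
helper name rejoin and the variable names are hypothetical:

```shell
# Split $1 on commas, then emit every field followed by a terminating comma.
rejoin() (
    IFS=,
    set -f            # no globbing, only field splitting
    set -- $1         # unquoted on purpose
    printf '%s,' "$@"
)

with_terminator=$(rejoin 'a,b,')
without_terminator=$(rejoin 'a,b')
printf '%s\n%s\n' "$with_terminator" "$without_terminator"
```

In bash or dash both calls should print a,b, which shows that 'a,b,'
round-trips exactly, while 'a,b' is treated as if its last field had an
implied terminator.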



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 11:22 AM Kerin Millar  wrote:
>
> On Thu, 30 Mar 2023 07:51:59 -0600
> Felipe Contreras  wrote:
>
> > On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
> > >
> > > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > > IFS=,
> > > > str='foo,bar,,roo,'
> > > > printf '"%s"\n' $str
> > > >
> > > > There is a discrepancy between how this is interpreted between bash
> > > > and zsh: in bash the last comma doesn't generate a field and is
> > > > ignored,
> > >
> > > ... which is correct according to POSIX (but not sensible).
> > >
> > > > in zsh a last empty field is generated. Initially I was going
> > > > to report the bug in zsh, until I read what the POSIX specification
> > > > says about field splitting [1].
> > >
> > > You seem to have misinterpreted whatever you read.
> > >
> > > https://mywiki.wooledge.org/BashPitfalls#pf47
> > >
> > > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > > a field terminator, rather than a field separator. What this means
> > > in our example is that if there's an empty field at the end of the
> > > input line, it will be discarded:
> > >
> > > $ IFS=, read -ra fields <<< "a,b,"
> > > $ declare -p fields
> > > declare -a fields='([0]="a" [1]="b")'
> > >
> > > Where did the empty field go? It was eaten for historical reasons
> > > ("because it's always been that way"). This behavior is not unique
> > > to bash; all conformant shells do it.
> >
> > If you think in terms of terminators instead of separators, then the
> > above code makes sense because if you add ',' at the end of each field
> > (terminate it), you get the original string:
> >
> > printf '%s,' "${fields[@]}"
> >
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
> >
> > We can go for 2) of course, but this is not specified anywhere in
> > POSIX, that's just common practice.
>
> You may find these interesting; the second link in particular.

Indeed.

> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html

This says precisely what I said in 1):

Chet wrote:
> Alternately, you can think of the NUL at the end of the string as an
> additional field terminator,

Except if you do that, then 'a,' has two fields since the end of the
string is an additional field terminator, as I explained.

> but one that follows the adjacency rules and doesn't create any empty
> fields.

So it's a *very special* field terminator that is mentioned nowhere in
the POSIX specification.

> - http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html
>
> Though I was aware of these behaviours, I do find the POSIX wording to be 
> unclear as concerns the observations made by the second link, to say the 
> least.

So I'm not the only one who thinks it's unclear.

Not to mention the small detail that the Internal Field Separator is
not a *separator*, but a terminator (with certain exceptions).

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Emanuele Torre
On Thu, Mar 30, 2023 at 11:35:08AM -0600, Felipe Contreras wrote:
> > How can you say that the current implementation that bash, dash, etc.
> > use is not compliant to the POSIX specification?
> 
> I have never said that.

The title of this thread is "IFS field splitting doesn't conform with
POSIX".

 emanuele6



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Oğuz İsmail Uysal

On 3/30/23 7:51 PM, Felipe Contreras wrote:
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.

POSIX documents existing practice. If what it says differs from what the
majority of shells do, then it's POSIX that is wrong. And this mailing
list is not the right place to complain about it.

> Yes. 'foo,bar,' has two terminators, and therefore two fields.
> 'foo,bar,roo' has two terminators and therefore two fields, plus
> garbage. You want to interpret 'foo' as a field, even though it does
> not have an explicit terminator. But that's not specified anywhere
> in POSIX. POSIX doesn't say what should be done with the text after
> the last terminator. You could throw it away and still be conforming
> to POSIX.

I don't think *to SPLIT using delimiters as field terminators* involves
leaving any part out.




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 9:52 AM Emanuele Torre  wrote:
>
> On Thu, Mar 30, 2023 at 07:51:59AM -0600, Felipe Contreras wrote:
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
>
> I repeatedly disputed this interpretation on IRC by saying that your
> reasoning to come to this conclusion is that "',' can terminate a field,
> and the end of the string can terminate a field, so ',' at the end is
> two terminators".

I did not come to a conclusion, and that is not my reasoning. In IRC
you never paid attention to what I was actually saying, so here you
are attacking a straw man.

> If we extend that reasoning 'a , b' with IFS=' ,' should be split into
> four fields because individually ' ', ',', ' ', and the end of string
> could all terminate a field.

IFS white space characters shall be interpreted differently. That's
clear from the specification.
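
A quick sketch of that difference, using the 'a , b' string from the quoted
text. It is plain POSIX sh, and the variable names are illustrative:

```shell
str='a , b'

# IFS holds one whitespace character (space) and one non-whitespace (comma).
IFS=' ,'
set -f
set -- $str
set +f
unset IFS

count=$#
printf '%s fields:' "$count"
printf ' "%s"' "$@"
printf '\n'
```

IFS whitespace next to the comma is folded into the same delimiter, so bash
and dash should report two fields, "a" and "b", rather than four.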

> You refuse to acknowledge that it does not make sense to claim that a
> comma at the end of the string MUST yield an empty last field just because a
> ',' and the "end of string" terminator individually can terminate a
> field.

That is not my claim.

> The correct interpretation is that a field is implicitly terminated by
> the end of the string if it is not explicitly terminated by a
> terminator.

Nowhere in the specification does it say that.

> How can you say that the current implementation that bash, dash, etc.
> use is not compliant to the POSIX specification?

I have never said that.

> If that is not what you are claiming, how do you think that bash's
> implementation of field splitting is not compatible with POSIX
> definition since you did not mention it as a possible interpretation?

I did not say I think that.

My suggestion is that you forget the IRC discussion and focus on what
is being said here.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Andreas Kusalananda Kähäri
On Thu, Mar 30, 2023 at 10:51:58AM -0600, Felipe Contreras wrote:
> On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
>  wrote:
> >
> > On 3/30/23 2:12 PM, Felipe Contreras wrote:
> > >  IFS=,
> > >  str='foo,bar,,roo,'
> > >  printf '"%s"\n' $str
> > zsh is the only shell that generates an empty last field, no other shell
> > exhibits this behavior.
> 
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.
> 
> It could very well mean that all shells are implementing POSIX wrong.
> Except zsh.

Without getting into this *specific* issue: That's not how POSIX works.
POSIX standardises existing practices.


Cheers,
A

> Or it could mean POSIX doesn't specify which behavior is correct.
> 
> > Besides your link says:
> >  > The shell shall treat each character of the IFS as a delimiter and use
> >  > the delimiters as *field terminators* to split the results of parameter
> >  > expansion, command substitution, and arithmetic expansion into fields.
> >
> > So the delimiters terminate fields, not separate them.
> 
> Yes. 'foo,bar,' has two terminators, and therefore two fields.
> 'foo,bar,roo' has two terminators and therefore two fields, plus
> garbage.
> 
> You want to interpret 'foo' as a field, even though it does not have
> an explicit terminator. But that's not specified anywhere in POSIX.
> 
> POSIX doesn't say what should be done with the text after the last
> terminator. You could throw it away and still be conforming to POSIX.
> 
> -- 
> Felipe Contreras

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Kerin Millar
On Thu, 30 Mar 2023 07:51:59 -0600
Felipe Contreras  wrote:

> On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
> >
> > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > IFS=,
> > > str='foo,bar,,roo,'
> > > printf '"%s"\n' $str
> > >
> > > There is a discrepancy between how this is interpreted between bash
> > > and zsh: in bash the last comma doesn't generate a field and is
> > > ignored,
> >
> > ... which is correct according to POSIX (but not sensible).
> >
> > > in zsh a last empty field is generated. Initially I was going
> > > to report the bug in zsh, until I read what the POSIX specification
> > > says about field splitting [1].
> >
> > You seem to have misinterpreted whatever you read.
> >
> > https://mywiki.wooledge.org/BashPitfalls#pf47
> >
> > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > a field terminator, rather than a field separator. What this means
> > in our example is that if there's an empty field at the end of the
> > input line, it will be discarded:
> >
> > $ IFS=, read -ra fields <<< "a,b,"
> > $ declare -p fields
> > declare -a fields='([0]="a" [1]="b")'
> >
> > Where did the empty field go? It was eaten for historical reasons
> > ("because it's always been that way"). This behavior is not unique
> > to bash; all conformant shells do it.
> 
> If you think in terms of terminators instead of separators, then the
> above code makes sense because if you add ',' at the end of each field
> (terminate it), you get the original string:
> 
> printf '%s,' "${fields[@]}"
> 
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
> 
> If we consider 1) the end of the string is considered an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields. Making these terminators indistinguishable from
> separators.
> 
> We can go for 2) of course, but this is not specified anywhere in
> POSIX, that's just common practice.

You may find these interesting; the second link in particular.

- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html
- http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html

Though I was aware of these behaviours, I do find the POSIX wording to be 
unclear as concerns the observations made by the second link, to say the least. 
I would add that it is possible to have it both ways, so to speak, though the 
means of going about it are no less confusing than the topic at large.

$ IFS=,
$ str="a,b"
$ arr=($str""); declare -p arr
declare -a arr=([0]="a" [1]="b")
$ str="a,b,"
$ arr=($str""); declare -p arr # duly coercing an empty field that some may expect or wish for
declare -a arr=([0]="a" [1]="b" [2]="")
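
The same coercion can be sketched without arrays, for shells that lack them.
This is only an illustration under stated assumptions: the function name
count_with_trailing is hypothetical, and the '' suffix plays the role of the
"" appended to $str above.

```shell
# Hypothetical helper: print the number of fields in $1 when split on commas,
# keeping a trailing empty field if the input ends with a comma.
count_with_trailing() (
    # The ( ... ) body runs in a subshell, so IFS and noglob stay local.
    IFS=,
    set -f              # no pathname expansion on the split words
    set -- $1''         # unquoted $1 is split; the '' keeps a final empty field
    echo "$#"
)

count_with_trailing 'a,b'    # prints 2
count_with_trailing 'a,b,'   # prints 3
```

Note that an empty input yields one (empty) field with this approach, which
may or may not be what the caller wants.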

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
 wrote:
>
> On 3/30/23 2:12 PM, Felipe Contreras wrote:
> >  IFS=,
> >  str='foo,bar,,roo,'
> >  printf '"%s"\n' $str
> zsh is the only shell that generates an empty last field, no other shell
> exhibits this behavior.

So? This is argumentum ad populum. The fact that most shells do X
doesn't imply that POSIX says X.

It could very well mean that all shells are implementing POSIX wrong.
Except zsh.

Or it could mean POSIX doesn't specify which behavior is correct.

> Besides your link says:
>  > The shell shall treat each character of the IFS as a delimiter and use
>  > the delimiters as *field terminators* to split the results of parameter
>  > expansion, command substitution, and arithmetic expansion into fields.
>
> So the delimiters terminate fields, not separate them.

Yes. 'foo,bar,' has two terminators, and therefore two fields.
'foo,bar,roo' has two terminators and therefore two fields, plus
garbage.

You want to interpret 'foo' as a field, even though it does not have
an explicit terminator. But that's not specified anywhere in POSIX.

POSIX doesn't say what should be done with the text after the last
terminator. You could throw it away and still be conforming to POSIX.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Oğuz İsmail Uysal

On 3/30/23 2:12 PM, Felipe Contreras wrote:

> IFS=,
> str='foo,bar,,roo,'
> printf '"%s"\n' $str

zsh is the only shell that generates an empty last field; no other shell
exhibits this behavior.


Besides your link says:
> The shell shall treat each character of the IFS as a delimiter and use
> the delimiters as *field terminators* to split the results of parameter
> expansion, command substitution, and arithmetic expansion into fields.


So the delimiters terminate fields, not separate them.




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Emanuele Torre
On Thu, Mar 30, 2023 at 07:51:59AM -0600, Felipe Contreras wrote:
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
> 
> If we consider 1) the end of the string is considered an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields. Making these terminators indistinguishable from
> separators.

I repeatedly disputed this interpretation on IRC by saying that your
reasoning to come to this conclusion is that "',' can terminate a field,
and the end of the string can terminate a field, so ',' at the end is
two terminators".

If we extend that reasoning 'a , b' with IFS=' ,' should be split into
four fields because individually ' ', ',', ' ', and the end of string
could all terminate a field.

That is obviously not the case, because POSIX clearly says that a field
is terminated by the longest match for either a single non-IFS-whitespace
character in IFS, together with any IFS-whitespace characters in IFS
around it; or a non-zero-length sequence of IFS-whitespace characters in
IFS. So ' , ' is a single terminator.
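
A minimal sketch of that case, assuming a POSIX-style shell such as bash or
dash (the variable names n, first and second are only for illustration):

```shell
# Split 'a , b' with IFS set to space and comma, then count the fields.
old_ifs=$IFS
IFS=' ,'
str='a , b'
set -f                 # disable globbing; only field splitting is of interest
set -- $str            # deliberately unquoted so the shell splits it
n=$#
first=$1
second=$2
set +f
IFS=$old_ifs
printf '%s fields: <%s> <%s>\n' "$n" "$first" "$second"   # prints: 2 fields: <a> <b>
```

The space, comma, space run is consumed as one delimiter, so no empty fields
appear between a and b.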

You refuse to acknowledge that it does not make sense to claim that a
comma at the end of the string MUST yield an empty last field just because a
',' and the "end of string" terminator individually can terminate a
field.

The correct interpretation is that a field is implicitly terminated by
the end of the string if it is not explicitly terminated by a
terminator.
Even though this interpretation has been repeatedly proposed to you, you
do not even mention it here as a possible interpretation of the
specification. You still insist that the specification can only possibly
be interpreted in the two ways you mentioned.

How can you say that the current implementation that bash, dash, etc.
use is not compliant with the POSIX specification?

And why do you not acknowledge that the logic on which you base your
claim is flawed? That logic being: "',' can terminate a field individually
and end-of-string can terminate a field individually, so two of them in a
row must have an empty field between them, and this negates the
possibility that a ',' at the end of the string can be considered a single
terminator".

If that is not what you are claiming, how do you think that bash's
implementation of field splitting is not compatible with the POSIX
definition, since you did not mention it as a possible interpretation?

 emanuele6



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
>
> On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > IFS=,
> > str='foo,bar,,roo,'
> > printf '"%s"\n' $str
> >
> > There is a discrepancy between how this is interpreted between bash
> > and zsh: in bash the last comma doesn't generate a field and is
> > ignored,
>
> ... which is correct according to POSIX (but not sensible).
>
> > in zsh a last empty field is generated. Initially I was going
> > to report the bug in zsh, until I read what the POSIX specification
> > says about field splitting [1].
>
> You seem to have misinterpreted whatever you read.
>
> https://mywiki.wooledge.org/BashPitfalls#pf47
>
> Unbelievable as it may seem, POSIX requires the treatment of IFS as
> a field terminator, rather than a field separator. What this means
> in our example is that if there's an empty field at the end of the
> input line, it will be discarded:
>
> $ IFS=, read -ra fields <<< "a,b,"
> $ declare -p fields
> declare -a fields='([0]="a" [1]="b")'
>
> Where did the empty field go? It was eaten for historical reasons
> ("because it's always been that way"). This behavior is not unique
> to bash; all conformant shells do it.

If you think in terms of terminators instead of separators, then the
above code makes sense because if you add ',' at the end of each field
(terminate it), you get the original string:

printf '%s,' "${fields[@]}"

But you can't replicate 'a,b' that way, because b does not have a
terminator. Obviously we'll want 'b' as a field, therefore one has to
assume either 1) the end of the string is considered an implicit
terminator, or 2) the terminator in the last field is optional.
Neither of these two things is specified in POSIX.

If we consider 1) the end of the string is considered an implicit
terminator, then 'a' contains a valid field, but then 'a,' contains
*two* fields. Making these terminators indistinguishable from
separators.

We can go for 2) of course, but this is not specified anywhere in
POSIX, that's just common practice.
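
The round trip described above can be sketched as follows. This is an
illustration only: reterminate is a hypothetical helper name, and it uses the
positional parameters rather than a bash array so that it also runs in plain
sh.

```shell
# Hypothetical helper: split $1 on commas, then print every field followed
# by a comma, i.e. treat the comma strictly as a terminator.
reterminate() (
    # Subshell body: IFS and noglob changes do not leak to the caller.
    IFS=,
    set -f
    set -- $1
    if [ "$#" -gt 0 ]; then
        printf '%s,' "$@"
    fi
)

reterminate 'a,b,'; echo    # prints a,b,  (identical to the input)
reterminate 'a,b';  echo    # prints a,b,  (the input had no final comma)
```

Both inputs produce the same two fields, so the output cannot tell them
apart, which is the asymmetry being discussed.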

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread alex xmb ratchev
how spooky, can't get read / mapfile to separate right
very sad

On Thu, Mar 30, 2023, 15:19 Felipe Contreras 
wrote:

> Hi,
>
> Consider this example:
>
> IFS=,
> str='foo,bar,,roo,'
> printf '"%s"\n' $str
>
> There is a discrepancy between how this is interpreted between bash
> and zsh: in bash the last comma doesn't generate a field and is
> ignored, in zsh a last empty field is generated. Initially I was going
> to report the bug in zsh, until I read what the POSIX specification
> says about field splitting [1].
>
> If we ignore all the complexity regarding IFS white spaces (since our
> IFS doesn't have them), we arrive at this item:
>
> 3.b. Each occurrence in the input of an IFS character that is not
> IFS white space, along with any adjacent IFS white space, shall
> delimit a field, as described previously.
>
> Again, we ignore the white space stuff, which means "each occurrence
> in the input of an IFS character shall delimit a field". So if *each
> occurrence* of a comma shall delimit a field, the last comma should
> delimit a field. We have four commas, therefore we should have five
> fields.
>
> This is not what bash does.
>
> Shouldn't bash generate the last field? At least in POSIX mode (I
> tried with `--posix` same output).
>
> Cheers.
>
> Obligatory stuff:
>
> * version: 5.1.16(1)-release
> * platform: x86_64 Arch Linux
> * compiler: gcc 12.2.1
>
> [1]
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05
>
> --
> Felipe Contreras
>
>


IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
Hi,

Consider this example:

IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str

There is a discrepancy between how this is interpreted between bash
and zsh: in bash the last comma doesn't generate a field and is
ignored, in zsh a last empty field is generated. Initially I was going
to report the bug in zsh, until I read what the POSIX specification
says about field splitting [1].

If we ignore all the complexity regarding IFS white spaces (since our
IFS doesn't have them), we arrive at this item:

3.b. Each occurrence in the input of an IFS character that is not
IFS white space, along with any adjacent IFS white space, shall
delimit a field, as described previously.

Again, we ignore the white space stuff, which means "each occurrence
in the input of an IFS character shall delimit a field". So if *each
occurrence* of a comma shall delimit a field, the last comma should
delimit a field. We have four commas, therefore we should have five
fields.

This is not what bash does.

Shouldn't bash generate the last field? At least in POSIX mode (I
tried with `--posix` same output).
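
To make the count concrete, here is a minimal sketch that prints the number
of fields instead of the fields themselves. The helper name count_fields is
hypothetical and not part of any shell.

```shell
# Hypothetical helper: print how many fields $1 yields when split on commas.
count_fields() (
    # Subshell body: IFS and noglob changes do not leak to the caller.
    IFS=,
    set -f            # disable globbing so only field splitting applies
    set -- $1         # deliberately unquoted: split $1 on commas
    echo "$#"
)

count_fields 'foo,bar,,roo,'   # bash prints 4, not 5
```

The empty field between the two adjacent commas is kept; only the one after
the final comma is missing.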

Cheers.

Obligatory stuff:

* version: 5.1.16(1)-release
* platform: x86_64 Arch Linux
* compiler: gcc 12.2.1

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Greg Wooledge
On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> IFS=,
> str='foo,bar,,roo,'
> printf '"%s"\n' $str
> 
> There is a discrepancy between how this is interpreted between bash
> and zsh: in bash the last comma doesn't generate a field and is
> ignored,

... which is correct according to POSIX (but not sensible).

> in zsh a last empty field is generated. Initially I was going
> to report the bug in zsh, until I read what the POSIX specification
> says about field splitting [1].

You seem to have misinterpreted whatever you read.

https://mywiki.wooledge.org/BashPitfalls#pf47

Unbelievable as it may seem, POSIX requires the treatment of IFS as
a field terminator, rather than a field separator. What this means
in our example is that if there's an empty field at the end of the
input line, it will be discarded:

$ IFS=, read -ra fields <<< "a,b,"
$ declare -p fields
declare -a fields='([0]="a" [1]="b")'

Where did the empty field go? It was eaten for historical reasons
("because it's always been that way"). This behavior is not unique
to bash; all conformant shells do it.


