Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Andreas Schwab
On Mar 30 2023, Felipe Contreras wrote:

> On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
>  wrote:
>>
>> On 3/30/23 2:12 PM, Felipe Contreras wrote:
>> >  IFS=,
>> >  str='foo,bar,,roo,'
>> >  printf '"%s"\n' $str
>> zsh is the only shell that generates an empty last field, no other shell
>> exhibits this behavior.
>
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.
>
> It could very well mean that all shells are implementing POSIX wrong.
> Except zsh.

Note that zsh by default is not a POSIX shell, and even in sh
compatibility mode it doesn't strive to be POSIX compliant.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Lawrence Velázquez
On Thu, Mar 30, 2023, at 2:25 PM, Felipe Contreras wrote:
> On Thu, Mar 30, 2023 at 11:48 AM Oğuz İsmail Uysal
>  wrote:
>>
>> On 3/30/23 7:51 PM, Felipe Contreras wrote:
>> > So? This is argumentum ad populum. The fact that most shells do X
>> > doesn't imply that POSIX says X.
>
>> POSIX documents existing practice.
>
> Your definition of what a standard is and mine are very different then.

The Austin Group itself largely disagrees with your position.


> In my view if POSIX was merely descriptive, then the Austin Group
> would have no need to discuss much, as it's fairly easy to describe
> what current shells do.

Composing technical specifications that describe implementations'
shared behaviors while allowing for their idiosyncrasies is more
involved than you seem to think.


> The challenge is in deciding what they *should* do, which is not
> descriptive, but prescriptive.

The Austin Group does not see its role as prescriptive, although
during discussions implementers are often open to modifying their
implementations to achieve consensus.  If many implementers agree
to make a change, the result may appear prescriptive.  (A recent
example is .)


>> If what it says differs from what the majority of shells do, then it's
>> POSIX that is wrong.
>
> Then there is no point in looking at the standard, since we know what
> it should say

The standard is a reference that describes a set of broadly common
behaviors.  Not everyone is interested in researching and testing
an assortment of implementations whenever they want to determine
whether a behavior is portable.

(Also: bash, dash, ksh, and zsh are not the only shells that exist.)
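
For anyone who does want to spot-check a few implementations, a short loop
is enough. This is only a rough sketch: the helper name `fields_in` and the
list of shells are illustrative assumptions, and shells that are not
installed are skipped. It runs the test case from this thread in each shell
and reports how many fields the unquoted expansion produced.

```shell
# Hypothetical helper: run the thread's test case in the shell named by $1
# and print the number of fields produced by the unquoted expansion.
fields_in() {
    "$1" -c 'IFS=,; str="foo,bar,,roo,"; set -f; set -- $str; echo "$#"'
}

# The list of shells is an assumption; extend it as needed.
for sh in sh bash dash ksh; do
    # Skip shells that are not installed on this system.
    command -v "$sh" >/dev/null 2>&1 || continue
    printf '%s: %s fields\n' "$sh" "$(fields_in "$sh")"
done
```

Such a survey only covers the shells that happen to be installed, which is
the gap a written standard is meant to fill.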


> and there's no point in discussing what it does actually say.

You miss every shot you don't take.

https://www.opengroup.org/austin/lists.html


-- 
vq



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Kerin Millar
On Thu, 30 Mar 2023 11:52:06 -0600
Felipe Contreras  wrote:

> Chet wrote:
> > Alternately, you can think of the NUL at the end of the string as an
> > additional field terminator,
> 
> Except if you do that, then 'a,' has two fields since the end of the
> string is an additional field terminator, as I explained.
> 
> > but one that follows the adjacency rules and doesn't create any empty
> > fields.
> 
> So it's a *very special* field terminator that is mentioned nowhere in
> the POSIX specification.

I can only suggest issuing a formal request for clarification. Clearly, there 
exists a prevailing consensus across implementations (bash included). For the 
matter not to be broached by the specification - at least, not by my reading - 
seems irregular.

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 11:48 AM Oğuz İsmail Uysal
 wrote:
>
> On 3/30/23 7:51 PM, Felipe Contreras wrote:
> > So? This is argumentum ad populum. The fact that most shells do X
> > doesn't imply that POSIX says X.

> POSIX documents existing practice.

Your definition of what a standard is and mine are very different then.

In my view if POSIX was merely descriptive, then the Austin Group
would have no need to discuss much, as it's fairly easy to describe
what current shells do.

The challenge is in deciding what they *should* do, which is not
descriptive, but prescriptive. That requires much more consideration.

> If what it says differs from what the majority of shells do, then it's
> POSIX that is wrong.

Then there is no point in looking at the standard, since we know what
it should say, and there's no point in discussing what it does
actually say.

> > Yes. 'foo,bar,' has two terminators, and therefore two fields.
> > 'foo,bar,roo' has two terminators and therefore two fields, plus
> > garbage. You want to interpret 'foo' as a field, even though it does
> > not have an explicit terminator. But that's not specified anywhere
> > in POSIX. POSIX doesn't say what should be done with the text after
> > the last terminator. You could throw it away and still be conforming
> > to POSIX.

> I don't think *to SPLIT using delimiters as field terminators* involves
> leaving any part out.

The purpose of field terminators is to demarcate the termination of a
field, as in end or close, which is why they are not used to split a
string; they are used to join fields in a way that ensures they are
complete.

If you see data like "Name:Peter;Age:35;Balance:30" you don't go and
conclude the last field ended in 30, especially if you are Peter.

If you don't care about the termination of a field, then there's no
point in using field terminators.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Greg Wooledge
On Thu, Mar 30, 2023 at 11:52:06AM -0600, Felipe Contreras wrote:
> Not to mention the small detail that the Internal Field Separator is
> not a *separator*, but a terminator (with certain exceptions).

POSIX itself admits that the name is confusing.  From sh(1posix):

RATIONALE
   [...]
   The name IFS was originally an abbreviation of "Input Field Separators";
   however, this name is misleading as the IFS characters are actually used
   as field terminators.



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 11:22 AM Kerin Millar  wrote:
>
> On Thu, 30 Mar 2023 07:51:59 -0600
> Felipe Contreras  wrote:
>
> > On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
> > >
> > > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > > IFS=,
> > > > str='foo,bar,,roo,'
> > > > printf '"%s"\n' $str
> > > >
> > > > There is a discrepancy between how this is interpreted between bash
> > > > and zsh: in bash the last comma doesn't generate a field and is
> > > > ignored,
> > >
> > > ... which is correct according to POSIX (but not sensible).
> > >
> > > > in zsh a last empty field is generated. Initially I was going
> > > > to report the bug in zsh, until I read what the POSIX specification
> > > > says about field splitting [1].
> > >
> > > You seem to have misinterpreted whatever you read.
> > >
> > > https://mywiki.wooledge.org/BashPitfalls#pf47
> > >
> > > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > > a field terminator, rather than a field separator. What this means
> > > in our example is that if there's an empty field at the end of the
> > > input line, it will be discarded:
> > >
> > > $ IFS=, read -ra fields <<< "a,b,"
> > > $ declare -p fields
> > > declare -a fields='([0]="a" [1]="b")'
> > >
> > > Where did the empty field go? It was eaten for historical reasons
> > > ("because it's always been that way"). This behavior is not unique
> > > to bash; all conformant shells do it.
> >
> > If you think in terms of terminators instead of separators, then the
> > above code makes sense because if you add ',' at the end of each field
> > (terminate it), you get the original string:
> >
> > printf '%s,' ${fields[@]}
> >
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
> >
> > We can go for 2) of course, but this is not specified anywhere in
> > POSIX, that's just common practice.
>
> You may find these interesting; the second link in particular.

Indeed.

> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html

This says precisely what I said in 1):

Chet wrote:
> Alternately, you can think of the NUL at the end of the string as an
> additional field terminator,

Except if you do that, then 'a,' has two fields since the end of the
string is an additional field terminator, as I explained.

> but one that follows the adjacency rules and doesn't create any empty
> fields.

So it's a *very special* field terminator that is mentioned nowhere in
the POSIX specification.

> - http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html
>
> Though I was aware of these behaviours, I do find the POSIX wording to be 
> unclear as concerns the observations made by the second link, to say the 
> least.

So I'm not the only one who thinks it's unclear.

Not to mention the small detail that the Internal Field Separator is
not a *separator*, but a terminator (with certain exceptions).

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Emanuele Torre
On Thu, Mar 30, 2023 at 11:35:08AM -0600, Felipe Contreras wrote:
> > How can you say that the current implementation that bash, dash, etc.
> > use is not compliant to the POSIX specification?
> 
> I have never said that.

The title of this thread is "IFS field splitting doesn't conform with
POSIX".

 emanuele6



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Oğuz İsmail Uysal

On 3/30/23 7:51 PM, Felipe Contreras wrote:
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.

POSIX documents existing practice. If what it says differs from what the
majority of shells do, then it's POSIX that is wrong. And this mailing
list is not the right place to complain about it.


> Yes. 'foo,bar,' has two terminators, and therefore two fields.
> 'foo,bar,roo' has two terminators and therefore two fields, plus
> garbage. You want to interpret 'foo' as a field, even though it does
> not have an explicit terminator. But that's not specified anywhere
> in POSIX. POSIX doesn't say what should be done with the text after
> the last terminator. You could throw it away and still be conforming
> to POSIX.

I don't think *to SPLIT using delimiters as field terminators* involves
leaving any part out.




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 9:52 AM Emanuele Torre  wrote:
>
> On Thu, Mar 30, 2023 at 07:51:59AM -0600, Felipe Contreras wrote:
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
>
> I repeatedly disputed this interpretation on IRC by saying that your
> reasoning to come to this conclusion is that "',' can terminate a field,
> and the end of the string can terminate a field, so ',' at the end is
> two terminators".

I did not come to a conclusion, and that is not my reasoning. In IRC
you never paid attention to what I was actually saying, so here you
are attacking a straw man.

> If we extend that reasoning 'a , b' with IFS=' ,' should be split into
> four fields because individually ' ', ',', ' ', and the end of string
> could all terminate a field.

IFS white space characters shall be interpreted differently. That's
clear from the specification.

> You refuse to acknowledge that it does not make sense to claim that a
> comma at the end of the string MUST yield an empty last field just because a
> ',' and the "end of string" terminator individually can terminate a
> field.

That is not my claim.

> The correct interpretation is that a field is implicitly terminated by
> the end of the string if it is not explicitly terminated by a
> terminator.

Nowhere in the specification does it say that.

> How can you say that the current implementation that bash, dash, etc.
> use is not compliant to the POSIX specification?

I have never said that.

> If that is not what you are claiming, how do you think that bash's
> implementation of field splitting is not compatible with POSIX
> definition since you did not mention it as a possible interpretation?

I did not say I think that.

My suggestion is that you forget the IRC discussion and focus on what
is being said here.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Andreas Kusalananda Kähäri
On Thu, Mar 30, 2023 at 10:51:58AM -0600, Felipe Contreras wrote:
> On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
>  wrote:
> >
> > On 3/30/23 2:12 PM, Felipe Contreras wrote:
> > >  IFS=,
> > >  str='foo,bar,,roo,'
> > >  printf '"%s"\n' $str
> > zsh is the only shell that generates an empty last field, no other shell
> > exhibits this behavior.
> 
> So? This is argumentum ad populum. The fact that most shells do X
> doesn't imply that POSIX says X.
> 
> It could very well mean that all shells are implementing POSIX wrong.
> Except zsh.

Without getting into this *specific* issue: That's not how POSIX works.
POSIX standardises existing practices.


Cheers,
A

> Or it could mean POSIX doesn't specify which behavior is correct.
> 
> > Besides your link says:
> > > The shell shall treat each character of the IFS as a delimiter and use
> > > the delimiters as *field terminators* to split the results of parameter
> > > expansion, command substitution, and arithmetic expansion into fields.
> >
> > So the delimiters terminate fields, not separate them.
> 
> Yes. 'foo,bar,' has two terminators, and therefore two fields.
> 'foo,bar,roo' has two terminators and therefore two fields, plus
> garbage.
> 
> You want to interpret 'foo' as a field, even though it does not have
> an explicit terminator. But that's not specified anywhere in POSIX.
> 
> POSIX doesn't say what should be done with the text after the last
> terminator. You could throw it away and still be conforming to POSIX.
> 
> -- 
> Felipe Contreras

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Kerin Millar
On Thu, 30 Mar 2023 07:51:59 -0600
Felipe Contreras  wrote:

> On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
> >
> > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > IFS=,
> > > str='foo,bar,,roo,'
> > > printf '"%s"\n' $str
> > >
> > > There is a discrepancy between how this is interpreted between bash
> > > and zsh: in bash the last comma doesn't generate a field and is
> > > ignored,
> >
> > ... which is correct according to POSIX (but not sensible).
> >
> > > in zsh a last empty field is generated. Initially I was going
> > > to report the bug in zsh, until I read what the POSIX specification
> > > says about field splitting [1].
> >
> > You seem to have misinterpreted whatever you read.
> >
> > https://mywiki.wooledge.org/BashPitfalls#pf47
> >
> > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > a field terminator, rather than a field separator. What this means
> > in our example is that if there's an empty field at the end of the
> > input line, it will be discarded:
> >
> > $ IFS=, read -ra fields <<< "a,b,"
> > $ declare -p fields
> > declare -a fields='([0]="a" [1]="b")'
> >
> > Where did the empty field go? It was eaten for historical reasons
> > ("because it's always been that way"). This behavior is not unique
> > to bash; all conformant shells do it.
> 
> If you think in terms of terminators instead of separators, then the
> above code makes sense because if you add ',' at the end of each field
> (terminate it), you get the original string:
> 
> printf '%s,' ${fields[@]}
> 
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
> 
> If we consider 1) the end of the string is considered an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields. Making these terminators indistinguishable from
> separators.
> 
> We can go for 2) of course, but this is not specified anywhere in
> POSIX, that's just common practice.

You may find these interesting; the second link in particular.

- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html
- http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html

Though I was aware of these behaviours, I do find the POSIX wording to be 
unclear as concerns the observations made by the second link, to say the least. 
I would add that it is possible to have it both ways, so to speak, though the 
means of going about it are no less confusing than the topic at large.

$ IFS=,
$ str="a,b"
$ arr=($str""); declare -p arr
declare -a arr=([0]="a" [1]="b")
$ str="a,b,"
$ arr=($str""); declare -p arr # duly coercing an empty field that some may expect or wish for
declare -a arr=([0]="a" [1]="b" [2]="")
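
The same trick can be wrapped in a small function that does not depend on
arrays. This is a minimal sketch only; the helper name `count_keep_trailing`
is made up for illustration. It appends a quoted empty string to the
unquoted expansion, exactly as in the transcript above, and reports the
resulting field count.

```shell
# Hypothetical helper: count comma-separated fields in $1, keeping a
# trailing empty field by appending a quoted empty string to the expansion.
# The body runs in a subshell, so IFS and 'set -f' do not leak out.
count_keep_trailing() (
    IFS=,
    set -f            # disable globbing so only field splitting applies
    set -- $1""       # unquoted on purpose; the "" preserves a final empty field
    echo "$#"
)

count_keep_trailing 'a,b'     # no trailing comma
count_keep_trailing 'a,b,'    # trailing comma
```

One caveat of this approach: an empty input string yields one (empty) field
rather than zero, because the quoted empty string survives on its own.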

-- 
Kerin Millar



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 10:10 AM Oğuz İsmail Uysal
 wrote:
>
> On 3/30/23 2:12 PM, Felipe Contreras wrote:
> >  IFS=,
> >  str='foo,bar,,roo,'
> >  printf '"%s"\n' $str
> zsh is the only shell that generates an empty last field, no other shell
> exhibits this behavior.

So? This is argumentum ad populum. The fact that most shells do X
doesn't imply that POSIX says X.

It could very well mean that all shells are implementing POSIX wrong.
Except zsh.

Or it could mean POSIX doesn't specify which behavior is correct.

> Besides your link says:
> > The shell shall treat each character of the IFS as a delimiter and use
> > the delimiters as *field terminators* to split the results of parameter
> > expansion, command substitution, and arithmetic expansion into fields.
>
> So the delimiters terminate fields, not separate them.

Yes. 'foo,bar,' has two terminators, and therefore two fields.
'foo,bar,roo' has two terminators and therefore two fields, plus
garbage.

You want to interpret 'foo' as a field, even though it does not have
> an explicit terminator. But that's not specified anywhere in POSIX.

POSIX doesn't say what should be done with the text after the last
terminator. You could throw it away and still be conforming to POSIX.

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Oğuz İsmail Uysal

On 3/30/23 2:12 PM, Felipe Contreras wrote:

>  IFS=,
>  str='foo,bar,,roo,'
>  printf '"%s"\n' $str

zsh is the only shell that generates an empty last field, no other shell
exhibits this behavior.


Besides, your link says:
> The shell shall treat each character of the IFS as a delimiter and use
> the delimiters as *field terminators* to split the results of parameter
> expansion, command substitution, and arithmetic expansion into fields.


So the delimiters terminate fields, not separate them.




Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Emanuele Torre
On Thu, Mar 30, 2023 at 07:51:59AM -0600, Felipe Contreras wrote:
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
> 
> If we consider 1) the end of the string is considered an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields. Making these terminators indistinguishable from
> separators.

I repeatedly disputed this interpretation on IRC by saying that your
reasoning to come to this conclusion is that "',' can terminate a field,
and the end of the string can terminate a field, so ',' at the end is
two terminators".

If we extend that reasoning 'a , b' with IFS=' ,' should be split into
four fields because individually ' ', ',', ' ', and the end of string
could all terminate a field.

That is obviously not the case, because POSIX clearly says that a field
is terminated by the longest match for either a single IFS character
that is not IFS white space, together with any IFS white space adjacent
to it; or a non-zero-length sequence of IFS white space characters.
So ' , ' is a single terminator.
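
The adjacency rule is easy to observe. The following is a small sketch, with
`count_fields` being a hypothetical helper name, that counts the fields
produced by unquoted expansion under a given IFS.

```shell
# Hypothetical helper: count the fields produced when $1 is expanded
# unquoted with IFS set to $2. The subshell body keeps IFS and 'set -f'
# from affecting the caller.
count_fields() (
    IFS=$2
    set -f            # disable globbing so only field splitting applies
    set -- $1         # unquoted on purpose: this triggers field splitting
    echo "$#"
)

count_fields 'a , b'   ' ,'   # comma plus surrounding spaces: one delimiter
count_fields 'a , , b' ' ,'   # two commas: an empty field between them
count_fields 'a   b'   ' '    # a run of IFS white space: one delimiter
```

In bash these calls report 2, 3 and 2 fields respectively: only the number
of non-white-space IFS characters matters, not the white space around them.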

You refuse to acknowledge that it does not make sense to claim that a
comma at the end of the string MUST yield an empty last field just because a
',' and the "end of string" terminator individually can terminate a
field.

The correct interpretation is that a field is implicitly terminated by
the end of the string if it is not explicitly terminated by a
terminator.
Even though this interpretation was repeatedly proposed to you, you
do not even mention it here as a possible interpretation of the
specification. You still insist that the specification can only possibly
be interpreted in the two ways you mentioned.

How can you say that the current implementation that bash, dash, etc.
use is not compliant to the POSIX specification?

And why do you not acknowledge that the logic on which you base your
claim "',' can terminate a field individually and end-of-string can
terminate a field individually, so two of them in a row must have an
empty field between them, and this negates the possibility that ',' at
the end of the string can be considered a single terminator" is flawed?

If that is not what you are claiming, how do you think that bash's
implementation of field splitting is not compatible with POSIX
definition since you did not mention it as a possible interpretation?

 emanuele6



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge  wrote:
>
> On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > IFS=,
> > str='foo,bar,,roo,'
> > printf '"%s"\n' $str
> >
> > There is a discrepancy between how this is interpreted between bash
> > and zsh: in bash the last comma doesn't generate a field and is
> > ignored,
>
> ... which is correct according to POSIX (but not sensible).
>
> > in zsh a last empty field is generated. Initially I was going
> > to report the bug in zsh, until I read what the POSIX specification
> > says about field splitting [1].
>
> You seem to have misinterpreted whatever you read.
>
> https://mywiki.wooledge.org/BashPitfalls#pf47
>
> Unbelievable as it may seem, POSIX requires the treatment of IFS as
> a field terminator, rather than a field separator. What this means
> in our example is that if there's an empty field at the end of the
> input line, it will be discarded:
>
> $ IFS=, read -ra fields <<< "a,b,"
> $ declare -p fields
> declare -a fields='([0]="a" [1]="b")'
>
> Where did the empty field go? It was eaten for historical reasons
> ("because it's always been that way"). This behavior is not unique
> to bash; all conformant shells do it.

If you think in terms of terminators instead of separators, then the
above code makes sense because if you add ',' at the end of each field
(terminate it), you get the original string:

printf '%s,' ${fields[@]}

But you can't replicate 'a,b' that way, because b does not have a
terminator. Obviously we'll want 'b' as a field, therefore one has to
assume either 1) the end of the string is considered an implicit
terminator, or 2) the terminator in the last field is optional.
Neither of these two things is specified in POSIX.

If we consider 1) the end of the string is considered an implicit
terminator, then 'a' contains a valid field, but then 'a,' contains
*two* fields. Making these terminators indistinguishable from
separators.

We can go for 2) of course, but this is not specified anywhere in
POSIX, that's just common practice.
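
The round-trip argument can be tried directly. The sketch below is an
illustration only, and the helper name `rejoin_terminated` is made up for
it: the function splits a string on commas through unquoted expansion and
then re-terminates every resulting field with `printf '%s,'`.

```shell
# Hypothetical helper: split $1 on commas, then rebuild a string by
# appending ',' to every field (treating the comma as a terminator).
rejoin_terminated() (
    IFS=,
    set -f            # disable globbing so only field splitting applies
    set -- $1         # unquoted on purpose: this triggers field splitting
    if [ "$#" -gt 0 ]; then
        printf '%s,' "$@"
    fi
    echo              # finish the line
)

rejoin_terminated 'a,b,'
rejoin_terminated 'a,b'
```

In bash both calls print `a,b,`, so the split result alone does not reveal
whether the input ended with a terminator.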

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread alex xmb ratchev
how spooky, can't get read / mapfile to separate right
very sad

On Thu, Mar 30, 2023, 15:19 Felipe Contreras 
wrote:

> Hi,
>
> Consider this example:
>
> IFS=,
> str='foo,bar,,roo,'
> printf '"%s"\n' $str
>
> There is a discrepancy between how this is interpreted between bash
> and zsh: in bash the last comma doesn't generate a field and is
> ignored, in zsh a last empty field is generated. Initially I was going
> to report the bug in zsh, until I read what the POSIX specification
> says about field splitting [1].
>
> If we ignore all the complexity regarding IFS white spaces (since our
> IFS doesn't have them), we arrive at this item:
>
> 3.b. Each occurrence in the input of an IFS character that is not
> IFS white space, along with any adjacent IFS white space, shall
> delimit a field, as described previously.
>
> Again, we ignore the white space stuff, which means "each occurrence
> in the input of an IFS character shall delimit a field". So if *each
> occurrence* of a comma shall delimit a field, the last comma should
> delimit a field. We have four commas, therefore we should have five
> fields.
>
> This is not what bash does.
>
> Shouldn't bash generate the last field? At least in POSIX mode (I
> tried with `--posix` same output).
>
> Cheers.
>
> Obligatory stuff:
>
> * version: 5.1.16(1)-release
> * platform: x86_64 Arch Linux
> * compiler: gcc 12.2.1
>
> [1]
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05
>
> --
> Felipe Contreras
>
>


IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Felipe Contreras
Hi,

Consider this example:

IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str

There is a discrepancy between how this is interpreted between bash
and zsh: in bash the last comma doesn't generate a field and is
ignored, in zsh a last empty field is generated. Initially I was going
to report the bug in zsh, until I read what the POSIX specification
says about field splitting [1].

If we ignore all the complexity regarding IFS white spaces (since our
IFS doesn't have them), we arrive at this item:

3.b. Each occurrence in the input of an IFS character that is not
IFS white space, along with any adjacent IFS white space, shall
delimit a field, as described previously.

Again, we ignore the white space stuff, which means "each occurrence
in the input of an IFS character shall delimit a field". So if *each
occurrence* of a comma shall delimit a field, the last comma should
delimit a field. We have four commas, therefore we should have five
fields.

This is not what bash does.
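
To make the observed result concrete, here is a minimal sketch that prints
each field in brackets so that empty fields are visible. The helper name
`show_fields` is hypothetical and not part of any shell.

```shell
# Hypothetical helper: split $1 on commas via unquoted expansion and print
# the field count followed by each field in brackets.
show_fields() (
    IFS=,
    set -f            # disable globbing so only field splitting applies
    set -- $1         # unquoted on purpose: this triggers field splitting
    echo "$# fields"
    for field in "$@"; do
        printf '[%s]\n' "$field"
    done
)

show_fields 'foo,bar,,roo,'
```

In bash this reports 4 fields: `[foo]`, `[bar]`, `[]` and `[roo]`. The empty
field between the two adjacent commas is kept, while nothing is produced
after the final comma.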

Shouldn't bash generate the last field? At least in POSIX mode (I
tried with `--posix` and got the same output).

Cheers.

Obligatory stuff:

* version: 5.1.16(1)-release
* platform: x86_64 Arch Linux
* compiler: gcc 12.2.1

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

-- 
Felipe Contreras



Re: IFS field splitting doesn't conform with POSIX

2023-03-30 Thread Greg Wooledge
On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> IFS=,
> str='foo,bar,,roo,'
> printf '"%s"\n' $str
> 
> There is a discrepancy between how this is interpreted between bash
> and zsh: in bash the last comma doesn't generate a field and is
> ignored,

... which is correct according to POSIX (but not sensible).

> in zsh a last empty field is generated. Initially I was going
> to report the bug in zsh, until I read what the POSIX specification
> says about field splitting [1].

You seem to have misinterpreted whatever you read.

https://mywiki.wooledge.org/BashPitfalls#pf47

Unbelievable as it may seem, POSIX requires the treatment of IFS as
a field terminator, rather than a field separator. What this means
in our example is that if there's an empty field at the end of the
input line, it will be discarded:

$ IFS=, read -ra fields <<< "a,b,"
$ declare -p fields
declare -a fields='([0]="a" [1]="b")'

Where did the empty field go? It was eaten for historical reasons
("because it's always been that way"). This behavior is not unique
to bash; all conformant shells do it.


