(Replying to the list because I don't have write access to the bug tracker.)

On Sat, Apr 24, 2021 at 6:22 PM Austin Group Bug Tracker via
austin-group-l at The Open Group <austin-group-l@opengroup.org> wrote:
> $ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> 1 <>
> 2 <,,>
> 3 <>

This looks like an implementation bug. Although mawk and nawk agree
with gawk on how that case is handled, there is no reason for `,,' to
be a single field there. And if you replace the asterisk with its
interval equivalent, `{0,}', existing implementations (the ones I have
access to, at least) do not agree on the result.

$ echo 'x,,z' | gawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,,>
3 <>
$
$ echo 'x,,z' | mawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <x,,z>
$
$ echo 'x,,z' | nawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <x,,z>
$
$ echo 'x,,z' | busybox awk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,>
3 <,>
4 <>
5 <>

The expected output, for both forms of the ERE, is as follows: the
separators are `x', the empty string between the two commas, and `z'.

1 <>
2 <,>
3 <,>
4 <>
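
To make that splitting rule concrete, here is a rough sketch that does
the split by hand with match() instead of relying on FS. The helper
name split_on_ere is made up for illustration, and the rule it encodes
(take each leftmost match as a separator, but skip a zero-length match
that directly follows the previous separator) is only my reading of
what produces the four fields above, not wording from the standard. It
also assumes match() reports a zero-length match with RLENGTH 0, and it
ignores a possible empty match at the very end of the record.

```shell
# Hypothetical helper: split string $1 on ERE $2 and print the fields.
# Note that -v processes backslash escapes in its argument.
split_on_ere() {
    awk -v s="$1" -v re="$2" '
    BEGIN {
        n = 0          # number of fields printed so far
        start = 1      # index where the current field begins
        last_end = 0   # index just past the previous separator
        pos = 1        # index where the next search begins
        while (pos <= length(s)) {
            if (!match(substr(s, pos), re)) {
                break
            }
            mstart = pos + RSTART - 1
            mlen = RLENGTH
            if (mlen == 0 && mstart == last_end) {
                # Zero-length match touching the previous separator: skip it.
                pos = mstart + 1
                continue
            }
            printf "%d <%s>\n", ++n, substr(s, start, mstart - start)
            start = last_end = mstart + mlen
            # After a zero-length separator, step one character forward
            # so the loop always makes progress.
            pos = (mlen > 0) ? start : start + 1
        }
        # Whatever follows the last separator is the final field.
        printf "%d <%s>\n", ++n, substr(s, start)
    }'
}

split_on_ere 'x,,z' '[^,]*'
```

Under those assumptions the last line prints the same four fields as
the expected output above.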

So either the standard should make the behavior unspecified when `FS'
is an ERE that would match a zero-length string, or implementations
should fix these bugs.
