Re: sort -c/C and last-resort sorting

2021-07-06 Thread Joerg Schilling via austin-group-l at The Open Group
Robert Elz  wrote:

> Date:Mon, 05 Jul 2021 20:05:20 +0200
> From:"Joerg Schilling via austin-group-l at The Open Group" 
> 
> Message-ID:  <20210705180520.kgbgk%sch...@schily.net>
> 
>   | That would be in conflict with long existing practice
> 
> Apparently not in most versions of sort.

If you call NetBSD "most versions"...

It seems that all other sort implementations use a uniform definition
for -S, and NetBSD is in conflict with "all other sort implementations", at
least with Solaris, GNU, FreeBSD and OpenBSD.
 
>   | If you like to disable -s, better use +s
> 
> No, + options don't work in general, and would be even more difficult to
> support in sort because of keeping backward compat with its old key 
> specification syntax.

Given that the historic + usage with sort expects a digit after the +, I see no
conflict.

> If we need a different option (really need) then we simply use a different
> option (for which we'd simply allow both, the new, and -S).

As mentioned, your proposal for -S is not compatible with most versions of 
sort.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: sort -c/C and last-resort sorting

2021-07-06 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 05 Jul 2021 20:05:20 +0200
From:"Joerg Schilling via austin-group-l at The Open Group" 

Message-ID:  <20210705180520.kgbgk%sch...@schily.net>

  | That would be in conflict with long existing practice

Apparently not in most versions of sort.

  | If you like to disable -s, better use +s

No, + options don't work in general, and would be even more difficult to
support in sort because of keeping backward compat with its old key 
specification syntax.

If we need a different option (really need) then we simply use a different
option (for which we'd simply allow both, the new, and -S).

kre




Re: sort -c/C and last-resort sorting

2021-07-06 Thread Geoff Clare via austin-group-l at The Open Group
Joerg Schilling wrote, on 06 Jul 2021:
>
> "Geoff Clare via austin-group-l at The Open Group" 
>  wrote:
> 
> > > If you like to disable -s, better use +s
> > 
> > That wouldn't be suitable for standardisation as it doesn't follow
> > syntax guideline 4. The standard would need to use a different letter,
> > maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for
> > "whole line".
> 
> We already have +option in the standard, see the shell

Wherever + is required or allowed to introduce options, it is
explicitly stated to be an exception to the syntax guidelines.
We should not introduce any more such exceptions.

In the case of sort, + is currently allowed to introduce options in
order to allow implementations to continue to support the historical
sort key syntax. However, adding +s would mean + before options would
become mandatory instead of it being optional, so I still think it
would be an inappropriate choice.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-06 Thread Joerg Schilling via austin-group-l at The Open Group
"Geoff Clare via austin-group-l at The Open Group" 
 wrote:

> > If you like to disable -s, better use +s
> 
> That wouldn't be suitable for standardisation as it doesn't follow
> syntax guideline 4. The standard would need to use a different letter,
> maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for
> "whole line".

We already have +option in the standard; see the shell.

The nice thing about +s is that it helps to be economical with option letters
at a time when there is a high risk of breaking existing implementations by
introducing "new" option letters in POSIX.

BTW: The ast implementation and libgetopt from schilytools support +option
in getopt() already...

Jörg




Re: sort -c/C and last-resort sorting

2021-07-06 Thread Geoff Clare via austin-group-l at The Open Group
Joerg Schilling wrote, on 05 Jul 2021:
>
> "Robert Elz via austin-group-l at The Open Group" 
>  wrote:
> 
> > Date:Mon, 05 Jul 2021 18:04:59 +0200
> > From:"Joerg Schilling via austin-group-l at The Open Group" 
> > 
> > Message-ID:  <20210705160459.e40cs%sch...@schily.net>
> > 
> >   | How do you believe is -S related to what -s could probably do?
> > 
> > The -S under discussion is simply !-s (as -s is !-S) - switches between
> > stable sort (-s), using only designated keys to make order decisions,
> > and original sort (-S) fallback to full record comparisons.
> 
> That would be in conflict with long existing practice
> 
> If you like to disable -s, better use +s

That wouldn't be suitable for standardisation as it doesn't follow
syntax guideline 4. The standard would need to use a different letter,
maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for
"whole line".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-05 Thread Joerg Schilling via austin-group-l at The Open Group
"Robert Elz via austin-group-l at The Open Group" 
 wrote:

> Date:Mon, 05 Jul 2021 18:04:59 +0200
> From:"Joerg Schilling via austin-group-l at The Open Group" 
> 
> Message-ID:  <20210705160459.e40cs%sch...@schily.net>
> 
>   | How do you believe is -S related to what -s could probably do?
> 
> The -S under discussion is simply !-s (as -s is !-S) - switches between
> stable sort (-s), using only designated keys to make order decisions,
> and original sort (-S) fallback to full record comparisons.

That would be in conflict with long existing practice

If you like to disable -s, better use +s



Jörg




Re: sort -c/C and last-resort sorting

2021-07-05 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 05 Jul 2021 18:04:59 +0200
From:"Joerg Schilling via austin-group-l at The Open Group" 

Message-ID:  <20210705160459.e40cs%sch...@schily.net>

  | How do you believe is -S related to what -s could probably do?

The -S under discussion is simply !-s (as -s is !-S): the two switch between
stable sort (-s), which uses only the designated keys to make ordering
decisions, and the original sort (-S), which falls back to full-record
comparisons.

kre




Re: sort -c/C and last-resort sorting

2021-07-05 Thread Joerg Schilling via austin-group-l at The Open Group
"Stephane Chazelas via austin-group-l at The Open Group" 
 wrote:

> That's even more justification for adding -s to the standard
> though so people can at least choose to get a stable sort
> portably. -S could probably be added as well, but I don't think
> it wise to make the default behaviour unspecified.

How do you believe is -S related to what -s could probably do?

-S has been in use to set up the virtual memory for sorting for at least 23 years.

Do you have a different meaning in mind?

Jörg




Re: sort -c/C and last-resort sorting

2021-07-05 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 5 Jul 2021 09:33:32 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20210705083332.GA21700@localhost>

  | If we add both -s and -S and specify "last one wins",

That's what the NetBSD implementation does.

kre



Re: sort -c/C and last-resort sorting

2021-07-05 Thread Geoff Clare via austin-group-l at The Open Group
Stephane Chazelas wrote, on 04 Jul 2021:
>
> That's even more justification for adding -s to the standard
> though so people can at least choose to get a stable sort
> portably. -S could probably be added as well, but I don't think
> it wise to make the default behaviour unspecified.

If we add both -s and -S and specify "last one wins", then users
can set their own default by creating a wrapper script that
inserts either -s or -S as the first argument.  So I wouldn't have
a problem making the default unspecified - although it might be
prudent to postpone that change to the revision after the one that
adds -s and -S.
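
A minimal sketch of such a wrapper, written as a shell function rather
than a script. It assumes an underlying sort that already accepts -s, and
it only covers the -s half, since -S currently has other meanings in
several implementations. The name stable_sort is hypothetical, and "last
one wins" is the proposed rule, not current standard behaviour.

```shell
# Hypothetical wrapper: make stable sorting the default by inserting -s
# as the first argument. Assumes the underlying sort supports -s.
stable_sort() {
    command sort -s "$@"
}

# Lines with equal keys keep their input order.
result=$(printf '%s\n' a,b a,a | stable_sort -t, -k1,1)
printf '%s\n' "$result"
```

Under a "last one wins" rule, a caller could still pass the opposite
option after the inserted one to get the other behaviour.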

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 4 Jul 2021 10:31:06 +0100
From:Stephane Chazelas 
Message-ID:  <20210704093106.2ce2cyg77f2nm...@chazelas.org>


  | That would make is non-compliant then.

s/is/it/ ... and yes, so?

  | SUS> When there are multiple key fields, later keys shall be

There was no need to quote that, I'm fully aware of how it is specified,
which I'm also not complaining about - this is a case where that actually
is (or was) the standard, as it is how sort was implemented, from long
long ago (way back before -k was invented and we just had + and -).

It's just stupid.

This is a case where systems need to simply start ignoring the standard
and doing the sane thing, so that the standard can eventually be updated
to also be sane.   That's how evolution happens.

  | I don't know what the original rationale was, but /one/
  | rationale could be to guarantee a deterministic and total order,

No question but that there needs to be a method to achieve that, but
it does not need to be an undefeatable default.  It doesn't need to be
the default at all.  The default should be the most useful behaviour,
which is probably not that (its only real merit is for compat with the
ancient past).

  | to make sure that two files with the same lines (though in
  | different orders) result in the same output when sorted whatever
  | the sorting specification.

Nor is that final qualification needed.   The output order is defined
by the sorting specification, if one wants some particular order, one
must write the spec to achieve that order.  A different sorting specification
both can, and should, result in a different order.

kre



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
> 
> Or, if one simply wants the useless posix behaviour, -S requests that
[...]
> should achieve that.   Setting POSIXLY_CORRECT in the environ probably
> should as well, but doesn't currently I believe (and no-one is complaining
> that it doesn't).
[...]

While that view may have some merit, I'm not convinced that it
would be enough to justify deviating from all other
implementations and from the standard.

I'd imagine backward compatibility would have been broken at
some point for that as NetBSD sort used to be the GNU one like
on most BSDs, so I suspect that's a strongly held view there.

That's even more justification for adding -s to the standard
though so people can at least choose to get a stable sort
portably. -S could probably be added as well, but I don't think
it wise to make the default behaviour unspecified.

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
> Date:Fri, 2 Jul 2021 14:41:50 +0100
> From:"Geoff Clare via austin-group-l at The Open Group" 
> 
> Message-ID:  <20210702134150.GB16587@localhost>
> 
>   | I've always assumed that the intention for -c is to answer the
>   | question "if I ran this command without -c would the output be 
>   | the same as the input?"  So the NetBSD behaviour seems wrong
>   | to me.
> 
> But:
>   jinx$ printf '%s\n' a,b a,a 
>   a,b
>   a,a
>   jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
>   a,b
>   a,a

That would make is non-compliant then.

SUS> When there are multiple key fields, later keys shall be
SUS> compared only after all earlier keys compare equal. Except
SUS> when the -u option is specified, lines that otherwise
SUS> compare equal shall be ordered as if none of the options
SUS> -d, -f, -i, -n, or -k were present (but with -r still in
SUS> effect, if it was specified) and with all bytes in the
SUS> lines significant to the comparison. The order in which
SUS> lines that still compare equal are written is unspecified.

[...]
> ie: When -k args are given, there is no fallback to "whole record" matching,
> if one wants that, one can easily add a final -k1 option to make that happen:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
[...]

I don't know what the original rationale was, but /one/
rationale could be to guarantee a deterministic and total order,
to make sure that two files with the same lines (though in
different orders) result in the same output when sorted whatever
the sorting specification.

That guarantee is broken in locales that don't have a total order,
which was the subject of recent changes.

POSIX sort does sort as specified, and in cases where the user
doesn't say (same sort key but different lines), makes one of
several possible decisions: in this case, a last-resort comparison
of the full line (resorting to a memcmp() comparison when
strcoll() finds them equal, if need be), whilst NetBSD sort uses
the original order. Note that POSIX doesn't require the order to be
stable, and leaves it unspecified which line is selected by sort
-uk1,1, for instance.
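
To illustrate the -u case, here is a small sketch: both input lines have
the same key, so exactly one of them is written, but which one is left
open. The sketch only counts the surviving lines and does not assume a
particular choice; the variable names are made up for illustration.

```shell
# Two lines that compare equal on the first comma-separated field.
kept=$(printf '%s\n' a,b a,a | LC_ALL=C sort -u -t, -k1,1)

# Exactly one line is written; whether it is "a,b" or "a,a" is unspecified.
count=$(printf '%s\n' "$kept" | wc -l | tr -d ' ')
printf 'kept %s line(s): %s\n' "$count" "$kept"
```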

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-04 Thread Robert Elz via austin-group-l at The Open Group
Date:Fri, 2 Jul 2021 14:41:50 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20210702134150.GB16587@localhost>

  | I've always assumed that the intention for -c is to answer the
  | question "if I ran this command without -c would the output be 
  | the same as the input?"  So the NetBSD behaviour seems wrong
  | to me.

But:
jinx$ printf '%s\n' a,b a,a 
a,b
a,a
jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
a,b
a,a
jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1; echo $?
0

the output (without -c) does match the input, so it seems right to me.

Note that on NetBSD, the -s option that has been discussed here exists,
but does nothing, as it is the default.

ie: When -k args are given, there is no fallback to "whole record" matching,
if one wants that, one can easily add a final -k1 option to make that happen:

jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1 -k1
a,a
a,b
jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1 -k1
sort: found disorder: a,a

which is the way it should be - if one has taken the trouble to specify
what parts of the record are the keys for sorting (and -u comparisons)
then sort should not be gratuitously adding more - that it used to do so
was widely regarded as a bug (especially given that there was no way to
defeat it, but enabling it is so simple when it is not the default).

Or, if one simply wants the useless POSIX behaviour, -S requests that:

jinx$ printf '%s\n' a,b a,a | sort -S -t, -k1,1
a,a
a,b
jinx$ printf '%s\n' a,b a,a | sort -c -S -t, -k1,1
sort: found disorder: a,a

I doubt that option is much used however, as it is so counter-intuitive,
but if it was wanted all the time

sort() { command sort -S "$@"; }

should achieve that.   Setting POSIXLY_CORRECT in the environ probably
should as well, but doesn't currently I believe (and no-one is complaining
that it doesn't).

kre




Re: sort -c/C and last-resort sorting

2021-07-02 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-02 15:54:48 +0100, Geoff Clare via austin-group-l at The Open Group:
> Joerg Schilling wrote, on 02 Jul 2021:
> >
> > > > > sort: -:2: disorder: a,a
> > > > 
> > > > Try to use the POSIX sort variant to avoid the message.
> > > [...]
> > > 
> > > I suppose you mean the -C option, which
> > > still checks but doesn't output a diagnostics message.
> > 
> > No, I was referring to /usr/xpg4/bin/sort
> 
> That no longer exists in Solaris.  If Illumos still has it they
> should probably remove it (or make it a symlink to /usr/bin/sort).
[...]

Illumos sort doesn't seem to be supporting -C yet:
https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/sort/common/options.c#L76
(code not changed since 2005).

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-02 Thread Joerg Schilling via austin-group-l at The Open Group
"Geoff Clare via austin-group-l at The Open Group" 
 wrote:

> > No, I was referring to /usr/xpg4/bin/sort
> 
> That no longer exists in Solaris.  If Illumos still has it they
> should probably remove it (or make it a symlink to /usr/bin/sort).

OK, I checked the source and the only difference between the two versions is
the missing warning in /usr/xpg4/bin/sort.

Jörg




Re: sort -c/C and last-resort sorting

2021-07-02 Thread Geoff Clare via austin-group-l at The Open Group
Joerg Schilling wrote, on 02 Jul 2021:
>
> > > > sort: -:2: disorder: a,a
> > > 
> > > Try to use the POSIX sort variant to avoid the message.
> > [...]
> > 
> > I suppose you mean the -C option, which
> > still checks but doesn't output a diagnostics message.
> 
> No, I was referring to /usr/xpg4/bin/sort

That no longer exists in Solaris.  If Illumos still has it they
should probably remove it (or make it a symlink to /usr/bin/sort).

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-02 Thread Joerg Schilling via austin-group-l at The Open Group
Stephane Chazelas  wrote:

> 2021-07-02 14:07:17 +0200, Joerg Schilling via austin-group-l at The Open 
> Group:
> > "Stephane Chazelas via austin-group-l at The Open Group" 
> >  wrote:
> > 
> > > Is:
> > > 
> > > printf '%s\n' a,b a,a | sort -c -t, -k1,1
> > > 
> > > Meant to succeed or not?
> > > 
> > > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
> > > confusing:
> > > 
> > > sort: -:2: disorder: a,a
> > 
> > Try to use the POSIX sort variant to avoid the message.
> [...]
> 
> I suppose you mean the -C option, which
> still checks but doesn't output a diagnostics message.

No, I was referring to /usr/xpg4/bin/sort




Re: sort -c/C and last-resort sorting

2021-07-02 Thread Geoff Clare via austin-group-l at The Open Group
Stephane Chazelas wrote, on 02 Jul 2021:
>
> btw, it seems to me -C should be referenced in the EXIT STATUS
> section and in the -u description like for -c.

Yes, also in STDOUT.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-02 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-07-02 14:07:17 +0200, Joerg Schilling via austin-group-l at The Open Group:
> "Stephane Chazelas via austin-group-l at The Open Group" 
>  wrote:
> 
> > Is:
> > 
> > printf '%s\n' a,b a,a | sort -c -t, -k1,1
> > 
> > Meant to succeed or not?
> > 
> > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
> > confusing:
> > 
> > sort: -:2: disorder: a,a
> 
> Try to use the POSIX sort variant to avoid the message.
[...]

I suppose you mean the -C option, which
still checks but doesn't output a diagnostic message.
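
A small sketch of using -C purely for its exit status. It assumes a sort
that implements -C, which not every implementation does; the helper name
check_sorted is hypothetical.

```shell
# Hypothetical helper: report whether stdin is sorted, using only the
# exit status of sort -C (no diagnostic is expected on stderr).
check_sorted() {
    if LC_ALL=C sort -C "$@"; then
        echo sorted
    else
        echo unsorted
    fi
}

first=$(printf '%s\n' a b c | check_sorted)
second=$(printf '%s\n' b a | check_sorted)
printf '%s %s\n' "$first" "$second"
```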

btw, it seems to me -C should be referenced in the EXIT STATUS
section and in the -u description like for -c.

But the question here also stands for -C: should sort return
success or failure when a file is sorted according to the key
specification but not as per the last-resort sort, and should -s
be added to the specification?

I'm personally happy with Geoff's answer on those.

-- 
Stephane



Re: sort -c/C and last-resort sorting

2021-07-02 Thread Geoff Clare via austin-group-l at The Open Group
Stephane Chazelas wrote, on 02 Jul 2021:
>
> Is:
> 
> printf '%s\n' a,b a,a | sort -c -t, -k1,1
> 
> Meant to succeed or not?
> 
> It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
> confusing:
> 
> sort: -:2: disorder: a,a
> 
> diagnostic and succeeds in NetBSD.
> 
> It succeeds with -s in all implementations that support that
> flag (all but Solaris in that list above).
> 
> By my reading, NetBSD is not compliant, as nothing says the
> last-resort whole-line comparison does not apply with -c/-C,
> though with implementations that don't support -s (or with the
> POSIX API that doesn't support -s though references it), it
> seems to be a more useful (and less surprising) interface.

I've always assumed that the intention for -c is to answer the
question "if I ran this command without -c would the output be 
the same as the input?"  So the NetBSD behaviour seems wrong
to me.

> Currently, with Solaris sort, it doesn't look like it's possible
> to check whether a file is already sorted on a given key.
> 
> Should -s be added to POSIX?

Seems like a useful addition - I would be in favour.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: sort -c/C and last-resort sorting

2021-07-02 Thread Joerg Schilling via austin-group-l at The Open Group
"Stephane Chazelas via austin-group-l at The Open Group" 
 wrote:

> Is:
> 
> printf '%s\n' a,b a,a | sort -c -t, -k1,1
> 
> Meant to succeed or not?
> 
> It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
> confusing:
> 
> sort: -:2: disorder: a,a

Try to use the POSIX sort variant to avoid the message.

Jörg




sort -c/C and last-resort sorting

2021-07-02 Thread Stephane Chazelas via austin-group-l at The Open Group
Is:

printf '%s\n' a,b a,a | sort -c -t, -k1,1

Meant to succeed or not?

It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
confusing:

sort: -:2: disorder: a,a

diagnostic and succeeds in NetBSD.

It succeeds with -s in all implementations that support that
flag (all but Solaris in that list above).

By my reading, NetBSD is not compliant, as nothing says the
last-resort whole-line comparison does not apply with -c/-C,
though with implementations that don't support -s (or with the
POSIX API that doesn't support -s though references it), it
seems to be a more useful (and less surprising) interface.

Currently, with Solaris sort, it doesn't look like it's possible
to check whether a file is already sorted on a given key.
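
One possible workaround, sketched under assumptions: extract the key with
cut and check only that. This is limited to a single leading field with a
one-character delimiter, mirroring -t, -k1,1, and forces the C locale to
keep the comparison deterministic. The function name is hypothetical and
the behaviour on Solaris is assumed, not verified.

```shell
# Hypothetical helper: succeed if stdin is sorted on its first
# comma-separated field, ignoring the rest of each line.
sorted_on_first_field() {
    # With only the key left, the last-resort whole-line comparison
    # cannot see the other fields.
    cut -d, -f1 | LC_ALL=C sort -c 2>/dev/null
}

if printf '%s\n' a,b a,a | sorted_on_first_field; then
    answer=yes
else
    answer=no
fi
printf '%s\n' "$answer"
```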

Should -s be added to POSIX?

Any opinion?

-- 
Stephane