Re: sort -c/C and last-resort sorting
Robert Elz wrote: > Date:Mon, 05 Jul 2021 20:05:20 +0200 > From:"Joerg Schilling via austin-group-l at The Open Group" > > Message-ID: <20210705180520.kgbgk%sch...@schily.net> > > | That would be in conflict with long existing practice > > Apparently not in most versions of sort. If you call NetBSD "most versions", it seems that all other sort implementations use a uniform definition for -S and NetBSD is in conflict with "all other sort implementations", at least with Solaris, GNU, FreeBSD, OpenBSD. > | If you like to disable -s, better use +s > > No, + options don't work in general, and would be even more difficult to > support in sort because of keeping backward compat with its old key > specification syntax. Given that the historic + usage with sort expects a digit past +, I see no conflict. > If we need a different option (really need) then we simply use a different > option (for which we'd simply allow both, the new, and -S). As mentioned, your proposal for -S is not compatible with most versions of sort. Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: sort -c/C and last-resort sorting
Date:Mon, 05 Jul 2021 20:05:20 +0200 From:"Joerg Schilling via austin-group-l at The Open Group" Message-ID: <20210705180520.kgbgk%sch...@schily.net> | That would be in conflict with long existing practice Apparently not in most versions of sort. | If you like to disable -s, better use +s No, + options don't work in general, and would be even more difficult to support in sort because of keeping backward compat with its old key specification syntax. If we need a different option (really need) then we simply use a different option (for which we'd simply allow both, the new, and -S). kre
Re: sort -c/C and last-resort sorting
Joerg Schilling wrote, on 06 Jul 2021: > > "Geoff Clare via austin-group-l at The Open Group" > wrote: > > > > If you like to disable -s, better use +s > > > > That wouldn't be suitable for standardisation as it doesn't follow > > syntax guideline 4. The standard would need to use a different letter, > > maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for > > "whole line". > > We already have +option in the standard, see the shell Wherever + is required or allowed to introduce options, it is explicitly stated to be an exception to the syntax guidelines. We should not introduce any more such exceptions. In the case of sort, + is currently allowed to introduce options in order to allow implementations to continue to support the historical sort key syntax. However, adding +s would mean + before options would become mandatory instead of it being optional, so I still think it would be an inappropriate choice. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: sort -c/C and last-resort sorting
"Geoff Clare via austin-group-l at The Open Group" wrote: > > If you like to disable -s, better use +s > > That wouldn't be suitable for standardisation as it doesn't follow > syntax guideline 4. The standard would need to use a different letter, > maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for > "whole line". We already have +option in the standard, see the shell. The nice thing with +s is that it helps to be economical with option letters at a time when there is a high risk of breaking existing implementations by introducing "new" option letters in POSIX. BTW: The ast implementation and libgetopt from schilytools support +option in getopt() already... Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: sort -c/C and last-resort sorting
Joerg Schilling wrote, on 05 Jul 2021: > > "Robert Elz via austin-group-l at The Open Group" > wrote: > > > Date:Mon, 05 Jul 2021 18:04:59 +0200 > > From:"Joerg Schilling via austin-group-l at The Open Group" > > > > Message-ID: <20210705160459.e40cs%sch...@schily.net> > > > > | How do you believe is -S related to what -s could probably do? > > > > The -S under discussion is simply !-s (as -s is !-S) - switches between > > stable sort (-s), using only designated keys to make order decisions, > > and original sort (-S) fallback to full record comparisons. > > That would be in conflict with long existing practice > > If you like to disable -s, better use +s That wouldn't be suitable for standardisation as it doesn't follow syntax guideline 4. The standard would need to use a different letter, maybe -F for "fully sorted", or -l/-L for "last resort", or -w/-W for "whole line". -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: sort -c/C and last-resort sorting
"Robert Elz via austin-group-l at The Open Group" wrote: > Date:Mon, 05 Jul 2021 18:04:59 +0200 > From:"Joerg Schilling via austin-group-l at The Open Group" > > Message-ID: <20210705160459.e40cs%sch...@schily.net> > > | How do you believe is -S related to what -s could probably do? > > The -S under discussion is simply !-s (as -s is !-S) - switches between > stable sort (-s), using only designated keys to make order decisions, > and original sort (-S) fallback to full record comparisons. That would be in conflict with long existing practice If you like to disable -s, better use +s Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: sort -c/C and last-resort sorting
Date:Mon, 05 Jul 2021 18:04:59 +0200 From:"Joerg Schilling via austin-group-l at The Open Group" Message-ID: <20210705160459.e40cs%sch...@schily.net> | How do you believe is -S related to what -s could probably do? The -S under discussion is simply !-s (as -s is !-S) - switches between stable sort (-s), using only designated keys to make order decisions, and original sort (-S) fallback to full record comparisons. kre
Re: sort -c/C and last-resort sorting
"Stephane Chazelas via austin-group-l at The Open Group" wrote: > That's even more justification for adding -s to the standard > though so people can at least choose to get a stable sort > portably. -S could probably be added as well, but I don't think > it wise to make the default behaviour unspecified. How do you believe is -S related to what -s could probably do? -S has been in use to set up the virtual memory for sorting for at least 23 years. Do you have a different meaning in mind? Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: sort -c/C and last-resort sorting
Date:Mon, 5 Jul 2021 09:33:32 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20210705083332.GA21700@localhost> | If we add both -s and -S and specify "last one wins", That's what the NetBSD implementation does. kre
Re: sort -c/C and last-resort sorting
Stephane Chazelas wrote, on 04 Jul 2021: > > That's even more justification for adding -s to the standard > though so people can at least choose to get a stable sort > portably. -S could probably be added as well, but I don't think > it wise to make the default behaviour unspecified. If we add both -s and -S and specify "last one wins", then users can set their own default by creating a wrapper script that inserts either -s or -S as the first argument. So I wouldn't have a problem making the default unspecified - although it might be prudent to postpone that change to the revision after the one that adds -s and -S. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
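The wrapper idea in the message above can be sketched as a shell function. This is only an illustration under stated assumptions: it uses -s alone, because the -S meaning proposed in this thread is not what most existing sort implementations do with that letter, and it assumes a sort that accepts the non-standard -s option. The "last one wins" override is part of the proposal, not something the sketch relies on.

```shell
#!/bin/sh
# Sketch of a per-user default: a shell function that inserts -s as the
# first argument of every sort call.
# Assumption: the underlying sort supports the non-standard -s option.
sort() {
    command sort -s "$@"
}

# Both lines have the same key "a", so with -s they keep their input order.
printf '%s\n' a,b a,a | sort -t, -k1,1
```

Under the proposed rule, a user who wanted the other behaviour for a single call would pass the opposite option later on the command line.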
Re: sort -c/C and last-resort sorting
Date:Sun, 4 Jul 2021 10:31:06 +0100 From:Stephane Chazelas Message-ID: <20210704093106.2ce2cyg77f2nm...@chazelas.org> | That would make is non-compliant then. s/is/it/ ... and yes, so? | SUS> When there are multiple key fields, later keys shall be There was no need to quote that, I'm fully aware of how it is specified, which I'm also not complaining about - this is a case where that actually is (or was) the standard, as it is how sort was implemented, from long long ago (way back before -k was invented and we just had + and -). It's just stupid. This is a case where systems need to simply start ignoring the standard and doing the sane thing, so that the standard can eventually be updated to also be sane. That's how evolution happens. | I don't know what the original rationale was, but /one/ | rationale could be to guarantee a deterministic and total order, No question but that there needs to be a method to achieve that, but it does not need to be an undefeatable default. It doesn't need to be the default at all. The default should be the most useful behaviour, which is probably not that (its only real merit is for compat with the ancient past). | to make sure that two files with the same lines (though in | different orders) result in the same output when sorted whatever | the sorting specification. Nor is that final qualification needed. The output order is defined by the sorting specification, if one wants some particular order, one must write the spec to achieve that order. A different sorting specification both can, and should, result in a different order. kre
Re: sort -c/C and last-resort sorting
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group: [...] > which is the way it should be - if one has taken the trouble to specify > what parts of the record are the keys for sorting (and -u comparisons) > then sort should not be gratuitously adding more - that it used to do so > was widely regarded as a bug (especially given that there was no way to > defeat it, but enabling it is so simple when it is not the default). > > Or, if one simply wants the useless posix behaviour, -S requests that [...] > should achieve that. Setting POSIXLY_CORRECT in the environ probably > should as well, but doesn't currently I believe (and no-one is complaining > that it doesn't). [...] While that view may have some merit, I'm not convinced that it would be enough to justify deviating from all other implementations and from the standard. I'd imagine backward compatibility would have been broken at some point for that as NetBSD sort used to be the GNU one like on most BSDs, so I suspect that's a strongly held view there. That's even more justification for adding -s to the standard though so people can at least choose to get a stable sort portably. -S could probably be added as well, but I don't think it wise to make the default behaviour unspecified. -- Stephane
Re: sort -c/C and last-resort sorting
2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
> Date:Fri, 2 Jul 2021 14:41:50 +0100
> From:"Geoff Clare via austin-group-l at The Open Group"
>
> Message-ID: <20210702134150.GB16587@localhost>
>
> | I've always assumed that the intention for -c is to answer the
> | question "if I ran this command without -c would the output be
> | the same as the input?" So the NetBSD behaviour seems wrong
> | to me.
>
> But:
> jinx$ printf '%s\n' a,b a,a
> a,b
> a,a
> jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
> a,b
> a,a

That would make is non-compliant then.

SUS> When there are multiple key fields, later keys shall be
SUS> compared only after all earlier keys compare equal. Except
SUS> when the -u option is specified, lines that otherwise
SUS> compare equal shall be ordered as if none of the options
SUS> -d, -f, -i, -n, or -k were present (but with -r still in
SUS> effect, if it was specified) and with all bytes in the
     ^^
SUS> lines significant to the comparison. The order in which
     ^^^
SUS> lines that still compare equal are written is unspecified.

[...]
> ie: When -k args are given, there is no fallback to "whole record" matching,
> if one wants that, one can easily add a final -k1 option to make that happen:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
[...]

I don't know what the original rationale was, but /one/
rationale could be to guarantee a deterministic and total order,
to make sure that two files with the same lines (though in
different orders) result in the same output when sorted whatever
the sorting specification.

That guarantee is broken in locales that don't have a total order,
which was the subject of recent changes.

POSIX sort does sort as specified, and in cases where the user
doesn't say (sort key same but line different), makes one of
several possible decisions, in that case a last-resort comparison
of the full line (falling back to a memcmp() comparison when
strcoll() finds them equal, if need be), whilst NetBSD sort uses
the original order.

Note that POSIX doesn't require the order to be stable, and leaves
unspecified which line is selected by sort -uk1,1, for instance.

-- 
Stephane
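The determinism argument in the message above can be checked with a small script. This is a minimal sketch, not something from the thread: the helper name order_independent is invented for illustration, the two-line input is the one used throughout the discussion, and the second call assumes a sort that accepts the non-standard -s option.

```shell
#!/bin/sh
# Hypothetical helper: run sort with the given options on the same two
# lines in both input orders, and succeed only if the outputs agree.
order_independent() {
    first=$(printf '%s\n' a,b a,a | LC_ALL=C sort "$@")
    second=$(printf '%s\n' a,a a,b | LC_ALL=C sort "$@")
    [ "$first" = "$second" ]
}

# Explicit whole-line tie-breaker: the output cannot depend on input order.
order_independent -t, -k1,1 -k1 && echo "deterministic"

# Stable sort (non-standard -s): ties keep input order, so the outputs differ.
order_independent -s -t, -k1,1 || echo "depends on input order"
```

Calling the helper with just -t, -k1,1 is the case the thread is arguing about: the result then depends on whether the implementation applies the last-resort comparison by default.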
Re: sort -c/C and last-resort sorting
Date: Fri, 2 Jul 2021 14:41:50 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
Message-ID: <20210702134150.GB16587@localhost>

| I've always assumed that the intention for -c is to answer the
| question "if I ran this command without -c would the output be
| the same as the input?" So the NetBSD behaviour seems wrong
| to me.

But:

    jinx$ printf '%s\n' a,b a,a
    a,b
    a,a
    jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
    a,b
    a,a
    jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1; echo $?
    0

the output (without -c) does match the input, so it seems right to me.

Note that on NetBSD, the -s option that has been discussed here exists,
but does nothing, as it is the default.

ie: When -k args are given, there is no fallback to "whole record" matching,
if one wants that, one can easily add a final -k1 option to make that happen:

    jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1 -k1
    a,a
    a,b
    jinx$ printf '%s\n' a,b a,a | sort -c -t, -k1,1 -k1
    sort: found disorder: a,a

which is the way it should be - if one has taken the trouble to specify
what parts of the record are the keys for sorting (and -u comparisons)
then sort should not be gratuitously adding more - that it used to do so
was widely regarded as a bug (especially given that there was no way to
defeat it, but enabling it is so simple when it is not the default).

Or, if one simply wants the useless posix behaviour, -S requests that

    jinx$ printf '%s\n' a,b a,a | sort -S -t, -k1,1
    a,a
    a,b
    jinx$ printf '%s\n' a,b a,a | sort -c -S -t, -k1,1
    sort: found disorder: a,a

I doubt that option is much used however, as it is so counter-intuitive,
but if it was wanted all the time

    sort() { command sort -S "$@"; }

should achieve that. Setting POSIXLY_CORRECT in the environ probably
should as well, but doesn't currently I believe (and no-one is complaining
that it doesn't).

kre
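The explicit tie-breaker described in the message above can be wrapped up as follows. This is a minimal sketch with an invented helper name, sort_field1_then_line; the options themselves (-t, -k1,1 -k1) are the ones shown in the message. Because the whole-line key is spelled out, the result should not depend on whether an implementation applies a last-resort comparison by default.

```shell
#!/bin/sh
# Hypothetical helper: sort on field 1, then break ties on the whole line
# explicitly. With -t, a bare -k1 key runs from field 1 to end of line.
# Extra arguments (for example -c, or file names) are passed through.
sort_field1_then_line() {
    LC_ALL=C sort -t, -k1,1 -k1 "$@"
}

printf '%s\n' a,b a,a | sort_field1_then_line
# prints:
# a,a
# a,b
```

Passing -c to the helper checks the same two-key ordering instead of producing output, so the unsorted input above is reported as disordered.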
Re: sort -c/C and last-resort sorting
2021-07-02 15:54:48 +0100, Geoff Clare via austin-group-l at The Open Group: > Joerg Schilling wrote, on 02 Jul 2021: > > > > > > > sort: -:2: disorder: a,a > > > > > > > > Try to use the POSIX sort variant to avoid the message. > > > [...] > > > > > > I suppose you mean the -C option, which > > > still checks but doesn't output a diagnostics message. > > > > No, I was referring to /usr/xpg4/bin/sort > > That no longer exists in Solaris. If Illumos still has it they > should probably remove it (or make it a symlink to /usr/bin/sort). [...] Illumos sort doesn't seem to support -C yet: https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/sort/common/options.c#L76 (code not changed since 2005). -- Stephane
Re: sort -c/C and last-resort sorting
"Geoff Clare via austin-group-l at The Open Group" wrote: > > No, I was referring to /usr/xpg4/bin/sort > > That no longer exists in Solaris. If Illumos still has it they > should probably remove it (or make it a symlink to /usr/bin/sort). OK, I checked the source and the only difference between both versions is the missing warning in /usr/xpg4/bin/sort. Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: sort -c/C and last-resort sorting
Joerg Schilling wrote, on 02 Jul 2021: > > > > > sort: -:2: disorder: a,a > > > > > > Try to use the POSIX sort variant to avoid the message. > > [...] > > > > I suppose you mean the -C option, which > > still checks but doesn't output a diagnostics message. > > No, I was referring to /usr/xpg4/bin/sort That no longer exists in Solaris. If Illumos still has it they should probably remove it (or make it a symlink to /usr/bin/sort). -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: sort -c/C and last-resort sorting
Stephane Chazelas wrote: > 2021-07-02 14:07:17 +0200, Joerg Schilling via austin-group-l at The Open > Group: > > "Stephane Chazelas via austin-group-l at The Open Group" > > wrote: > > > > > Is: > > > > > > printf '%s\n' a,b a,a | sort -c -t, -k1,1 > > > > > > Meant to succeed or not? > > > > > > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a > > > confusing: > > > > > > sort: -:2: disorder: a,a > > > > Try to use the POSIX sort variant to avoid the message. > [...] > > I suppose you mean the -C option, which > still checks but doesn't output a diagnostics message. No, I was referring to /usr/xpg4/bin/sort
Re: sort -c/C and last-resort sorting
Stephane Chazelas wrote, on 02 Jul 2021: > > btw, it seems to me -C should be referenced in the EXIT STATUS > section and in the -u description like for -c. Yes, also in STDOUT. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: sort -c/C and last-resort sorting
2021-07-02 14:07:17 +0200, Joerg Schilling via austin-group-l at The Open Group: > "Stephane Chazelas via austin-group-l at The Open Group" > wrote: > > > Is: > > > > printf '%s\n' a,b a,a | sort -c -t, -k1,1 > > > > Meant to succeed or not? > > > > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a > > confusing: > > > > sort: -:2: disorder: a,a > > Try to use the POSIX sort variant to avoid the message. [...] I suppose you mean the -C option, which still checks but doesn't output a diagnostics message. btw, it seems to me -C should be referenced in the EXIT STATUS section and in the -u description like for -c. But the question here also stands for -C: should sort return success or failure when a file is sorted according to the key specification but not as per the last resort sort, and should -s be added to the specification. I'm personally happy with Geoff's answer on those. -- Stephane
Re: sort -c/C and last-resort sorting
Stephane Chazelas wrote, on 02 Jul 2021: > > Is: > > printf '%s\n' a,b a,a | sort -c -t, -k1,1 > > Meant to succeed or not? > > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a > confusing: > > sort: -:2: disorder: a,a > > diagnostic and succeeds in NetBSD. > > It succeeds with -s in all implementations that support that > flag (all but Solaris in that list above). > > By my reading, NetBSD is not compliant, as nothing says the > last-resort whole-line comparison does not apply with -c/-C, > though with implementations that don't support -s (or with the > POSIX API that doesn't support -s though references it), it > seems to be a more useful (and less surprising) interface. I've always assumed that the intention for -c is to answer the question "if I ran this command without -c would the output be the same as the input?" So the NetBSD behaviour seems wrong to me. > Currently, with Solaris sort, it doesn't look like it's possible > to check whether a file is already sorted on a given key. > > Should -s be added to POSIX? Seems like a useful addition - I would be in favour. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: sort -c/C and last-resort sorting
"Stephane Chazelas via austin-group-l at The Open Group" wrote: > Is: > > printf '%s\n' a,b a,a | sort -c -t, -k1,1 > > Meant to succeed or not? > > It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a > confusing: > > sort: -:2: disorder: a,a Try to use the POSIX sort variant to avoid the message. Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
sort -c/C and last-resort sorting
Is:

    printf '%s\n' a,b a,a | sort -c -t, -k1,1

Meant to succeed or not?

It fails in GNU, busybox, OpenBSD, FreeBSD, Solaris, though with a
confusing:

    sort: -:2: disorder: a,a

diagnostic and succeeds in NetBSD.

It succeeds with -s in all implementations that support that
flag (all but Solaris in that list above).

By my reading, NetBSD is not compliant, as nothing says the
last-resort whole-line comparison does not apply with -c/-C,
though with implementations that don't support -s (or with the
POSIX API that doesn't support -s though references it), it
seems to be a more useful (and less surprising) interface.

Currently, with Solaris sort, it doesn't look like it's possible
to check whether a file is already sorted on a given key.

Should -s be added to POSIX?

Any opinion?

-- 
Stephane
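The check asked about above can be reproduced with a short script. This is a minimal sketch rather than part of the original message: the helper name sorted_on_first_field is invented for illustration, and it assumes a sort that accepts the non-standard -s option, which the message says every listed implementation except Solaris does.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): succeed when stdin
# is ordered on the first comma-separated field alone.
# Assumption: sort accepts the non-standard -s option, which switches off
# the last-resort whole-line comparison. -c only checks; the diagnostic
# it may write to stderr is discarded here.
sorted_on_first_field() {
    LC_ALL=C sort -s -c -t, -k1,1 2>/dev/null
}

# Same input as in the message: equal keys, whole lines out of order.
if printf '%s\n' a,b a,a | sorted_on_first_field; then
    echo "ordered on field 1"
else
    echo "not ordered on field 1"
fi
```

Dropping -s from the helper makes the result implementation-dependent, which is exactly the discrepancy the message reports.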