Hi,
It seems that "café" should be sorted before "caff" in Unicode.
https://github.com/jtauber/pyuca
But `sort` does not do so.
$ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort
cafe
caff
café
$ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort
cafe
caff
café
How to make `sort` sort acc
On 9/25/19 10:20 AM, Peng Yu wrote:
> Hi,
> It seems that "café" should be sorted before "caff" in Unicode.
> https://github.com/jtauber/pyuca
> But `sort` does not do so.
> $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort
> cafe
> caff
> café
> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort
> cafe
I want my `sort` to be machine-independent and always use the
correct Unicode sort order. Is there a way to do so?
I don't know how to check where en_US.UTF-8 comes from. Do you know
how to check it? (I use Mac OS X.)
On 9/25/19, Eric Blake wrote:
> On 9/25/19 10:20 AM, Peng Yu wrote:
>>
Unfortunately, multibyte collation is simply unimplemented in Mac OS X, so
there is no alternate locale definition that will fix it. As far as I can
tell, this is documented only in the BUGS section of `man wcscoll`:
    BUGS
         The current implementation of wcscoll() only works in single-byte
         LC
On 9/25/19 10:56 AM, Peng Yu wrote:
> I want my `sort` to be machine-independent and always use the
> correct Unicode sort order. Is there a way to do so?
Those two goals are somewhat at odds. The only truly portable
machine-independent sorting is the one guaranteed by POSIX when you use
If Python can have pyuca, which works across platforms, why can't such a
thing exist at the C level?
On Wed, Sep 25, 2019 at 12:24 PM Eric Blake wrote:
> On 9/25/19 10:56 AM, Peng Yu wrote:
> > I want my `sort` to be machine-independent and always use the
> > correct Unicode sort order. Is there
On 9/25/19 2:46 PM, Peng Yu wrote:
> If Python can have pyuca, which works across platforms, why can't such a
> thing exist at the C level?
Please don't top-post on technical lists.
It _can_ happen, but only if someone takes the time to contribute a
patch (in this case, I already suggested that a gnulib
libicu works that way: it provides ucol_strcoll().
http://userguide.icu-project.org/collation/api
https://github.com/unicode-org/icu
But think twice before adding libicu as a mandatory dependency of
coreutils. It does work at the C level and is widely used, but it is also
quite heavy.
2019-09-26