Re: Is Sort:Naturally a Little Off?

2021-09-25 Thread Shawn H Corey

On 2021-09-25 4:29 a.m., Smylers wrote:

> So it appears that en_CA.utf8 and en_GB.utf8 sort ‘a’ and ‘A’ the
> opposite way round to each other. I wonder why. (Not relevant to any bug
> in Sort::Naturally, but it's now intriguing me.)


I tried en_US.utf8 and en_AU.utf8. They both sort the same way as
en_GB.utf8, so en_CA.utf8 seems to be the odd one out. I don't know why.
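
For anyone who wants to see the two case orderings without depending on which glibc locales are installed, here is a minimal sketch using the core module Unicode::Collate. It does not explain why en_CA.utf8 and en_GB.utf8 differ; it only reproduces both orderings. The word list and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

my @words = qw(And and A a Any ant);

# Default collator: when two words differ only by case, lower-case sorts first.
my $lower_first = Unicode::Collate->new;

# upper_before_lower reverses only that case-level tie-break.
my $upper_first = Unicode::Collate->new(upper_before_lower => 1);

my @lower_sorted = $lower_first->sort(@words);
my @upper_sorted = $upper_first->sort(@words);

print "lower first: @lower_sorted\n";
print "upper first: @upper_sorted\n";
```

In both cases the letters are compared first and case only breaks ties, which is the same shape as the ‘a A’ versus ‘A a’ difference seen between the two locales.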




Re: Is Sort:Naturally a Little Off?

2021-09-25 Thread Smylers
Shawn H Corey writes:

> And my environment is:
> 
> $ env|grep LC_|sort
> LC_ADDRESS=en_CA.UTF-8
> LC_IDENTIFICATION=en_CA.UTF-8
> LC_MEASUREMENT=en_CA.UTF-8
> LC_MONETARY=en_CA.UTF-8
> LC_NAME=en_CA.UTF-8
> LC_NUMERIC=en_CA.UTF-8
> LC_PAPER=en_CA.UTF-8
> LC_TELEPHONE=en_CA.UTF-8
> LC_TIME=en_CA.UTF-8

I think that if you don't have LC_COLLATE or LC_ALL set, then the value
of LANG is used, so that could also be relevant here.
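
A small sketch of that lookup order, in case it helps check what a given shell is really using. The helper name collate_source is hypothetical; the order it implements (LC_ALL, then LC_COLLATE, then LANG, with empty values counting as unset) is the POSIX one. The setlocale call at the end asks the C library what it actually picked.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: report which variable decides the collation locale.
# Precedence is LC_ALL, then LC_COLLATE, then LANG; empty counts as unset.
sub collate_source {
    my ($env) = @_;
    for my $name (qw(LC_ALL LC_COLLATE LANG)) {
        my $value = $env->{$name};
        return ($name, $value) if defined $value && length $value;
    }
    return ('(none set)', 'C');    # nothing set: the "C" locale applies
}

my ($name, $value) = collate_source(\%ENV);
print "collation locale is taken from $name: $value\n";

# Ask the C library what it selected from the environment.
my $active = setlocale(LC_COLLATE, '');
print "setlocale reports: ",
    (defined $active ? $active : '(locale not available)'), "\n";
```

With only the LC_* variables listed above set, neither LC_ALL nor LC_COLLATE is present, so the helper would report LANG if it is set.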

I can reproduce your results when I use en_CA.utf8.

So it appears that en_CA.utf8 and en_GB.utf8 sort ‘a’ and ‘A’ the
opposite way round to each other. I wonder why. (Not relevant to any bug
in Sort::Naturally, but it's now intriguing me.)

Smylers




Re: Is Sort:Naturally a Little Off?

2021-09-25 Thread Shawn H Corey
Setting LC_ALL=C makes Perl's sort give Unicode code point order, but
the Sort::Naturally output still does not look correct:


$ LC_ALL=C ./sort-test.pl
unsorted  : 4 A X i 1 x 10 a B ä y z į C Ä b c Į Y Z än and ÄND And Any ant Äm Äs

no locale,  perl  : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
no locale,  naturally : 1 4 10 Ä ä A a And and ant Any B b C c i Äm än ÄND Äs X x Y y Z z Į į

use locale, perl  : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
use locale, naturally : 1 4 10 Ä ä A a And and ant Any B b C c i Äm än ÄND Äs X x Y y Z z Į į

use locale, perl num  : 1 4 10 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
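
The ‘no locale, perl’ line is plain code point order. A minimal sketch that reproduces it from the unsorted list above; this is not the attached sort-test.pl, which isn't shown in the thread, just an illustration of what cmp does when no "use locale" is in scope.

```perl
use strict;
use warnings;
use utf8;                               # this source contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');    # encode output explicitly

my @unsorted = qw(4 A X i 1 x 10 a B ä y z į C Ä b c Į Y Z
                  än and ÄND And Any ant Äm Äs);

# No "use locale" in scope: cmp compares strings code point by code point.
my @sorted = sort { $a cmp $b } @unsorted;
print "@sorted\n";

# The code points explain the grouping: digits, then A-Z, then a-z,
# then Ä, ä, Į, į in that order.
printf "%-3s U+%04X\n", $_, ord($_) for qw(1 A a Ä ä Į į);
```

Because strings are compared as text, "10" lands before "4", which is the difference from the ‘perl num’ line.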




Re: Is Sort:Naturally a Little Off?

2021-09-25 Thread Shawn H Corey

On 2021-09-25 3:01 a.m., Smylers wrote:

> That does look odd. Which locale are you running this under, which
> version of Perl, and which version of Sort::Naturally?
>
> Also, when I first ran your script I initially got lots of:
>
>   Wide character in say at ./naturally line 23.
>
> Adding this made Perl encode the output properly and the warning go
> away:
>
>   use open ':locale';
>
> But that you didn't need to do that makes me think there's something
> different about your set-up.
>
> Also, even your ‘no locale, naturally’ line apparently *is* affected by
> the locale! With LC_COLLATE=C, I get:
>
>   no locale,  naturally : 1 4 10 A a And and ant Any B b C c i X x Y y Z z Ä ä Äm än ÄND Äs Į į


"Curiouser and curiouser," said Alice. Yes, the last one is sorted by 
Unicode character codes.


My perl is v5.30.0. My Sort::Naturally is 1.03. And my environment is:

$ env|grep LC_|sort
LC_ADDRESS=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_MONETARY=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_NUMERIC=en_CA.UTF-8
LC_PAPER=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_TIME=en_CA.UTF-8



Re: Is Sort:Naturally a Little Off?

2021-09-25 Thread Smylers
Shawn H Corey writes:

> I was testing different sort routines and I think I spotted a bug in
> Sort::Naturally (see attached script). Its output is:
> 
> unsorted  : 4 A X i 1 x 10 a B ä y z į C Ä b c Į Y Z än and ÄND And Any ant Äm Äs
>
> no locale,  perl  : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
> no locale,  naturally : 1 4 10 A a ä Äm And and Ä än ÄND ant Any Äs B b C c i Į į X x Y y Z z

That does look odd. Which locale are you running this under, which
version of Perl, and which version of Sort::Naturally?

Also, when I first ran your script I initially got lots of:

  Wide character in say at ./naturally line 23.

Adding this made Perl encode the output properly and the warning go
away:

  use open ':locale';

But that you didn't need to do that makes me think there's something
different about your set-up.
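
For reference, the warning is what Perl emits when a character above U+00FF is printed to a handle with no encoding layer; in this data that means Į and į, since Ä and ä are below U+0100. The sketch below uses an explicit UTF-8 layer rather than the locale-derived one that use open ':locale' selects, and writes into an in-memory buffer so the effect of the layer is visible. The sample string is an assumption for illustration.

```perl
use strict;
use warnings;

# Ä and ä (below U+0100), Į and į (above U+00FF), written as escapes so
# the example does not depend on the encoding of this source file.
my $text = "\x{C4} \x{E4} \x{12E} \x{12F}";

# Write through an explicit UTF-8 layer into an in-memory buffer.
open(my $out, '>:encoding(UTF-8)', \my $bytes)
    or die "cannot open in-memory buffer: $!";
print {$out} $text;
close($out) or die "cannot close in-memory buffer: $!";

printf "%d characters were encoded as %d bytes\n",
    length($text), length($bytes);

# The same layer on STDOUT lets $text be printed without the warning.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

If the output only ever needs to be UTF-8, the explicit layer is more predictable than one derived from the locale; use open ':locale' follows whatever the environment says instead.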

Also, even your ‘no locale, naturally’ line apparently *is* affected by
the locale! With LC_COLLATE=C, I get:

  no locale,  naturally : 1 4 10 A a And and ant Any B b C c i X x Y y Z z Ä ä Äm än ÄND Äs Į į

Whereas with LC_COLLATE=en_GB.utf8, it's:

  no locale,  naturally : 1 4 10 a A ä Äm and And Ä än ÄND ant Any Äs b B c C i į Į x X y Y z Z

Note that this still isn't the same as your output: here capital letters
sort after lower-case ones, whereas in yours they sort before.

That's with Perl v5.30.0 and Sort::Naturally 1.03.

> Is the Ä out of place for the Sort::Naturally line?

It looks it to me. But there's clearly far more going on here than I
understand.

Smylers