Re: Is Sort:Naturally a Little Off?
On 2021-09-25 4:29 a.m., Smylers wrote:
> So it appears that en_CA.utf8 and en_GB.utf8 sort ‘a’ and ‘A’ the
> opposite way round to each other. I wonder why. (Not relevant to any
> bug in Sort::Naturally, but it's now intriguing me.)

I tried en_US.utf8 and en_AU.utf8. They both follow en_GB.utf8.
en_CA.utf8 seems to be the odd one out. Don't know why.
Re: Is Sort:Naturally a Little Off?
Shawn H Corey writes:

> And my environment is:
>
> $ env|grep LC_|sort
> LC_ADDRESS=en_CA.UTF-8
> LC_IDENTIFICATION=en_CA.UTF-8
> LC_MEASUREMENT=en_CA.UTF-8
> LC_MONETARY=en_CA.UTF-8
> LC_NAME=en_CA.UTF-8
> LC_NUMERIC=en_CA.UTF-8
> LC_PAPER=en_CA.UTF-8
> LC_TELEPHONE=en_CA.UTF-8
> LC_TIME=en_CA.UTF-8

I think that if you don't have LC_COLLATE or LC_ALL set, then the value
of LANG is used, so that could also be relevant here.

I can reproduce your results when I use en_CA.utf8. So it appears that
en_CA.utf8 and en_GB.utf8 sort ‘a’ and ‘A’ the opposite way round to
each other. I wonder why. (Not relevant to any bug in Sort::Naturally,
but it's now intriguing me.)

Smylers
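The lookup order mentioned above (LC_ALL first, then LC_COLLATE, then LANG) can be sketched in a few lines of Perl. This is only an illustration of the POSIX precedence rule, not anything Perl or Sort::Naturally runs internally; the helper name `effective_collate` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: report which environment variable decides the
# collation locale. POSIX precedence is LC_ALL, then LC_COLLATE, then LANG;
# an empty value counts as unset, and the fallback is the "C" locale.
sub effective_collate {
    my (%env) = @_;
    for my $name (qw(LC_ALL LC_COLLATE LANG)) {
        my $value = $env{$name};
        return ($name, $value) if defined $value && length $value;
    }
    return ('(default)', 'C');
}

my ($source, $locale) = effective_collate(%ENV);
print "collation locale: $locale (from $source)\n";
```

With only the LC_* variables listed in the quoted environment (none of which is LC_ALL or LC_COLLATE), the result depends entirely on LANG, which is why its value matters here.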
Re: Is Sort:Naturally a Little Off?
Setting LC_ALL=C gives the Unicode sequence for Perl's sort but
Sort::Naturally still does not seem correct:

    LC_ALL=C ./sort-test.pl
    unsorted              : 4 A X i 1 x 10 a B ä y z į C Ä b c Į Y Z än and ÄND And Any ant Äm Äs
    no locale, perl       : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
    no locale, naturally  : 1 4 10 Ä ä A a And and ant Any B b C c i Äm än ÄND Äs X x Y y Z z Į į
    use locale, perl      : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
    use locale, naturally : 1 4 10 Ä ä A a And and ant Any B b C c i Äm än ÄND Äs X x Y y Z z Į į
    use locale, perl num  : 1 4 10 A And Any B C X Y Z a and ant b c i x y z Ä ÄND Äm Äs ä än Į į
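For reference, the "Unicode sequence" that plain sort produces here is just code point order. A minimal sketch, assuming the source file is saved as UTF-8 and using a shortened word list picked for this example:

```perl
use strict;
use warnings;
use utf8;                              # the literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';    # avoids "Wide character" warnings

my @words = qw(a A ä Ä b B į Į);

# Without "use locale", sort compares strings by code point.
my @by_codepoint = sort @words;
print "@by_codepoint\n";               # A B a b Ä ä Į į

# Show the code point that decides each position.
printf "%s = U+%04X\n", $_, ord($_) for @by_codepoint;
```

All ASCII capitals come before all ASCII lower-case letters, and the accented letters follow both, which matches the "no locale, perl" line above.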
Re: Is Sort:Naturally a Little Off?
On 2021-09-25 3:01 a.m., Smylers wrote:
> That does look odd. Which locale are you running this under, which
> version of Perl, and which version of Sort::Naturally?
>
> Also, when I first ran your script I initially got lots of:
>
>   Wide character in say at ./naturally line 23.
>
> Adding this made Perl encode the output properly and the warning go
> away:
>
>   use open ':locale';
>
> But that you didn't need to do that makes me think there's something
> different about your set-up.
>
> Also, even your ‘no locale, naturally’ line apparently *is* affected by
> the locale! With LC_COLLATE=C, I get:
>
>   no locale, naturally : 1 4 10 A a And and ant Any B b C c i X x Y y Z z Ä ä Äm än ÄND Äs Į į

"Curiouser and curiouser," said Alice.

Yes, the last one is sorted by Unicode character codes.

My perl is v5.30.0. My Sort::Naturally is 1.03.

And my environment is:

$ env|grep LC_|sort
LC_ADDRESS=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_MONETARY=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_NUMERIC=en_CA.UTF-8
LC_PAPER=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_TIME=en_CA.UTF-8
Re: Is Sort:Naturally a Little Off?
Shawn H Corey writes:

> I was testing different sort routines and I think I spotted a bug in
> Sort::Naturally (see attached script). Its output is:
>
> unsorted             : 4 A X i 1 x 10 a B ä y z į C Ä b c Į Y Z än and ÄND
>                        And Any ant Äm Äs
>
> no locale, perl      : 1 10 4 A And Any B C X Y Z a and ant b c i x y z Ä
>                        ÄND Äm Äs ä än Į į
> no locale, naturally : 1 4 10 A a ä Äm And and Ä än ÄND ant Any Äs B b C c
>                        i Į į X x Y y Z z

That does look odd. Which locale are you running this under, which
version of Perl, and which version of Sort::Naturally?

Also, when I first ran your script I initially got lots of:

  Wide character in say at ./naturally line 23.

Adding this made Perl encode the output properly and the warning go
away:

  use open ':locale';

But that you didn't need to do that makes me think there's something
different about your set-up.

Also, even your ‘no locale, naturally’ line apparently *is* affected by
the locale! With LC_COLLATE=C, I get:

  no locale, naturally : 1 4 10 A a And and ant Any B b C c i X x Y y Z z Ä ä Äm än ÄND Äs Į į

Whereas with LC_COLLATE=en_GB.utf8, it's:

  no locale, naturally : 1 4 10 a A ä Äm and And Ä än ÄND ant Any Äs b B c C i į Į x X y Y z Z

Note that still isn't the same as your output, because capital letters
are sorting after lower-case, rather than before in yours.

That's with Perl v5.30.0 and Sort::Naturally 1.03.

> Is the Ä out of place for the Sort::Naturally line?

It looks it to me. But there's clearly far more going on here than I
understand.

Smylers
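To make the difference between the "perl" and "naturally" lines concrete, here is a rough sketch of a natural-order comparator. It is an assumption-laden illustration and not Sort::Naturally's actual code: `natural_cmp` is a made-up name, and it only handles the ASCII subset of the test data so that no locale is involved.

```perl
use strict;
use warnings;

# Hypothetical comparator, NOT Sort::Naturally's implementation: split each
# string into digit and non-digit runs, compare digit runs as numbers and
# everything else as case-folded text, with a plain cmp as the tie-break.
sub natural_cmp {
    my ($left, $right) = @_;
    my @l = $left  =~ /(\d+|\D+)/g;
    my @r = $right =~ /(\d+|\D+)/g;
    while (@l && @r) {
        my ($x, $y) = (shift @l, shift @r);
        my $order = ($x =~ /^\d/ && $y =~ /^\d/)
            ? $x <=> $y
            : lc($x) cmp lc($y) || $x cmp $y;
        return $order if $order;
    }
    return @l <=> @r;    # the string with runs left over sorts last
}

my @items   = qw(4 A X i 1 x 10 a B);
my @plain   = sort @items;
my @natural = sort { natural_cmp($a, $b) } @items;
print "plain   : @plain\n";      # 1 10 4 A B X a i x
print "natural : @natural\n";    # 1 4 10 A a B i X x
```

The text comparison (`lc` and `cmp`) is the step whose result would change under `use locale`, so if a module enables locale handling internally, its output can follow LC_COLLATE even when the calling script does not ask for it. Whether that is what happens inside Sort::Naturally is not established in this thread.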