Re: [R] sort() depends on locale (and platform and build)
Hi,

... so something like this?

[in foo.R]
old.coll <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", locale="C")

Sys.setlocale("LC_COLLATE", locale=old.coll)

Cheers,
Marius

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
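A minimal sketch of the save/restore pattern above, wrapped in a function so that the old collation is restored even if the code in between fails. The helper name with_collate is hypothetical (it is not part of base R), and the sketch assumes the previous LC_COLLATE value can be set again on this platform.

```r
# Hypothetical helper (not part of base R): evaluate `expr` while
# LC_COLLATE is temporarily set to `locale`, then restore the old value.
with_collate <- function(locale, expr) {
  old <- Sys.getlocale("LC_COLLATE")
  # on.exit() runs even if `expr` raises an error
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", locale)
  # `expr` is a promise; forcing it here evaluates it under the new locale
  force(expr)
}

res <- with_collate("C", sort(c("L.Y", "Lu", "L.Q")))
print(res)
```

Because R evaluates arguments lazily, the sort() call is only run once the locale has been switched inside the function.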
Re: [R] sort() depends on locale (and platform and build)
On 15/06/2014 17:34, Marius Hofert wrote:
> Hi,
>
> Thanks for your help. I use R-devel under Ubuntu 14.04, here is the
> output of sessionInfo():
>
>> sessionInfo()
> R Under development (unstable) (2014-06-02 r65832)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.2.0 tools_3.2.0
>
> I assume ICU was not found/installed when R was installed, as executing
> the first couple of lines of the examples section of ?icuSetCollate
> leads to:
>
> Warning message:
> In icuSetCollate(case_first = "upper") :
>   ICU is not supported on this build
> [1] "aarhus" "Aarhus" "safe"   "test"   "Zoo"
>
> Since only the (default) locale "C" gives the order I expected, I am
> considering changing my ~/.Rprofile. But I certainly had a reason for
> changing it to "en_US.UTF-8" at some point... I hope that does not break
> anything else. Is there any "recommendation" for what to use in
> ~/.Rprofile (the default?)? And is the 'recommended approach' to have
> ICU installed and change the sorting order via icuSetCollate if
> necessary?

Yes. (You can use the locale category LC_COLLATE or icuSetCollate, but
the recommended way to do the first is via the environment variables,
not in .Rprofile.)

> I would not have expected any influence of the locale on the sorting
> order, that's quite good to know. In fact, the example came up after I
> tried to sort students' grades in a class with several students having
> the same last name (which I made unique by adding the first names with
> a '.' separator)... quite a 'delicate' issue...
>
> Cheers,
> Marius

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
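For the use case mentioned above (sorting student names so that the result does not depend on the session's locale), one option not raised in this thread is the "radix" method of sort(). This is a sketch under the assumption of R >= 3.3.0, where ?sort documents that radix sorting of character vectors always collates as in the C locale; it was not available in the R-devel version discussed here.

```r
# Assumption: R >= 3.3.0, where sort(method = "radix") exists and,
# for character vectors, always uses C-locale collation.
students <- c("L.Y", "Lu", "L.Q")

# The current LC_COLLATE setting is left untouched.
sorted <- sort(students, method = "radix")
print(sorted)
```

This avoids touching ~/.Rprofile or the environment, at the cost of always getting byte-wise ordering (for example, all upper-case letters before lower-case ones).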
Re: [R] sort() depends on locale (and platform and build)
Hi,

Thanks for your help. I use R-devel under Ubuntu 14.04, here is the
output of sessionInfo():

> sessionInfo()
R Under development (unstable) (2014-06-02 r65832)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.2.0 tools_3.2.0

I assume ICU was not found/installed when R was installed, as executing
the first couple of lines of the examples section of ?icuSetCollate
leads to:

Warning message:
In icuSetCollate(case_first = "upper") :
  ICU is not supported on this build
[1] "aarhus" "Aarhus" "safe"   "test"   "Zoo"

Since only the (default) locale "C" gives the order I expected, I am
considering changing my ~/.Rprofile. But I certainly had a reason for
changing it to "en_US.UTF-8" at some point... I hope that does not break
anything else. Is there any "recommendation" for what to use in
~/.Rprofile (the default?)? And is the 'recommended approach' to have
ICU installed and change the sorting order via icuSetCollate if
necessary?

I would not have expected any influence of the locale on the sorting
order, that's quite good to know. In fact, the example came up after I
tried to sort students' grades in a class with several students having
the same last name (which I made unique by adding the first names with
a '.' separator)... quite a 'delicate' issue...

Cheers,
Marius
Re: [R] sort() depends on locale
On 15/06/2014 12:16, Duncan Murdoch wrote:
> On 15/06/2014, 1:15 AM, Marius Hofert wrote:
>> Hi,
>>
>> If I use invisible(Sys.setlocale("LC_COLLATE", "C")) in ~/.Rprofile, then
>>
>>> sort(c("L.Y", "Lu", "L.Q"))
>> [1] "L.Q" "L.Y" "Lu"
>>
>> whereas using invisible(Sys.setlocale("LC_COLLATE", "en_US.UTF-8")) results in
>>
>>> sort(c("L.Y", "Lu", "L.Q"))
>> [1] "L.Q" "Lu"  "L.Y"
>>
>> I know this issue has appeared already
>> (https://stat.ethz.ch/pipermail/r-help//2012-February/304089.html), I
>> just don't see a reason for the second output: either '.' comes before
>> letters, then the result should be "L.Q" "L.Y" "Lu", or it comes
>> afterwards, then it should be "Lu" "L.Q" "L.Y" -- the above result
>> thus seems inconsistent with any useful notion of 'sort' (?)
>
> I don't see this either, but it appears that on your platform the "." is
> simply being ignored, which might be a useful kind of sorting in some
> contexts.

ICU implements that:

icuSetCollate(locale="en_US", alternate_handling="shifted")
sort(c("L.Y", "Lu", "L.Q"))

See ?icuSetCollate and the references there and in ?Comparison.

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] sort() depends on locale
On 15/06/2014, 1:15 AM, Marius Hofert wrote:
> Hi,
>
> If I use invisible(Sys.setlocale("LC_COLLATE", "C")) in ~/.Rprofile, then
>
>> sort(c("L.Y", "Lu", "L.Q"))
> [1] "L.Q" "L.Y" "Lu"
>
> whereas using invisible(Sys.setlocale("LC_COLLATE", "en_US.UTF-8")) results in
>
>> sort(c("L.Y", "Lu", "L.Q"))
> [1] "L.Q" "Lu"  "L.Y"
>
> I know this issue has appeared already
> (https://stat.ethz.ch/pipermail/r-help//2012-February/304089.html), I
> just don't see a reason for the second output: either '.' comes before
> letters, then the result should be "L.Q" "L.Y" "Lu", or it comes
> afterwards, then it should be "Lu" "L.Q" "L.Y" -- the above result
> thus seems inconsistent with any useful notion of 'sort' (?)

I don't see this either, but it appears that on your platform the "." is
simply being ignored, which might be a useful kind of sorting in some
contexts.

Duncan Murdoch
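The "punctuation is ignored" reading can be checked with a rough emulation: strip punctuation, fold case, and order by the resulting keys. This is only a sketch that reproduces the reported en_US.UTF-8 output for this one example; it is not how the platform's collation is actually implemented, and order(method = "radix") assumes R >= 3.3.0.

```r
x <- c("L.Y", "Lu", "L.Q")

# Build comparison keys that ignore punctuation and case:
# "L.Y" -> "ly", "Lu" -> "lu", "L.Q" -> "lq"
key <- tolower(gsub("[[:punct:]]", "", x))

# Order the keys byte-wise (radix ordering of character vectors uses the
# C locale), then apply that order to the original strings.
emulated <- x[order(key, method = "radix")]
print(emulated)
```

With the "." dropped, "Lu" falls between "LQ" and "LY", which matches the ordering "L.Q" "Lu" "L.Y" reported in the original post.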
Re: [R] sort() depends on locale (and platform and build)
On 15/06/2014 07:45, Pascal Oettli wrote:
> Hello,
>
> Please provide your sessionInfo(). I don't see this issue with R 3.1.0
> Patched on Linux.

Nor on any of my platforms. We would also need to know if ICU was found
when R was installed: see ?Comparison.

> Regards,
> Pascal
>
> On Sun, Jun 15, 2014 at 2:15 PM, Marius Hofert wrote:
>> Hi,
>>
>> If I use invisible(Sys.setlocale("LC_COLLATE", "C")) in ~/.Rprofile, then
>>
>>> sort(c("L.Y", "Lu", "L.Q"))
>> [1] "L.Q" "L.Y" "Lu"
>>
>> whereas using invisible(Sys.setlocale("LC_COLLATE", "en_US.UTF-8")) results in
>>
>>> sort(c("L.Y", "Lu", "L.Q"))
>> [1] "L.Q" "Lu"  "L.Y"
>>
>> I know this issue has appeared already
>> (https://stat.ethz.ch/pipermail/r-help//2012-February/304089.html), I
>> just don't see a reason for the second output: either '.' comes before
>> letters, then the result should be "L.Q" "L.Y" "Lu", or it comes
>> afterwards, then it should be "Lu" "L.Q" "L.Y" -- the above result
>> thus seems inconsistent with any useful notion of 'sort' (?)
>>
>> Cheers,
>> Marius

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
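A sketch of how the information asked for above could be collected in one go. The function name collation_info is hypothetical; capabilities("ICU") and extSoftVersion() are assumed to be available, which holds for recent R releases but possibly not for the versions discussed in this thread.

```r
# Hypothetical helper: gather the settings that influence sort() on
# character vectors. Assumes a recent R where capabilities() has an
# "ICU" entry and extSoftVersion() exists.
collation_info <- function() {
  list(
    lc_collate     = Sys.getlocale("LC_COLLATE"),
    env_lc_all     = Sys.getenv("LC_ALL", unset = NA),
    env_lc_collate = Sys.getenv("LC_COLLATE", unset = NA),
    # unname() so that isTRUE() also works on older R versions
    icu_available  = isTRUE(unname(capabilities("ICU"))),
    # empty string if R was built without ICU
    icu_version    = unname(extSoftVersion()["ICU"])
  )
}

info <- collation_info()
str(info)
```

Posting this output together with sessionInfo() shows both the active collation locale and whether the build can use ICU at all.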
Re: [R] sort() depends on locale
Hello,

Please provide your sessionInfo(). I don't see this issue with R 3.1.0
Patched on Linux.

Regards,
Pascal

On Sun, Jun 15, 2014 at 2:15 PM, Marius Hofert wrote:
> Hi,
>
> If I use invisible(Sys.setlocale("LC_COLLATE", "C")) in ~/.Rprofile, then
>
>> sort(c("L.Y", "Lu", "L.Q"))
> [1] "L.Q" "L.Y" "Lu"
>
> whereas using invisible(Sys.setlocale("LC_COLLATE", "en_US.UTF-8")) results in
>
>> sort(c("L.Y", "Lu", "L.Q"))
> [1] "L.Q" "Lu"  "L.Y"
>
> I know this issue has appeared already
> (https://stat.ethz.ch/pipermail/r-help//2012-February/304089.html), I
> just don't see a reason for the second output: either '.' comes before
> letters, then the result should be "L.Q" "L.Y" "Lu", or it comes
> afterwards, then it should be "Lu" "L.Q" "L.Y" -- the above result
> thus seems inconsistent with any useful notion of 'sort' (?)
>
> Cheers,
>
> Marius

-- 
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan
[R] sort() depends on locale
Hi,

If I use invisible(Sys.setlocale("LC_COLLATE", "C")) in ~/.Rprofile, then

> sort(c("L.Y", "Lu", "L.Q"))
[1] "L.Q" "L.Y" "Lu"

whereas using invisible(Sys.setlocale("LC_COLLATE", "en_US.UTF-8")) results in

> sort(c("L.Y", "Lu", "L.Q"))
[1] "L.Q" "Lu"  "L.Y"

I know this issue has appeared already
(https://stat.ethz.ch/pipermail/r-help//2012-February/304089.html), I
just don't see a reason for the second output: either '.' comes before
letters, then the result should be "L.Q" "L.Y" "Lu", or it comes
afterwards, then it should be "Lu" "L.Q" "L.Y" -- the above result
thus seems inconsistent with any useful notion of 'sort' (?)

Cheers,
Marius