Dear R devel,

What is the correct way to write package tests that might fail because of locale-dependent collation? Is it safe and proper for me to call Sys.setlocale("LC_COLLATE", "en_US.UTF-8") in each test file? Should I instead explicitly force collation to C when writing tests? Or do I need to always call sort() on my comparison objects so that both sides are sorted in the same locale-specific way?

I ran into a strange situation where a package test I'm writing fails under R CMD check but runs fine in the R terminal. I eventually found that, under R CMD check, the vector I compare against to evaluate the test result was not sorted in the order I expected. Further digging revealed that the locale's LC_COLLATE value is set to "C" under R CMD check, while it is "en_US.UTF-8" in my R terminal.

Now that I know what to look for in the documentation, I realize that this is a feature. Page 36 of "Writing R Extensions" states:

"All these tests are run with collation set to the C
locale, and for the examples and tests with environment variable
LANGUAGE=en: this is to minimize differences between platforms."
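
To reproduce the R CMD check behavior in an interactive session, one possibility is to switch collation to C temporarily and restore the previous setting afterwards. The following is only a sketch: with_c_collation is a hypothetical helper name, not a function from base R or any package, and it assumes the previous LC_COLLATE value can be set again on the current platform.

```r
# Hypothetical helper (not part of base R): evaluate `expr` with
# LC_COLLATE temporarily set to "C", as R CMD check does for tests.
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the previous collation even if `expr` throws an error.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # `expr` is a promise; forcing it here evaluates it after the switch.
  force(expr)
}

x <- c("a", "A", "b", "c")
with_c_collation(sort(x))  # under C collation: "A" "a" "b" "c"
```

Running a failing comparison through such a wrapper should show whether collation alone explains the difference between the terminal and R CMD check.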

It appears that this affects the sort order of capital letters:

> Sys.setlocale("LC_COLLATE", "C")
[1] "C"
> sort(c("a",'A','b','c'))
[1] "A" "a" "b" "c"
> Sys.setlocale("LC_COLLATE", "en_US.UTF-8")
[1] "en_US.UTF-8"
> sort(c("a",'A','b','c'))
[1] "a" "A" "b" "c"
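
One option that avoids touching the locale at all is to sort with method = "radix", which ?sort documents as always sorting character vectors in the C locale. This is a minimal sketch, assuming R >= 3.3.0 (where the radix method is available); the variable names are illustrative only.

```r
# Sketch: a collation-independent comparison for a package test.
# sort(..., method = "radix") orders character vectors as in the C locale,
# whatever LC_COLLATE happens to be in the current session.
x <- c("a", "A", "b", "c")

sorted   <- sort(x, method = "radix")
expected <- c("A", "a", "b", "c")  # C-locale order: uppercase before lowercase

stopifnot(identical(sorted, expected))
```

With this approach the expected value in the test is written in C order once, and the result should not depend on whether the test runs in the terminal or under R CMD check.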

best,
 -skye

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
