Hello R-devel,

Currently, Sys.setLanguage() treats an empty or absent LANGUAGE environment variable as unset = "en". This disagrees with gettext(), which in that case falls back to the LC_MESSAGES category of the current locale [1]. As a result, on systems where $LANGUAGE is normally unset, Sys.setLanguage(...) returns "en" instead of the language that was previously in effect.

I would like to suggest making the default unset = Sys.getlocale("LC_MESSAGES") instead of "en", so that Sys.setLanguage(Sys.setLanguage(anything)) would not reset the language to English. Another option could be to make Sys.setLanguage() accept an empty string or NA to reset or remove LANGUAGE (and to allow Sys.setLanguage() to return that value).
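A minimal sketch of the proposed default, assuming nothing beyond base R. setLanguageSketch() is a hypothetical name, not the real Sys.setLanguage(): it only models the return value, which falls back to the LC_MESSAGES category when LANGUAGE is absent or empty. Note that the round trip restores the effective language but leaves LANGUAGE set (to the LC_MESSAGES value), which is why the empty-string/NA alternative may still be worth having.

```r
# Hypothetical sketch, not base R's Sys.setLanguage(): it only shows the
# proposed default for 'unset'. When LANGUAGE is absent or empty, report
# the LC_MESSAGES category, which is what gettext() itself falls back to.
setLanguageSketch <- function(lang, unset = Sys.getlocale("LC_MESSAGES")) {
  old <- Sys.getenv("LANGUAGE", unset = "")
  if (!nzchar(old)) old <- unset
  Sys.setenv(LANGUAGE = lang)
  # A real implementation would also check NLS support and may need to
  # flush cached translations; both are omitted here.
  invisible(old)
}

# Round trip: the outer call restores the language in effect before
# (now as an explicit LANGUAGE value) instead of resetting it to "en".
prev <- setLanguageSketch("de")
setLanguageSketch(prev)
```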
Additionally, there are a number of problems with the way Sys.setLanguage() handles R having started up in the C locale, some of them easier to solve than others.

gettext() disables translation lookup only when the LC_MESSAGES locale category is "C" or "POSIX", so the current test, identical("C", Sys.getlocale()), misses the situations where LC_MESSAGES is "C" but not all of the other locale categories are. I think the correct test should be

    Sys.getlocale("LC_MESSAGES") %in% c("C", "POSIX", "C.UTF-8", "C.utf8")

(On my GNU/Linux system, setting a "POSIX" locale returns it as "C", but I don't think that's guaranteed to happen everywhere.)

So what should Sys.setLanguage(lang, force = TRUE) do when the current LC_MESSAGES locale category disables translation? "en_US.UTF-8" is not guaranteed to be present on a given system. POSIX specifies that 'locale -a' lists the available locales [2], so R could attempt something like:

    # any locales except C.*/POSIX which disable translation?
    # (tryCatch() lets the fallback below run if 'locale -a' cannot be run)
    tryCatch(system("locale -a", intern = TRUE),
             error = function(e) character()) |>
      setdiff(c("C", "C.UTF-8", "C.utf8", "POSIX")) -> candidates
    locale <- if (any(mask <- startsWith(candidates, lang))) {
      candidates[mask][[1]]
    } else if (length(candidates)) {
      candidates[[1]]
    } else {
      "en_US.UTF-8" # maybe it's available despite 'locale -a' failing?
    }
    lcSet <- Sys.setlocale("LC_MESSAGES", locale)

Unfortunately, that's not all: translations are also affected by the LC_CTYPE category of the current locale, and gettext() will try to convert the translations into that locale's encoding before returning them. What about LC_CTYPE being "C"? Sometimes gettext() is able to transliterate:

    $ LC_CTYPE=C LANGUAGE=ru R -q -s -e 'foo'
    Oshibka: ob``ekt 'foo' ne najden
    Vy`polnenie ostanovleno

And sometimes it's not:

    $ LC_CTYPE=C LANGUAGE=zh_CN R -q -s -e 'foo'
    ??: ?????'foo'
    ????
    # <-- these are \x3F question marks, not replacement characters

There doesn't seem to be a portable way to determine a locale with an encoding that would be appropriate for the current session.
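A minimal sketch of the two checks discussed above, written as hypothetical helpers (translationDisabled() and ctypeIsC() are not part of base R). The name lists are the ones proposed in this message, not an exhaustive list of platform spellings, and the composite string is an abbreviated, illustrative example of what Sys.getlocale() returns on glibc when categories differ.

```r
# Hypothetical helper (not in base R): does this LC_MESSAGES value make
# gettext() skip translation lookup? The list of names follows the
# proposal above; "C.UTF-8"/"C.utf8" are GNU-specific spellings.
translationDisabled <- function(lcMessages = Sys.getlocale("LC_MESSAGES")) {
  lcMessages %in% c("C", "POSIX", "C.UTF-8", "C.utf8")
}

# Hypothetical helper: is LC_CTYPE a plain "C"/"POSIX" locale, in which
# gettext() may transliterate translations or replace characters by '?'
ctypeIsC <- function(lcCtype = Sys.getlocale("LC_CTYPE")) {
  lcCtype %in% c("C", "POSIX")
}

# The current test, identical("C", Sys.getlocale()), misses mixed locales:
# with only some categories set to "C", Sys.getlocale() returns a composite
# string (abbreviated, illustrative example below) instead of "C".
mixed <- "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_MESSAGES=C"
identical("C", mixed)      # FALSE: the current test does not fire
translationDisabled("C")   # TRUE: the LC_MESSAGES-based test does
```

ctypeIsC() can only flag the LC_CTYPE problem; as argued above, it does not tell R which other locale would be a safe replacement.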
As an illustration of the encoding problem, on my system only 4 of the 11 locales listed by 'locale -a' use UTF-8 as their encoding (and sometimes UTF-8 is the wrong choice, e.g. when I'm using 'luit' with a non-UTF-8 environment). R could try to force the same locale for LC_CTYPE as it sets for LC_MESSAGES, force a UTF-8 locale if it finds one, or leave LC_CTYPE as it is. All of these options have their downsides.

How helpful is Sys.setLanguage(force = TRUE) in practice?

--
Best regards,
Ivan

[1] The environment variables used by gettext() are listed in the following resources:
https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02
The exact lookup procedure is also documented here:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/dngettext.html
In short, if the LC_MESSAGES category of the current locale is "C" or "POSIX", gettext() does not translate. (GNU gettext additionally disables translation for "C.UTF-8".) Otherwise it consults the LANGUAGE environment variable. If that variable is absent or empty, it uses the LC_MESSAGES category of the current locale. When a program calls setlocale(category, ""), $LANG provides the default value for all categories; it is overridden by the $LC_* variables for individual categories, which in turn are all overridden by $LC_ALL.

[2] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/locale.html

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel