Re: Seeing clarification for locale names
Hi,

I apologize for the late answer. Please keep me in Cc:, I am not
subscribed.

On Mon, Feb 15, 2021 at 05:20:30PM +0100, Florian Weimer wrote:
> * Marc Haber:
> > I would appreciate pointers to documentation, personal opinions, war
> > stories, encoding tales, historic lectures, anything that might
> > enlighten me and help me build the knowledge and understanding about
> > how UNIX locales are supposed to work in Debian GNU/Linux. Thank you
> > in advance!
>
> For the charset normalization, it's in the manual:
> This code dates back to the mid-90s, I think.

Took me 20+ years to finally notice.

> In general, I think it is best to treat locale names as opaque strings.

What is the recommended setting for the LANG and LC_* variables?
de_DE.UTF-8 or the normalized version?

> Parsing them to derive charsets is not going to work (e.g., no charset
> can mean ISO-8859-1 or UTF-8, depending on the age of the locale). To
> get the charset of the current locale, you can use “locale -k charmap”,
> for example. It corresponds to the glibc charmap name (of which there
> aren't too many).

So the recommended way is to just set LANG to the wanted value and then
check whether locale -k charmap returns the expected value? And
'charmap="ANSI_X3.4-1968"' is a telltale sign that I set LANG to a value
that isn't generated on the local system?

Greetings
Marc

-- 
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
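As a rough illustration of the check asked about above, the following Python sketch queries the C library for the charmap of a named locale instead of parsing the name. The names `charmap_for` and `is_ascii_fallback` are hypothetical helpers made up for this example; `locale.setlocale` and `locale.nl_langinfo` come from the Python standard library and are Unix-only. One difference from the shell is assumed here: on glibc, the `locale` command with an ungenerated LANG falls back to the C locale and reports ANSI_X3.4-1968, whereas `setlocale` in this sketch reports the failure directly.

```python
import locale

# Charmap name quoted in the message above for the fallback case.
C_LOCALE_CHARMAP = "ANSI_X3.4-1968"


def charmap_for(name):
    """Return the charmap of locale `name`, or None if it cannot be set.

    Hypothetical helper: it switches LC_CTYPE temporarily and restores
    the previous setting before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale is not available on this system
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def is_ascii_fallback(charmap):
    """True if `charmap` is the name quoted above for the C locale."""
    return charmap == C_LOCALE_CHARMAP


if __name__ == "__main__":
    for candidate in ("de_DE.UTF-8", "de_DE.utf8", "C"):
        print(candidate, "->", charmap_for(candidate))
```

The printed values depend on which locales are generated on the machine, so no output is shown here.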
Re: Seeing clarification for locale names
* Marc Haber:

> I would appreciate pointers to documentation, personal opinions, war
> stories, encoding tales, historic lectures, anything that might
> enlighten me and help me build the knowledge and understanding about
> how UNIX locales are supposed to work in Debian GNU/Linux. Thank you
> in advance!

For the charset normalization, it's in the manual:

  The only new thing is the @code{normalized codeset} entry. This is
  another goodie which is introduced to help reduce the chaos which
  derives from the inability of people to standardize the names of
  character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
  @w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
  codeset} value is generated from the user-provided character set name
  by applying the following rules:

  @enumerate
  @item
  Remove all characters besides numbers and letters.
  @item
  Fold letters to lowercase.
  @item
  If the same only contains digits prepend the string @code{"iso"}.
  @end enumerate

  @noindent
  So all of the above names will be normalized to @code{iso88591}. This
  allows the program user much more freedom in choosing the locale name.

This code dates back to the mid-90s, I think.

In general, I think it is best to treat locale names as opaque strings.
Parsing them to derive charsets is not going to work (e.g., no charset
can mean ISO-8859-1 or UTF-8, depending on the age of the locale). To
get the charset of the current locale, you can use “locale -k charmap”,
for example. It corresponds to the glibc charmap name (of which there
aren't too many).

Thanks,
Florian

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
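The three rules from the manual excerpt can be written down in a few lines. This is a minimal Python sketch of the rules as quoted, not the glibc code itself; the function name `normalize_codeset` is chosen for the example, and treating only ASCII letters and digits as "numbers and letters" is an assumption.

```python
def normalize_codeset(name: str) -> str:
    """Apply the three quoted rules to a character set name."""
    # Rules 1 and 2: keep only (ASCII) letters and digits, lowercased.
    result = "".join(c.lower() for c in name if c.isascii() and c.isalnum())
    # Rule 3: a result made up of digits only gets an "iso" prefix.
    if result.isdigit():
        result = "iso" + result
    return result


if __name__ == "__main__":
    # The spellings listed in the manual excerpt all collapse to one name.
    for spelling in ("ISO-8859-1", "8859-1", "88591", "iso8859-1", "iso_8859-1"):
        print(spelling, "->", normalize_codeset(spelling))  # iso88591 each time
    print(normalize_codeset("UTF-8"))  # utf8
```

This also shows where the `utf8` spelling comes from: `UTF-8` loses its hyphen under rule 1 and is lowercased under rule 2.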
Seeing clarification for locale names
[Please Cc: me on replies, I am not subscribed to Debian-glibc]

Hi,

I am a bit confused about locale names. In the literature, one can see
that a proper locale name is, for example, en_US.UTF-8. This is also
what I write in /etc/locale.gen to have one locale "generated" on my
system.

locale -a, however, will print en_US.utf8. I _think_ this is the
intended behavior, since there is a normalizing function somewhere in
the glibc sources which lowercases everything and throws out all
punctuation.

OTOH, there are applications that will malfunction or print a warning
if the locale isn't explicitly set to .UTF-8 (upper case, hyphen).

In my shell profile scripts, I have code that checks whether the
intended locale is actually present on the local system by comparing it
to locale -a's output (this avoids falling back to a non-UTF-8 locale
that does not know about German umlauts when a UTF-8 one is available).
Hence, my locale environment variables are all set with the respective
.utf8 suffix, since that's what locale -a will print.

Is this a wrong approach?

I would appreciate pointers to documentation, personal opinions, war
stories, encoding tales, historic lectures, anything that might
enlighten me and help me build the knowledge and understanding about
how UNIX locales are supposed to work in Debian GNU/Linux. Thank you in
advance!

Greetings
Ma 'Damn encoding!' rc

-- 
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
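The profile-script check described above can be sketched so that the comparison against locale -a's output does not depend on the codeset spelling. This is only an illustration in Python: `pick_available` and `_key` are hypothetical names, the `available` list stands in for real locale -a output, and the comparison key assumes that only the codeset part is spelled differently between the two sides.

```python
import re


def _key(locale_name):
    """Comparison key: language and modifier as-is, codeset normalized."""
    base, _, modifier = locale_name.partition("@")
    lang, _, codeset = base.partition(".")
    codeset = re.sub(r"[^0-9a-z]", "", codeset.lower())
    if codeset.isdigit():
        codeset = "iso" + codeset
    return (lang, codeset, modifier)


def pick_available(preferred, available):
    """Return the first name in `preferred` that `available` contains.

    The name is returned exactly as the caller spelled it, so a
    ".UTF-8" spelling is kept even if the list says ".utf8".
    """
    have = {_key(name) for name in available}
    for name in preferred:
        if _key(name) in have:
            return name
    return None


if __name__ == "__main__":
    # Sample data; a real script could use the lines printed by `locale -a`.
    available = ["C", "C.utf8", "POSIX", "en_US.utf8"]
    print(pick_available(["de_DE.UTF-8", "en_US.UTF-8", "C"], available))
```

With the sample list, the first preference is missing, so the second one is picked and returned in its `.UTF-8` spelling. Whether a given application accepts that spelling is outside what this sketch can check.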