* Marc Haber:

> I would appreciate pointers to documentation, personal opinions, war
> stories, encoding tales, historic lectures, anything that might
> enlighten me and help me build the knowledge and understanding about
> how UNIX locales are supposed to work in Debian GNU/Linux. Thank you in
> advance!
For the charset normalization, it's in the manual:

    The only new thing is the @code{normalized codeset} entry.  This is
    another goodie which is introduced to help reduce the chaos which
    derives from the inability of people to standardize the names of
    character sets.  Instead of @w{ISO-8859-1} one can often see
    @w{8859-1}, @w{88591}, @w{iso8859-1}, or @w{iso_8859-1}.  The
    @code{normalized codeset} value is generated from the user-provided
    character set name by applying the following rules:

    @enumerate
    @item
    Remove all characters besides numbers and letters.
    @item
    Fold letters to lowercase.
    @item
    If the same only contains digits prepend the string @code{"iso"}.
    @end enumerate

    @noindent
    So all of the above names will be normalized to @code{iso88591}.
    This allows the program user much more freedom in choosing the
    locale name.

This code dates back to the mid-90s, I think.

In general, I think it is best to treat locale names as opaque strings.
Parsing them to derive charsets is not going to work (e.g., a locale
name without an explicit charset can mean ISO-8859-1 or UTF-8,
depending on the age of the locale).

To get the charset of the current locale, you can use "locale -k
charmap", for example.  It corresponds to the glibc charmap name (of
which there aren't too many).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill