On Thu, 08 Jul 2004 09:26:49 +0200, Michael Bell wrote
> It looks like my Perl is a little bit more tolerant or my
> distribution tolerates more encodings. Which of the following
> encodings work for you?
>
> iso88591
> iso-8859-1
> iso8859-1
> ISO88591
> ISO8859-1 <--- Only this one works for me
> ISO-8859-1

I tested all of them and only "ISO8859-1" worked for me.

> The best option - which works for me too - is the first one because
> we can setup this encoding with a common regex. "locale -a" includes
> it too.

Attached you can find the output of "locale -a" on my machine
(FreeBSD 5.2.1 / perl 5.8.2). iso88591 is not included in the list.

Here is a relevant excerpt from the perllocale manpage:

"Sadly, even though the calling interface for setlocale() has been
standardized, names of locales and the directories where the
configuration resides have not been. The basic form of the name is
language_territory.codeset, but the latter parts after language are
not always present. The language and country are usually from the
standards ISO 3166 and ISO 639, the two-letter abbreviations for the
countries and the languages of the world, respectively. The codeset
part often mentions some ISO 8859 character set, the Latin codesets.
For example, "ISO 8859-1" is the so-called "Western European codeset"
that can be used to encode most Western European languages adequately.
Again, there are several ways to write even the name of that one
standard. Lamentably."

So we confirm that the charset notation may vary from system to
system. Considering this, what's the best way to go?

Kind regards,
Nuno Antunes

C
POSIX
af_ZA.ISO8859-1
af_ZA.ISO8859-15
am_ET.UTF-8
bg_BG.CP1251
ca_ES.ISO8859-1
ca_ES.ISO8859-15
cs_CZ.ISO8859-2
da_DK.ISO8859-1
da_DK.ISO8859-15
de_AT.ISO8859-1
de_AT.ISO8859-15
de_CH.ISO8859-1
de_CH.ISO8859-15
de_DE.ISO8859-1
de_DE.ISO8859-15
el_GR.ISO8859-7
en_AU.ISO8859-1
en_AU.ISO8859-15
en_AU.US-ASCII
en_CA.ISO8859-1
en_CA.ISO8859-15
en_CA.US-ASCII
en_GB.ISO8859-1
en_GB.ISO8859-15
en_GB.US-ASCII
en_NZ.ISO8859-1
en_NZ.ISO8859-15
en_NZ.US-ASCII
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
es_ES.ISO8859-1
es_ES.ISO8859-15
et_EE.ISO8859-15
fi_FI.ISO8859-1
fi_FI.ISO8859-15
fr_BE.ISO8859-1
fr_BE.ISO8859-15
fr_CA.ISO8859-1
fr_CA.ISO8859-15
fr_CH.ISO8859-1
fr_CH.ISO8859-15
fr_FR.ISO8859-1
fr_FR.ISO8859-15
hi_IN.ISCII-DEV
hr_HR.ISO8859-2
hu_HU.ISO8859-2
hy_AM.ARMSCII-8
is_IS.ISO8859-1
is_IS.ISO8859-15
it_CH.ISO8859-1
it_CH.ISO8859-15
it_IT.ISO8859-1
it_IT.ISO8859-15
ja_JP.SJIS
ja_JP.eucJP
ko_KR.CP949
ko_KR.eucKR
la_LN.ISO8859-1
la_LN.ISO8859-15
la_LN.ISO8859-2
la_LN.ISO8859-4
la_LN.US-ASCII
lt_LT.ISO8859-13
lt_LT.ISO8859-4
nl_BE.ISO8859-1
nl_BE.ISO8859-15
nl_NL.ISO8859-1
nl_NL.ISO8859-15
no_NO.ISO8859-1
no_NO.ISO8859-15
pl_PL.ISO8859-2
pt_BR.ISO8859-1
pt_PT.ISO8859-1
pt_PT.ISO8859-15
ro_RO.ISO8859-2
ru_RU.CP1251
ru_RU.CP866
ru_RU.ISO8859-5
ru_RU.KOI8-R
sk_SK.ISO8859-2
sl_SI.ISO8859-2
sr_YU.ISO8859-2
sr_YU.ISO8859-5
sv_SE.ISO8859-1
sv_SE.ISO8859-15
tr_TR.ISO8859-9
uk_UA.ISO8859-5
uk_UA.KOI8-U
zh_CN.GB18030
zh_CN.GB2312
zh_CN.GBK
zh_CN.eucCN
zh_TW.Big5
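
One possible way to cope with the varying spellings, in the spirit of the
"common regex" idea quoted above, is to compare codeset names only after
normalizing them, and then use whatever spelling the system itself reports.
The following is just a rough sketch in Python for illustration; the function
names normalize_codeset and find_locale are made up for this example, and the
short list stands in for real "locale -a" output. It is not taken from any
existing code.

```python
import re

def normalize_codeset(name):
    """Lowercase the codeset and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def find_locale(available, language, codeset):
    """Return the system's own spelling of language.codeset, or None.

    `available` is a list of locale names as printed by "locale -a";
    `language` is the language_territory part, e.g. "de_DE".
    """
    wanted = normalize_codeset(codeset)
    for loc in available:
        lang, sep, cs = loc.partition(".")
        if sep and lang == language and normalize_codeset(cs) == wanted:
            return loc
    return None

# Hypothetical sample: a few entries from the "locale -a" listing.
available = ["C", "POSIX", "de_DE.ISO8859-1", "de_DE.ISO8859-15",
             "pt_PT.ISO8859-1"]

for spelling in ("iso88591", "iso-8859-1", "ISO8859-1", "ISO-8859-1"):
    print(spelling, "->", find_locale(available, "de_DE", spelling))
```

With this approach the configured spelling no longer has to match the
platform's exactly, since the name actually passed on is the one the system
lists. Whether the list should really come from "locale -a" at runtime is an
open question.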