> > One of the advantages of using Python 2 for parsing is that it can work
> > with a complete 32-bit Unicode charset encoding (UTF-8), rather than
> > just a locale-specific subset, and includes support for transforming
> > many (most) subsets into UTF-8.
>
> My understanding is that you need the catalogs and NLS support built
> into Python to take advantage of that, and that means ensuring that the
> package maintainer (or if you do source builds on your own) did not use the
> --disable-nls switch when compiling. Many do (and there's good reason to).
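For reference, the "transforming subsets into UTF-8" mentioned above goes through Unicode: decode the bytes with the source encoding's codec, then encode the resulting Unicode string with the UTF-8 codec. Below is a minimal sketch, assuming only the standard `codecs` module. `codecs.lookup()` returns a 4-tuple (encoder, decoder, stream reader, stream writer) in Python 2.0 and later, so indexing it works on old and current versions alike. The function name `transcode_to_utf8` is hypothetical, and the usage lines use `b''` literals, so they need Python 3.

```python
import codecs

def transcode_to_utf8(data, source_encoding):
    """Hypothetical helper: convert bytes in source_encoding to UTF-8 bytes."""
    # Element [1] of the lookup tuple is the decoder, [0] is the encoder.
    decode = codecs.lookup(source_encoding)[1]
    encode = codecs.lookup("utf_8")[0]
    # Each codec function returns (result, number_of_items_consumed).
    text, _ = decode(data)
    utf8, _ = encode(text)
    return utf8

print(transcode_to_utf8(b"caf\xe9", "latin_1"))  # -> b'caf\xc3\xa9'
```

If no codec is installed for `source_encoding`, `codecs.lookup()` raises `LookupError`, so this only works for encodings in the installed set.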
David,

I've looked through the Python 2.0 and 2.1 sources for this switch and can't find it. It isn't mentioned in the README or any of the other docs, and it isn't in configure.in. Looking at the build sources, the Unicode object isn't conditionalized in any way, so it would be hard to build Python without it.

There is a separate issue: which codecs (converters between Unicode and a given byte encoding) are installed. By default, only the codecs for the following encodings are installed (the cp* modules are various Windows, DOS, and IBM code pages, including several EBCDIC ones):

    ascii.py cp037.py cp1006.py cp1026.py cp1250.py cp1251.py
    cp1252.py cp1253.py cp1254.py cp1255.py cp1256.py cp1257.py
    cp1258.py cp424.py cp437.py cp500.py cp737.py cp775.py
    cp850.py cp852.py cp855.py cp856.py cp857.py cp860.py
    cp861.py cp862.py cp863.py cp864.py cp865.py cp866.py
    cp869.py cp874.py cp875.py
    iso8859_1.py iso8859_10.py iso8859_13.py iso8859_14.py iso8859_15.py
    iso8859_2.py iso8859_3.py iso8859_4.py iso8859_5.py iso8859_6.py
    iso8859_7.py iso8859_8.py iso8859_9.py
    koi8_r.py latin_1.py mac_cyrillic.py mac_greek.py mac_iceland.py
    mac_latin2.py mac_roman.py mac_turkish.py mbcs.py
    utf_16.py utf_16_be.py utf_16_le.py utf_8.py

The CJK codecs are, I think, still under development, and in any case are distributed separately from http://sourceforge.net/projects/python-codecs.

Bill
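Since the installed set varies, a program can check at run time which encodings it can actually handle before relying on them. Below is a minimal sketch, assuming only `codecs.lookup()`, which raises `LookupError` when no codec is registered for a name. The helper name `available_codecs` is hypothetical, and the sketch runs on both old and current Pythons.

```python
import codecs

def available_codecs(names):
    """Hypothetical helper: return the names for which a codec can be found."""
    found = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            # No codec registered under this name on this installation.
            continue
        found.append(name)
    return found

print(available_codecs(["utf_8", "koi8_r", "no_such_codec"]))
```

Note that some codecs in the list above are platform-specific. For example, `mbcs` works only on Windows. So a check like this is more reliable than assuming every listed module is usable everywhere.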
