I think in a sense Python *will* continue to support multiple character sets -- as byte streams. IMO that's the only reasonable approach. Unlike Matz, apparently, I've never heard complaints that Python 2 doesn't have enough support for character sets larger than Unicode, and what it effectively supports is exactly that: encoded strings and Unicode strings.
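Guido's split between encoded strings and Unicode strings can be illustrated with a minimal sketch. It assumes modern Python 3 (`bytes` vs. `str`), which postdates this thread; in Python 2 the same split was `str` vs. `unicode`. The sample text and the choice of encodings are illustrative only.

```python
# Minimal sketch (Python 3): one piece of text held as several encoded byte
# streams plus one Unicode string. Python 2 had the same split as str/unicode.
text = "日本語"  # a Unicode string: code points, no encoding attached

# The same characters as byte streams in encodings mentioned in the thread.
encoded = {
    "euc_jp": text.encode("euc_jp"),
    "shift_jis": text.encode("shift_jis"),
    "utf-8": text.encode("utf-8"),
}

for name, data in encoded.items():
    print(f"{name:<10} {data.hex()}")

# The raw bytes differ per encoding, so comparing byte streams from different
# character sets directly says nothing about whether the text is the same...
assert len(set(encoded.values())) == len(encoded)

# ...whereas decoding each one to Unicode gives a single comparable form.
decoded = {name: data.decode(name) for name, data in encoded.items()}
assert all(s == text for s in decoded.values())
```

This is also the usual answer to Paul's question about comparing strings from multiple character sets: decode to a common internal form first, then compare.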
--Guido

On 9/1/06, Paul Prescod <[EMAIL PROTECTED]> wrote:
> I thought that others might find this reference interesting. It is Matz (the
> inventor of Ruby) talking about why he thinks that Unicode is good for what
> it does but not sufficient in general, along with some hints of what he
> plans for multinationalization in Ruby. The translation is rough and is
> lifted from this email:
>
> http://rubyforge.org/pipermail/rhg-discussion/2006-April/000136.html
>
> I think that the gist of it is that Unicode will be "just one character set"
> supported by Ruby. This idea has been kicked around for Python before, but
> you quickly run into questions about how you compare character strings from
> multiple character sets, to say nothing of the complexity of a regular
> expression engine that is agnostic to both character encoding and character
> set.
>
> I guess Matz is the right guy to experiment with that stuff. Maybe it could
> be copied in Python 4K.
>
> What are your complaints about Unicode?
> * it's widely used, isn't it?
> * resentment toward Han unification?
> * an inferiority complex among Japanese people?
> --
> What are your complaints about Unicode?
> * no, no, I do not have any complaints about Unicode
> * in the domains where Unicode is adequate
> --
> Then, why CSI?
>
> In most applications, UCS is enough, thanks to Unicode.
> However, there are also applications for which this is not the case.
> --
> Fields for which Unicode is not enough
> Big character sets
> * Konjaku-Mojikyo (a Japanese character set with many more characters than Unicode)
> * TRON code
> * GB18030
> --
> Fields for which Unicode is not well suited
> Legacy encodings
> * conversion to UCS is useless
> * big conversion tables
> * round-trip problem
> --
> If a language chooses the UCS system
> * you cannot write non-UCS applications
> * you can't handle text that can't be expressed in Unicode
> --
> If a language chooses the CSI system
> * CSI is a superset of UCS
> * Unicode can simply be handled within CSI
> --
> ... or so we can say, but
> * CSI is difficult
> * can it really be implemented?
> --
> That's where Japan's traditional arts come in
>
> Adapting applications to Japanese
> * Modifying English-language applications so that they can process
>   Japanese
> --
> Adapting applications to Japanese
> * Something engineers of long ago surely went through
>   - Emacs (NEmacs)
>   - Perl (JPerl)
>   - Bash
> --
> Accumulation of know-how
>
> In Japan, know-how for adapting software to Japanese
> (multibyte text processing)
> has accumulated.
> --
> Accumulation of know-how
>
> To begin with, even for purely domestic use,
> text circulates in 3 encodings
> (4 if you include UTF-8)
> --
> Building on this know-how, the following are done:
> * multibyte text encodings
> * switching between encodings at the string level
> * processing them at practical speed
> --
> Available encodings
>
> euc_tw   euc_jp   iso8859_*  utf-8     utf-32le
> ascii    euc_kr   koi8       utf-16le  utf-32be
> big5     gb2312   sjis       utf-16be
>
> ...and many others
> In principle, any stateless encoding can be supported.
> --
> This means
> For applications using only one encoding, no code conversion is needed
> --
> Moreover
> Applications that want to handle multiple encodings can choose an
> internal encoding (generally Unicode) that includes all the others
> --
> If you want to,
> * you can also handle multiple encodings without conversion, leaving
>   characters as they are
> * but this is difficult, so I do not recommend it
> --
> However,
> only the basic part is done;
> it's far from ready for practical use
> * code conversion
> * encoding detection
> * etc.
> --
> For the time being, what I want to tell everyone today is:
> * UCS is practical
> * but not all-purpose
> * CSI is not impossible
> --
> The reason I'm saying this:
> they may add CSI to Perl 6, just as they added
> * method calls with "."
> * continuations
> from Ruby.
> Basically, they hate losing.
> --
> Thank you
>
> _______________________________________________
> Python-3000 mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
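The "round-trip problem" listed under legacy encodings can be illustrated with one well-known case. The example is chosen here, not taken from Matz's slides, and it assumes CPython's bundled `shift_jis` codec (JIS X 0208 table) and `cp932` codec (Microsoft's Windows variant). Per those mapping tables, the byte pair 0x81 0x60 decodes to WAVE DASH (U+301C) under the first and FULLWIDTH TILDE (U+FF5E) under the second. Text converted to Unicode with one vendor's table therefore need not match, or survive conversion back through, another vendor's table.

```python
# Minimal sketch (Python 3) of a round-trip hazard in legacy Japanese
# encodings; this specific example is an illustration, not from the talk.
import unicodedata

raw = b"\x81\x60"  # one double-byte character in Shift_JIS-family encodings

# CPython ships two tables for this byte range that disagree on this character.
as_jis = raw.decode("shift_jis")  # JIS X 0208-based table
as_ms = raw.decode("cp932")       # Microsoft's Windows variant

print(f"shift_jis -> U+{ord(as_jis):04X} {unicodedata.name(as_jis)}")
print(f"cp932     -> U+{ord(as_ms):04X} {unicodedata.name(as_ms)}")

# Each codec maps its own result back to the original bytes...
assert as_jis.encode("shift_jis") == raw
assert as_ms.encode("cp932") == raw

# ...but the two Unicode strings differ, so "the same" legacy text decoded
# with different tables will not compare equal once it is in Unicode.
assert as_jis != as_ms
```

This is the kind of case where converting to an internal Unicode encoding, as the slides suggest for multi-encoding applications, requires picking the right conversion table for each source.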
