On 4/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 4/18/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On 4/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > > On 4/17/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > > > The locale module doesn't deal with Unicode, only with 8-bit characters
> > > > (not multi-byte characters). You'll lose this anyway. Certainly
> > > > string.letters is not going to provide this functionality.
> > >
> > > But for languages in Latin1, 8-bit characters are sufficient --
> > > anything with more than 8 bits is by definition not a (local) letter.
> >
> > Latin-1 is just another encoding (and not a very useful one given that
> > it can't encode all of Unicode). I don't want to define a feature that
> > only works for Latin-1.
>
> Today, string.letters works most easily with ASCII supersets, and is
> effectively limited to 8-bit encodings. Once everything is unicode, I
> don't think that 8-bit restriction should apply any more.
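Jim's point that the 8-bit restriction should go away once everything is Unicode can be quantified with the standard `unicodedata` module. The sketch below (the helper `count_letters` is hypothetical, written only for illustration) counts code points whose general category starts with "L" (Lu, Ll, Lt, Lm, Lo), first in the Latin-1 range and then across all of Unicode. The totals depend on the Unicode database version bundled with the interpreter (`unicodedata.unidata_version`), so no specific numbers are claimed here.

```python
import sys
import unicodedata


def count_letters(limit):
    """Count code points below `limit` whose Unicode general category is a letter (L*)."""
    return sum(
        1
        for cp in range(limit)
        if unicodedata.category(chr(cp)).startswith("L")
    )


latin1_letters = count_letters(0x100)             # the 8-bit (Latin-1) range
all_letters = count_letters(sys.maxunicode + 1)   # every code point

print("Unicode database version:", unicodedata.unidata_version)
print("letters in Latin-1:", latin1_letters)
print("letters in all of Unicode:", all_letters)
```

A `string.letters` built this way for the full Unicode range would be a string of that second length, which is the size problem raised in the reply.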
But we already went over this. There are over 40K letters in Unicode. It
simply makes no sense to have a string.letters approaching that size.

> > > I won't swear that localizations currently replace string.letters with
> > > the appropriately ordered (slight) superset, but it is a valid use
> > > case, and string* (or text*) is clearly the right place.
> >
> > The right solution for locale-dependent collation for sure isn't
> > having a string containing all the letters in the right order. There
> > are plenty of languages where that approach doesn't even work.
>
> Theoretically, English is one of those non-working languages. (Names
> in bibliographic entries are supposed to be alphabetized according to
> language of origin.)
>
> In practice, ordered-list-of-chars works well enough, often enough.
> It often works better than sorting by code point, which is the only
> obvious alternative.
>
> Unless I missed it (and I may have), unicode itself sort of ducks the
> question about how to sort strings. Python really needs to provide
> *an* answer, but I'm not sure it is possible to provide the (single)
> correct answer.

The Unicode standard certainly has a solution, but it is complicated
and I don't believe it is currently implemented in core Python.

> string.letters is one workaround, and I don't think we should remove
> it until a better solution (or workaround) is available.

I disagree. The correct solution is to implement the Unicode support
for locale-specific sorting. Remember that the locale module supports
only a single, global locale at a time. This renders it totally useless
in many apps requiring locale support (such as web servers).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
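The difference between sorting by code point and Jim's "ordered list of characters" workaround can be sketched in a few lines. Lowercase Swedish is used as an assumed example because its alphabet ends with å, ä, ö after z, while their code points run ä (U+00E4), å (U+00E5), ö (U+00F6). The names `SWEDISH_LETTERS`, `RANK` and `alphabet_key` are hypothetical, not anything proposed in the thread.

```python
# Toy "ordered list of characters" collation (the string.letters-style workaround).
# Assumption: lowercase Swedish, whose alphabet puts å, ä, ö after z, in that order.
SWEDISH_LETTERS = "abcdefghijklmnopqrstuvwxyzåäö"

# Position of each character in the ordered list.
RANK = {ch: i for i, ch in enumerate(SWEDISH_LETTERS)}


def alphabet_key(word):
    """Sort key: listed characters by their position; any other character
    sorts after all listed ones, ordered among themselves by code point."""
    return [
        (RANK[ch], 0) if ch in RANK else (len(RANK), ord(ch))
        for ch in word.lower()
    ]


words = ["ör", "åka", "äta", "zebra", "apa"]

# Plain code-point order puts ä (U+00E4) before å (U+00E5).
print(sorted(words))                    # ['apa', 'zebra', 'äta', 'åka', 'ör']

# The ordered list gives the Swedish order å, ä, ö.
print(sorted(words, key=alphabet_key))  # ['apa', 'zebra', 'åka', 'äta', 'ör']
```

This toy ignores what a real collation needs: secondary (accent) and tertiary (case) weights and per-language contractions, which the Unicode Collation Algorithm specifies. The standard `locale.strxfrm` can delegate collation to the C library, but only for the one process-wide locale installed with `locale.setlocale`, which is the limitation for web servers described above.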
