On 8/31/06, Talin <[EMAIL PROTECTED]> wrote:
> One way to handle this efficiently would be to only support the
> encodings which have a constant character size: ASCII, Latin-1, UCS-2
> and UTF-32. In other words, if the content of your text is plain ASCII,
> use an 8-bit-per-character string; If the content is limited to the
> Unicode BMP (Basic Multilingual Plane) use UCS-2; And if you are using
> Unicode supplementary characters, use UTF-32.
>
> (The difference between UCS-2 and UTF-16 is that UCS-2 is always 2 bytes
> per character, and doesn't support the supplemental characters above
> 0xffff, whereas UTF-16 characters can be either 2 or 4 bytes.)
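
To make the quoted rule concrete, here is a minimal sketch of picking the narrowest constant-size representation and of the constant-time index scaling it allows. The function names are hypothetical and purely illustrative; nothing here is a proposed API.

```python
def narrowest_fixed_width(text):
    """Return the bytes per character (1, 2 or 4) needed to store
    `text` with a constant character size."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:        # ASCII or Latin-1
        return 1
    if highest <= 0xFFFF:      # Basic Multilingual Plane: UCS-2
        return 2
    return 4                   # supplementary characters: UTF-32


def byte_offset(index, width):
    # Constant-time indexing: scale the character index by the
    # character size instead of scanning the string.
    return index * width
```

Note that a single wide character forces the whole string to the wider width, which is the drawback discussed further down.
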

I think we should also support UTF-16, since Java and .NET (and Win32?)
appear to be using it effectively; making surrogate handling an
application issue doesn't seem *too* big of a burden for many apps.

> By avoiding UTF-8, UTF-16 and other variable-character-length formats,
> you can always ensure that character index operations are done in
> constant time. Index operations would simply require scaling the index
> by the character size, rather than having to scan through the string and
> count characters.
>
> The drawback of this method is that you may be forced to transform the
> entire string into a wider encoding if you add a single character that
> won't fit into the current encoding.

A way to handle UTF-8 strings and other variable-length encodings
would be to maintain a small cache of index positions with the string
object.

> (Another option is to simply make all strings UTF-32 -- which is not
> that unreasonable, considering that text strings normally make up only a
> small fraction of a program's memory footprint. I am sure that there are
> applications that don't conform to this generalization, however.)

Here you are effectively voting against polymorphic strings. I believe
Fredrik has good reasons to doubt this assertion.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
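
A minimal sketch of the index-position cache mentioned above for UTF-8 strings. It is only one possible reading of the idea: it records the byte offset of every `stride`-th character, so a lookup scans at most `stride - 1` characters from the nearest checkpoint. The class name, the `stride` parameter and the evenly spaced checkpoint layout are all assumptions made for illustration.

```python
class CachedUTF8:
    """UTF-8 bytes plus a sparse cache of character index -> byte offset."""

    def __init__(self, data, stride=64):
        self.data = data          # bytes, assumed to be valid UTF-8
        self.stride = stride      # hypothetical checkpoint spacing
        self.checkpoints = [0]    # checkpoints[k] = byte offset of char k*stride
        self.length = 0           # number of characters seen so far
        for offset, byte in enumerate(data):
            if byte & 0xC0 != 0x80:   # not a continuation byte: a character starts here
                if self.length and self.length % stride == 0:
                    self.checkpoints.append(offset)
                self.length += 1

    def _next(self, offset):
        # Advance from the start of one character to the start of the next.
        offset += 1
        while offset < len(self.data) and self.data[offset] & 0xC0 == 0x80:
            offset += 1
        return offset

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Jump to the nearest checkpoint, then scan forward a bounded distance.
        start = self.checkpoints[index // self.stride]
        for _ in range(index % self.stride):
            start = self._next(start)
        return self.data[start:self._next(start)].decode("utf-8")
```

With a fixed `stride`, indexing costs a bounded scan rather than a scan from the start of the string, at the price of one stored offset per `stride` characters.
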
