Jim Jewett writes:

> I think "standard repertoire based on Unicode" may be confusing the issue.

By "standard repertoire" I mean that all Pythons will show the same characters the same way, while "based on Unicode" is intended to mean looking at TR#36 and TR#39 in picking the repertoires.

> As I understand it, you're saying something like
>
> For strings, repr will delegate to display_string.

Er, I'm not familiar with such a function.... What I have in mind is that for string display, repr will have a large, standard set of characters that it sends directly to output, and a set that it \u-escapes for the purpose of avoiding ambiguity. These sets are always defined the same way for any Python.

For people for whom the standard display would be painful (eg, Cyrillic users and Greek users), there would be an optional post-processor (basically a codec) which would translate some \u-escapes to characters, and should also translate the conflicting characters (ie, ASCII in the case of Cyrillic and Greek) to \u-escapes.

> Users can (and should) supply a display_string function
> appropriate to their own system.

"Can", yes, but only on a "consenting adults" basis. They should not do so in most cases.

> The default display_string will display ASCII, and unicode-escape
> everything else.

Definitely not. The default should try to display anything that can be displayed unambiguously. If we don't do that, *nobody* will use the default except us semi-lingual Americans, and there would be no point in having a standard repertoire.

For practical purposes, the only scripts I know of where there will be real problems are Cyrillic and Greek, because they share glyphs with the Latin alphabet, and by default many of their characters would be escaped. I'm sure there are other such scripts, of course, I don't mean to minimize the problem. (Some Japanese will undoubtedly complain about their full-width "ASCII", but I have no sympathy for that particular self-inflicted injury: they are already deprecated in Unicode as compatibility characters.)
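
To make the two-stage idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the names standard_display, local_display, and is_directly_displayed are hypothetical, and the rule used to pick the directly-displayed set (escape Greek, Cyrillic, anything unprintable, and anything NFKC normalization would change) is just a stand-in for a repertoire actually derived from TR#36 and TR#39.

```python
import re
import unicodedata


def escape(ch):
    """Return the \\u or \\U escape for a single character."""
    cp = ord(ch)
    return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp


def is_directly_displayed(ch):
    """Toy stand-in for the standard repertoire (NOT the TR#36/TR#39 data)."""
    if not ch.isprintable():
        return False
    if ord(ch) < 0x80:
        return True
    # Compatibility characters such as full-width "ASCII" change under NFKC.
    if unicodedata.normalize("NFKC", ch) != ch:
        return False
    # Scripts that share glyphs with Latin are escaped by default.
    name = unicodedata.name(ch, "")
    return not name.startswith(("GREEK", "CYRILLIC"))


def standard_display(s):
    """Stage 1: the same output on every Python, escaping ambiguous characters."""
    out = []
    for ch in s:
        if ch == "\\":
            out.append("\\\\")  # keep literal backslashes distinguishable
        elif is_directly_displayed(ch):
            out.append(ch)
        else:
            out.append(escape(ch))
    return "".join(out)


# An escaped backslash, a \uXXXX escape, a \UXXXXXXXX escape, or an ASCII letter.
_TOKEN = re.compile(r"\\\\|\\u([0-9a-f]{4})|\\U[0-9a-f]{8}|([A-Za-z])")


def local_display(escaped, script):
    """Stage 2 (optional): show `script` directly and escape ASCII letters instead.

    `script` is a Unicode name prefix such as "CYRILLIC" or "GREEK".
    """
    def swap(m):
        hex4, letter = m.group(1), m.group(2)
        if hex4 is not None:
            ch = chr(int(hex4, 16))
            in_script = unicodedata.name(ch, "").startswith(script)
            return ch if in_script else m.group(0)
        if letter is not None:
            return escape(letter)  # the now-conflicting ASCII letter
        return m.group(0)  # other escapes pass through untouched

    return _TOKEN.sub(swap, escaped)


if __name__ == "__main__":
    text = "\u0434\u0430 no"  # Cyrillic "da" followed by ASCII "no"
    std = standard_display(text)
    print(std)
    print(local_display(std, "CYRILLIC"))
```

The point of the split is that stage 1 never depends on locale, so output from any two Pythons can be compared directly, while stage 2 is a purely local, opt-in transformation of stage 1's output.
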

On the other hand, Unicode was careful to assemble a unified set of Latin characters. Although some like the Angstrom symbol do have compatibility encodings, I don't think that's a major worry. The vast majority of Asian characters (loosely defined, including not only the Han ideographs but the radicals, Korean Hangul, Japanese and Chinese syllabaries, etc) are going to be readable, too (for those with appropriate fonts).