Neil Hodgson wrote:
> I'd like to more tightly define Unicode strings for Python 3000.
> Currently, Unicode strings may be implemented with either 2 byte
> (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
> contain any Unicode character and should be indexable yielding
> characters rather than half characters. Therefore Python strings
> should appear to be UTF-32. There could still be multiple
> implementations (using UTF-16 or UTF-8) to preserve space but all
> implementations should appear to be the same apart from speed and
> memory use.

That's very tricky. If you have multiple implementations, you make
usage at the C API difficult. If you make it either UTF-8 or UTF-32,
you make PythonWin difficult. If you make it UTF-16, you make indexing
difficult.

Regards,
Martin
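
To illustrate the UTF-16 indexing point, here is a minimal sketch in Python. It is not taken from any actual implementation; the helper names utf16_units and code_point_at are made up for this example. It shows that a character outside the Basic Multilingual Plane occupies two 16-bit code units (a surrogate pair), so finding the n-th character means scanning from the start rather than jumping straight to offset n.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_point_at(units, index):
    """Return the code point at character position index (hypothetical helper).

    Walks the code units from the beginning, because surrogate pairs
    make the offset of a given character unpredictable.
    """
    pos = 0  # character position reached so far
    i = 0    # offset into the code unit list
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine the high and low surrogate into one code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if pos == index:
            return cp
        pos += 1
        i += step
    raise IndexError("character index out of range")


# Three characters: 'a', U+1D11E (outside the BMP), 'b'.
text = "a\U0001D11Eb"
units = utf16_units(text)

print(len(units))                    # number of code units, not characters
print(hex(units[2]))                 # naive indexing lands on a low surrogate
print(hex(code_point_at(units, 1)))  # scanning recovers the full character
print(hex(code_point_at(units, 2)))  # and the character after it
```

Indexing units[2] directly yields half of a character, which is the "half characters" problem from the quoted message. The scanning version gives the right answer, but at a cost that grows with the index.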