On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:

Any non-trivial text processing is likely to be broken in presence of
surrogates.  Producing them on input is just trading known issue for
an unknown one.  Processing surrogate pairs in python code is hard.
Software that has to support non-BMP characters will most likely be
written for a wide build and contain subtle bugs when run under a
narrow build.  Note that my latest proposal does not abolish
surrogates outright.  Users who want them can still use something like
"surrogateescape"  error handler for non-BMP characters.

It seems to me that what you are asking for is an alternate, optional, utf-8-bmp codec that would raise an error, in either direction, for non-bmp chars. Then, as you suggest, if one is not prepared for surrogates, they are not allowed.

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to