On Mon, Dec 8, 2008 at 1:45 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>>
>> Such application specific error handlers could then also apply
>> whatever fancy round-trip safe encoding of non-decodable bytes
>> to Unicode escapes, private code points, etc. as seen fit by the
>> application.
>
> I'd argue that such fancy round-trip safe error handler should be provided by
> Python. It's not reasonable to expect application coders to come up with their
> own codec variation based on subtle details of the unicode spec.

Except they're clearly NOT part of the unicode spec. Moreover, whatever
tricks you use vary depending on whether your garbage input is from UTF-8,
UTF-16, or UTF-32 (or any other arbitrary encoding, like CP-1252 or
Shift-JIS).

At this point someone suggests we have a type that can store an
arbitrary mix of unicode and bytes, so the undecodable portions stay in
their original form. :P

--
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
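
For readers who want something concrete, here is a rough sketch of the kind of application-level error handler being discussed above, assuming Python 3 and UTF-8 input. The handler name "pua-escape", the PUA_BASE offset, and the encode_roundtrip helper are all made up for illustration; none of them come from the thread or from the standard library. Only codecs.register_error and the exception attributes are real API.

```python
import codecs

# Hypothetical convention: undecodable byte 0xNN becomes code point
# U+F700 + 0xNN, which lies in the Unicode private use area.
PUA_BASE = 0xF700


def pua_escape(exc):
    """Decode-side error handler: map each undecodable byte to a PUA char."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume.
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua-escape", pua_escape)


def encode_roundtrip(text, encoding="utf-8"):
    """Reverse pua_escape (assumption: a stateless encoding such as UTF-8).

    An encode-side error handler would never fire here, because the PUA
    characters are themselves encodable, so the reversal is done by hand.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)   # restore the original raw byte
        else:
            out += ch.encode(encoding)
    return bytes(out)


raw = b"caf\xe9 \xff ok"                # Latin-1 junk inside "UTF-8" data
text = raw.decode("utf-8", "pua-escape")
assert encode_roundtrip(text) == raw    # the garbage bytes survive

# The catch: valid input that already uses those private code points
# no longer round-trips.
legit = "\uf7e9".encode("utf-8")
assert encode_roundtrip(legit.decode("utf-8", "pua-escape")) != legit
```

The last assertion shows why this is only a trick and not something the Unicode spec hands you: the escape range collides with legitimate private-use characters. The per-character encode in the helper would also misbehave for an encoding like UTF-16 that emits a BOM, so the reversal logic really does depend on the encoding of the garbage input.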