On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> Despite there being also an error handler called "surrogates".

Not that I have to be, but I'm not sold on the previous UTF-8 codec behavior becoming an error handler named "surrogates", for two reasons. (I do respect the obvious PBP argument for the implementation, and have no better name - "lenient"?)

First, unless there's a way to stack error handlers, there's no way to get the old behavior combined with the "replace" handler. Second, errors="surrogates" reads as if surrogates should be an error, not an additionally allowed pattern. Neither of these is a deal breaker or hard to learn, but both are non-obvious. I think the utf8b behavior makes a lot more sense with the name "surrogates", through the mnemonic that errors become surrogates.

The stacking argument also applies to the new utf8b behavior, but only on encode, since on decode it handles all errors. This may be a YAGNI, but for a non-UTF-8 encode it may be useful to allow "xmlcharrefreplace" handling for unavailable characters that are not surrogate-escaped. Without stacking that's unmaintainable, as we clearly don't want a ${codec}b for every current codec.

I'd be perfectly happy with utf8b or UTF-8b, as either a codec or an error handler (do we want both? YAGNI?). So what if it smells a little inaccurate as a handler when used with codecs other than UTF-8; no big deal. I could also see something like errors="roundtrip", which explains the intention of the handler rather than the algorithm, but is awkward on encode when it encounters unavailable Unicode characters.

--
Michael Urman
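
P.S. To make the stacking point on encode concrete, here is a rough sketch of what a hand-rolled "stacked" handler might look like. It is only an illustration under some assumptions: the handler name "escape_then_xmlcharref" is made up, the U+DC80..U+DCFF mapping for undecodable bytes is the utf8b convention as I understand it, and it relies on a Python 3 where an encode error handler registered through codecs.register_error may return bytes as the replacement.

```python
import codecs


def escape_then_xmlcharref(exc):
    """Hypothetical stacked encode handler (name invented for this sketch).

    Code points in U+DC80..U+DCFF are assumed to be utf8b-style escapes of
    undecodable bytes and are turned back into those bytes; anything else
    the codec cannot encode falls back to an XML character reference.
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Escaped byte: restore the original byte value 0x80..0xFF.
            out.append(cp - 0xDC00)
        else:
            # Genuinely unavailable character: emit &#NNNN; instead.
            out.extend(b"&#%d;" % cp)
    # Return the replacement bytes and the position to resume encoding at.
    return bytes(out), exc.end


codecs.register_error("escape_then_xmlcharref", escape_then_xmlcharref)

# U+DCE9 stands in for an undecodable byte 0xE9; U+20AC (euro sign) is a
# real character that ASCII cannot represent.
text = "caf\udce9\u20ac"
encoded = text.encode("ascii", errors="escape_then_xmlcharref")
print(encoded)
```

Every combination someone wants has to be written and registered by hand like this, which is the maintenance burden I mean when I say it doesn't scale without real stacking.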