M.-A. Lemburg writes:

 > On 2009-05-03 19:39, Martin v. Löwis wrote:
 > >> If the error handler is supposed to be used for codecs other than utf-8,
 > >> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
 > >
 > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
 > > it's an algorithm based on 16-bit or 32-bit code points.

I don't understand this phrasing. The algorithm is only applicable to ASCII-compatible octet streams. It produces code points by a simple displacement: octet -> octet + 0xDC00. It cannot be used on (say) UTF-32 to deal with embedded surrogates. Certainly, the computation requires (at least) 16-bit numbers, but the input must be restricted to a stream of 8-bit code points, while the output is 16- or 32-bit code points.

 > Please use a more descriptive name [than "utf-8b"] for the handler
 > which does not cause confusion with an existing codec.

But please don't use "surrogate-escape" or (as in the current PEP) "python-escape"; it's not an escaping (quotation) mechanism. "surrogate-replace", "surrogate-substitute", or "surrogate-translate" would be better names.
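
To make the displacement (octet -> octet + 0xDC00) concrete, here is a minimal Python sketch. The names `displace`, `restore`, and `OFFSET` are made up for illustration and are not taken from the PEP. It also assumes that octets below 0x80 stand for themselves and that every octet in 0x80..0xFF is treated as undecodable, which is a simplification: a real handler would presumably displace only the octets the codec could not decode.

```python
OFFSET = 0xDC00  # the displacement named in the message above


def displace(data: bytes) -> str:
    """Turn an ASCII-compatible octet stream into code points.

    Assumption: octets below 0x80 pass through unchanged; octets
    0x80..0xFF are moved to U+DC80..U+DCFF (octet + 0xDC00).
    """
    return "".join(chr(b) if b < 0x80 else chr(b + OFFSET) for b in data)


def restore(text: str) -> bytes:
    """Invert displace(): map U+DC80..U+DCFF back to octets 0x80..0xFF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)           # plain ASCII
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append(cp - OFFSET)  # displaced octet
        else:
            raise ValueError(f"U+{cp:04X} is not a displaced octet")
    return bytes(out)


raw = b"caf\xe9"                     # 8-bit input; 0xE9 is not ASCII
text = displace(raw)
print([hex(ord(c)) for c in text])   # ['0x63', '0x61', '0x66', '0xdce9']
print(restore(text) == raw)          # True
```

The input is always a stream of 8-bit values and the output needs at least 16 bits per code point, which is why the scheme cannot be turned around and applied to, say, surrogates embedded in UTF-32.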