> > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> > > it's an algorithm based on 16-bit or 32-bit code points.
>
> I don't understand this phrasing. The algorithm is only applicable to
> ASCII-compatible octet streams. It results in code points by a simple
> displacement of octet -> octet + 0xDC00. It cannot be used on (say)
> UTF-32 to deal with embedded surrogates.
>
> Certainly, the computation requires (at least) 16 bit numbers, but the
> input must be restricted to a stream of 8-bit code points, while the
> output is 16- or 32-bit code points.
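
For concreteness, the displacement described above can be sketched in Python. This is only an illustration under stated assumptions, not a reference implementation: the helper names escape_byte and unescape_code_point are hypothetical, and the sketch assumes a Python 3 interpreter whose codecs provide the "surrogateescape" error handler, which applies the same octet + 0xDC00 mapping to octets that cannot be decoded.

```python
def escape_byte(octet: int) -> int:
    """Hypothetical helper: map an undecodable octet to a lone surrogate.

    Only octets 0x80..0xFF are escaped; ASCII octets decode normally,
    which is why the input must be an ASCII-compatible octet stream.
    """
    if not 0x80 <= octet <= 0xFF:
        raise ValueError("only non-ASCII octets are escaped")
    return 0xDC00 + octet


def unescape_code_point(code_point: int) -> int:
    """Hypothetical helper: invert escape_byte to recover the octet."""
    if not 0xDC80 <= code_point <= 0xDCFF:
        raise ValueError("not an escaped octet")
    return code_point - 0xDC00


# 0xFF and 0xFE can never occur in well-formed UTF-8, so both are escaped.
raw = b"abc\xff\xfe"
text = raw.decode("utf-8", errors="surrogateescape")
round_trip = text.encode("utf-8", errors="surrogateescape")

# The same displacement works with another byte-to-code-unit decoder.
ascii_text = b"caf\xe9".decode("ascii", errors="surrogateescape")
```

Here text contains the two code units U+DCFF and U+DCFE after "abc", and encoding with the same error handler gives back the original octets. The ASCII case at the end shows that nothing in the mapping itself is specific to UTF-8.
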
Right - the algorithm maps between bytes and 16/32-bit code units. It
works, in particular, for UTF-8, and was originally proposed to apply to
UTF-8 - but it can work in any other place that converts bytes to
16/32-bit code units as well.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com