Michael D'Errico <mike dash list at pobox dot com> wrote:

If you want a really fast alternate encoding, you could encode all of Unicode in at most 3 bytes. Use the high bit as a "continuation" bit and the lower 7 bits as the data.

ASCII gets passed through unchanged.

This is essentially what I was going to suggest to Kannan, since keeping ASCII byte values, nulls, etc. out of multi-byte sequences is not relevant to his use case. The conversion is lightning-fast; it can probably be optimized to be even faster than UTF-8 conversion.
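
To make the scheme concrete, here is a minimal sketch in Python. The description above does not say which 7-bit group comes first or which value of the high bit means "more bytes follow", so this sketch assumes the most significant group comes first and that a set high bit marks a byte that is followed by another. Those choices keep ASCII as single unchanged bytes. The function names are made up for illustration.

```python
# Sketch of the 7-bit "continuation bit" encoding described above.
# Assumptions (not stated in the thread): most significant 7-bit group
# first; high bit set means another byte follows.

def encode_cp(cp: int) -> bytes:
    """Encode one code point in 1 to 3 bytes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp < 0x80:        # fits in 7 bits: ASCII passes through unchanged
        return bytes([cp])
    if cp < 0x4000:      # fits in 14 bits
        return bytes([0x80 | (cp >> 7), cp & 0x7F])
    # up to 21 bits, enough for everything through U+10FFFF
    return bytes([0x80 | (cp >> 14), 0x80 | ((cp >> 7) & 0x7F), cp & 0x7F])


def encode(text: str) -> bytes:
    return b"".join(encode_cp(ord(ch)) for ch in text)


def decode(data: bytes) -> str:
    out = []
    cp = 0
    pending = False      # True while in the middle of a multi-byte sequence
    for b in data:
        cp = (cp << 7) | (b & 0x7F)
        if b & 0x80:     # continuation bit set: more bytes follow
            pending = True
        else:            # last byte of this code point
            out.append(chr(cp))
            cp = 0
            pending = False
    if pending:
        raise ValueError("truncated sequence")
    return "".join(out)
```

Under these assumptions U+00E9 becomes the two bytes 81 69, and U+4000 becomes 81 80 00. The second case shows why this only suits uses where ASCII values and nulls inside multi-byte sequences do not matter.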

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s

