Hello,

On 2010-06-03 07:07, Kannan Goundan wrote:
This is currently what I do (I was referring to this as the "compact UTF-8-like encoding"). The one difference is that I put all the marker bits in the first byte (instead of in the high bit of every byte):

    0xxxxxxx
    10xxxxxx xyyyyyyy
    110xxxxx xxyyyyyy yzzzzzzz
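For readers who want to experiment with the compact scheme quoted above, here is a minimal Python sketch. It assumes something the message does not spell out: the x bits are the most significant and the payload bits are simply concatenated big-endian, so a value fits in 7, 14 or 21 bits. The function names `encode_compact` and `decode_compact` are hypothetical.

```python
def encode_compact(values):
    """Encode integers (e.g. code points) in the compact UTF-8-like scheme.

    Assumed layout (payload bits big-endian, marker bits only in byte 1):
      0xxxxxxx                    7 bits
      10xxxxxx xyyyyyyy           14 bits
      110xxxxx xxyyyyyy yzzzzzzz  21 bits
    """
    out = bytearray()
    for v in values:
        if not 0 <= v <= 0x1FFFFF:
            raise ValueError(f"value {v:#x} does not fit in 21 bits")
        if v < 0x80:                      # one byte, marker 0
            out.append(v)
        elif v < 0x4000:                  # two bytes, marker 10
            out.append(0x80 | (v >> 8))
            out.append(v & 0xFF)
        else:                             # three bytes, marker 110
            out.append(0xC0 | (v >> 16))
            out.append((v >> 8) & 0xFF)
            out.append(v & 0xFF)
    return bytes(out)


def decode_compact(data):
    """Decode a byte string produced by encode_compact, parsing from offset 0."""
    values = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            length, payload = 1, b0
        elif b0 < 0xC0:
            length, payload = 2, b0 & 0x3F
        elif b0 < 0xE0:
            length, payload = 3, b0 & 0x1F
        else:
            raise ValueError(f"byte {b0:#04x} at offset {i} cannot start a sequence")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # Trailing bytes carry 8 payload bits each; there is no marker to strip.
        for b in data[i + 1:i + length]:
            payload = (payload << 8) | b
        values.append(payload)
        i += length
    return values


# U+0041, U+00E9, U+20AC, U+1F600: 8 bytes here versus 10 in UTF-8.
encoded = encode_compact([0x41, 0xE9, 0x20AC, 0x1F600])
print(encoded.hex())                      # 4180e9a0acc1f600
assert decode_compact(encoded) == [0x41, 0xE9, 0x20AC, 0x1F600]
```

The space saving over UTF-8 comes exactly from dropping the `10` marker on trailing bytes, which is the point the reply takes issue with.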
The problem with this encoding is that the trailing bytes are not clearly marked: they may start with any of '0', '10', or '110'; only '111' would mark a byte unambiguously as a trailing one. In contrast, in UTF-8 every single byte carries a marker that unambiguously identifies it as either a single ASCII byte, a starting byte, or a continuation byte; hence you do not have to go back to the beginning of the whole data stream to recognize, and decode, a group of bytes.

Best wishes,
  Otto Stolz
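The resynchronization argument can be illustrated with a small Python sketch (function names hypothetical). `utf8_char_start` finds the start of a UTF-8 character by looking only at nearby bytes, while `compact_starts` has to parse the compact scheme from offset 0, because a trailing byte such as `0x00` looks exactly like a one-byte sequence.

```python
def utf8_char_start(data, i):
    """Return the offset of the first byte of the UTF-8 sequence containing data[i].

    Only nearby bytes are inspected: every continuation byte matches
    10xxxxxx and no other byte does, so in well-formed UTF-8 we step back
    over at most 3 of them.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


def compact_starts(data):
    """Return the offsets at which sequences start in the compact scheme.

    A trailing byte may begin with 0, 10 or 110 just like a first byte, so
    boundaries can only be found by parsing from the start of the data.
    """
    starts = []
    i = 0
    while i < len(data):
        starts.append(i)
        b0 = data[i]
        if b0 < 0x80:
            i += 1
        elif b0 < 0xC0:
            i += 2
        elif b0 < 0xE0:
            i += 3
        else:
            # Only a 111xxxxx byte is recognisably a trailing byte.
            raise ValueError(f"byte {b0:#04x} at offset {i} cannot start a sequence")
    return starts


# "A", U+1F600, "B" in UTF-8: 41 f0 9f 98 80 42
utf8 = "A\U0001F600B".encode("utf-8")
print(utf8_char_start(utf8, 3))           # 1 (inspects offsets 3, 2 and 1 only)

# U+1F600, "B" in the compact scheme: c1 f6 00 42
compact = bytes([0xC1, 0xF6, 0x00, 0x42])
print(compact_starts(compact))            # [0, 3]
# Offset 2 holds 0x00, which on its own looks like a complete one-byte
# sequence (0xxxxxxx); only parsing from offset 0 shows it is a trailing byte.
```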

