Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
Of course, he will not have other UTF-8-like features, such as
avoidance of ASCII values in the final trail byte, and "fast forward
parsing" by looking at the first byte.
The fast-forward feature is certainly not decisive, but random
accessibility (from any position and in any direction) is much more
decisive and is a real positive factor for UTF-8, compared with
the format proposed above, which can only be read in the forward
direction, even if it can be accessed randomly to find the *next*
character. To find the *previous* one, you have to scan backward until
you reach at least one byte used to encode the character before it
(otherwise, you don't know whether a 1xxxxxxx byte is the first one in
a sequence, even if you can know whether a byte is the last one).
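
As an illustration of the UTF-8 property described in the quoted
paragraph, here is a minimal Python sketch. It assumes well-formed
UTF-8 input and relies only on the fact that UTF-8 trail bytes always
have the form 10xxxxxx. The function names are hypothetical, chosen for
this example, and the sketch says nothing about the format proposed
earlier in the thread.

```python
def char_start(buf: bytes, pos: int) -> int:
    """Return the index of the lead byte of the character containing buf[pos].

    Assumes buf holds well-formed UTF-8. Trail bytes match 10xxxxxx,
    so we step back until we land on a byte that is not a trail byte.
    """
    i = pos
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the start of the character that ends just before index pos.

    pos is expected to be a character boundary (or len(buf)).
    """
    if pos <= 0:
        raise ValueError("no previous character")
    return char_start(buf, pos - 1)
```

Because a UTF-8 character is at most four bytes long, each backward
step inspects at most four bytes and never needs to look at any byte
belonging to the preceding character.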
Kannan is looking for a format for a protocol that he is developing.
Maybe scanning backwards through a string is not a scenario that will
ever be encountered in this protocol. It's not for us to say.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s