Re: UTF-7 signature

Doug Ewell Thu, 11 Apr 2002 20:50:17 -0700

Markus Scherer <[EMAIL PROTECTED]> wrote:

> On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list
> a UTF-7 signature byte sequence of +/v8- (which was news to me).


I don't remember ever reading a recommendation, or even a suggestion, to
use +/v8- as a signature for UTF-7.  But that would be the way to encode
a standalone U+FEFF.
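
For anyone who wants to check the bytes, here is a minimal Python sketch. It assumes the usual UTF-7 rule for a shifted run: "+", then the UTF-16BE bytes in base64 with the "=" padding dropped, then "-". The helper name utf7_shifted is made up for illustration and it handles only a single base64 run, not a full UTF-7 encoder.

```python
import base64

def utf7_shifted(text):
    """Encode text as one UTF-7 base64 run (illustrative helper, not a full encoder)."""
    utf16 = text.encode("utf-16-be")                           # U+FEFF -> bytes FE FF
    b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")  # drop '=' padding
    return "+" + b64 + "-"                                     # shift in, data, shift out

print(utf7_shifted("\ufeff"))                 # +/v8-
print(b"+/v8-".decode("utf-7") == "\ufeff")   # True
```

The second line uses Python's built-in "utf-7" codec only to confirm that the five bytes decode back to a lone U+FEFF.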

> This illustrates a property of UTF-7 that sets it further apart
> from most encodings than for example SCSU and BOCU-1:
> In most Character Encoding Schemes, consecutive code units/points
> are encoded in _separate_, consecutive byte sequences.
>
> In UTF-7, byte sequences overlap and many bytes in the encoding
> (2 out of 8 I think) contain pieces of two adjacent code units.
> This is more like in Huffman codes.
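
The "2 out of 8" figure quoted above can be checked with a little arithmetic. The sketch below assumes a base64 run carrying whole 16-bit code units back to back, 6 data bits per output byte; the function name straddling_positions is hypothetical.

```python
def straddling_positions(num_units):
    """Return indices of base64 characters whose data bits span two 16-bit code units."""
    total_bits = 16 * num_units
    positions = []
    for start in range(0, total_bits, 6):
        # Last real data bit carried by this character (the final one may be padded).
        end = min(start + 6, total_bits) - 1
        if start // 16 != end // 16:
            positions.append(start // 6)
    return positions

# Three code units = 48 bits = exactly 8 base64 characters.
print(straddling_positions(3))  # [2, 5]
```

Under these assumptions, characters 2 and 5 of each 8-character group carry bits from two adjacent code units, which agrees with the estimate in the quoted text.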

This is one reason why I'm a little uncomfortable with the wording in
UTR #17, which specifically mentions SCSU as a Transfer Encoding Syntax
(in contrast to a Character Encoding Scheme) but does not mention UTF-7,
which to my mind fits the definition of a TES much better.  Perhaps this
is just the conscious effort to ignore UTF-7 in the hope it will go
away; I have no problem with that.

-Doug Ewell
 Fullerton, California
