Markus Scherer <[EMAIL PROTECTED]> wrote:

> On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list
> a UTF-7 signature byte sequence of +/v8- (which was news to me).

I don't remember ever reading a recommendation, or even a suggestion, to use +/v8- as a signature for UTF-7. But that would be the way to encode a standalone U+FEFF.

> This illustrates a property of UTF-7 that sets it further apart
> from most encodings than for example SCSU and BOCU-1:
> In most Character Encoding Schemes, consecutive code units/points
> are encoded in _separate_, consecutive byte sequences.
>
> In UTF-7, byte sequences overlap and many bytes in the encoding
> (2 out of 8 I think) contain pieces of two adjacent code units.
> This is more like in Huffman codes.

This is one reason why I'm a little uncomfortable with the wording in UTR #17, which specifically mentions SCSU as a Transfer Encoding Syntax (in contrast to a Character Encoding Scheme) but does not mention UTF-7, which to my mind fits the definition of a TES much better. Perhaps this is just the conscious effort to ignore UTF-7 in the hope it will go away; I have no problem with that.

-Doug Ewell
 Fullerton, California
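For anyone who wants to check the two UTF-7 details above (that a standalone U+FEFF comes out as +/v8-, and that 2 of every 8 encoded bytes carry bits from two adjacent code units), here is a minimal Python sketch. The function names utf7_shift_sequence and straddling_chars are made up for this illustration, and the first one is a simplification: it always wraps its input in a single shifted run with an explicit closing "-", which a real UTF-7 encoder would not do for directly encodable ASCII.

```python
import base64


def utf7_shift_sequence(text: str) -> str:
    """Hypothetical helper: encode text as ONE UTF-7 shifted run.

    Simplification: '+' + modified Base64 of the UTF-16BE bytes + '-'.
    Modified Base64 here just means the '=' padding is dropped.
    """
    utf16 = text.encode("utf-16-be")
    b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
    return "+" + b64 + "-"


def straddling_chars(n_units: int) -> list[int]:
    """Hypothetical helper: indices of Base64 characters whose 6 bits
    span two adjacent 16-bit code units in a run of n_units code units."""
    total_bits = 16 * n_units
    n_chars = -(-total_bits // 6)  # ceiling division
    result = []
    for i in range(n_chars):
        first_bit = 6 * i
        # The last character may be padded with zero bits; ignore the padding.
        last_bit = min(6 * i + 5, total_bits - 1)
        if first_bit // 16 != last_bit // 16:
            result.append(i)
    return result


signature = utf7_shift_sequence("\ufeff")
overlap = straddling_chars(3)  # 3 code units = 48 bits = 8 Base64 characters

print(signature)
print(overlap, "of 8 characters straddle a code unit boundary")
```

Three code units are used for the overlap check because 48 bits is the shortest run that fills a whole number of Base64 characters, so the pattern repeats every 8 characters after that.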

