Re: UTF-7 signature

2002-04-12 Thread David Hopwood
Doug Ewell wrote: Markus Scherer [EMAIL PROTECTED] wrote: In UTF-7, byte sequences overlap and many bytes in the encoding (2 out of 8 I think) contain pieces of two adjacent code units. This is more like in Huffman codes. This is one reason why I'm a

UTF-7 signature

2002-04-11 Thread Markus Scherer
On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature byte sequence of +/v8- (which was news to me). (Subject: "MS/Unix BOM FAQ again (small fix)") I meditated some over this: +/v8 is the encoding of U+FEFF as the first code point in a text. So far, so good
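The claim that +/v8 encodes U+FEFF can be checked with a short sketch. It assumes Python 3 and its built-in "utf-7" codec; the helper names utf7 and signature_prefix are made up for this illustration and do not come from the thread. U+FEFF is 16 bits, but three base64 characters carry 18, so the fourth character of the sequence also holds the top two bits of whatever code unit follows. That is presumably why the last character of the signature is not fixed when more base64-encoded text comes directly after it.

```python
def utf7(text: str) -> bytes:
    """Encode text with Python's built-in UTF-7 codec."""
    return text.encode("utf-7")


def signature_prefix(following: str = "") -> bytes:
    """First four bytes of U+FEFF followed by `following` (hypothetical helper)."""
    return utf7("\ufeff" + following)[:4]


# U+FEFF alone: the shift sequence is closed with '-'.
print(utf7("\ufeff"))              # b'+/v8-'

# The fourth byte depends on the top two bits of the next code unit.
for ch in ("A", "\u03a3", "\u4e00", "\u8000", "\uff21"):
    print(f"U+{ord(ch):04X}", signature_prefix(ch))
```

A detector that only looks for the literal bytes +/v8- would therefore miss texts whose first real character is itself base64-encoded; checking for +/v followed by one of 8, 9, + or / is one possible workaround.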

Re: UTF-7 signature

2002-04-11 Thread Markus Scherer
Shlomi Tal wrote: UTF-7, it shocked me how Greek Sokrates and S o k r a t e s (with spaces between each Greek letter in the latter) would have different encodings for the same Unicode characters. That is not unusual for stateful encodings. It's the same with BOCU-1 (not in this particular
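The effect Shlomi Tal describes can be reproduced with a small sketch, again assuming Python 3 and its built-in "utf-7" codec (other encoders may make different but equally valid choices). The variable names are illustrative only. In the unspaced word all Greek letters share one base64 run, so their bits are packed across byte boundaries; with spaces in between, each letter gets its own run that starts at a fresh bit position.

```python
# "Sokrates" in Greek, written with escapes to avoid normalization surprises.
joined = "\u03a3\u03c9\u03ba\u03c1\u03ac\u03c4\u03b7\u03c2"
spaced = " ".join(joined)          # same letters, one space between each

enc_joined = joined.encode("utf-7")
enc_spaced = spaced.encode("utf-7")

print(enc_joined)
print(enc_spaced)

# Dropping the spaces from the second encoding does not give the first one:
# the same code points are represented by different bytes in each context.
print(enc_spaced.replace(b" ", b"") == enc_joined)

# Both still decode to the text they came from.
print(enc_joined.decode("utf-7") == joined, enc_spaced.decode("utf-7") == spaced)
```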

Re: UTF-7 signature

2002-04-11 Thread Doug Ewell
Markus Scherer [EMAIL PROTECTED] wrote: On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list a UTF-7 signature byte sequence of +/v8- (which was news to me). I don't remember ever reading a recommendation, or even a suggestion, to use +/v8- as a signature for UTF-7