Re: Default endianness of Unicode, or not

2002-04-15 Thread Kenneth Whistler
Doug responded to Mark's clarification:
> > The UTF-16M sequence <0x1234, 0x0061, 0xD800, 0xDC00> is represented
> > as one of:
> > <0x12 0x34 0x00 0x61 0xD8 0x00 0xDC 0x00> // BOMless
> > <0xFE 0xFF 0x12 0x34 0x00 0x61 0xD8 0x00 0xDC 0x00> // BOM
> > <0xFF 0xFE 0x34 0x12 0x61 0x00 0x00 0xD8 0x00
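The serializations quoted in this snippet can be reproduced with a short sketch. The helper name serialize_utf16 and its parameters are hypothetical, introduced only for illustration; the code simply packs 16-bit code units in a chosen byte order and optionally prepends U+FEFF.

```python
import struct

def serialize_utf16(code_units, byte_order="big", bom=False):
    """Pack 16-bit code units into bytes (hypothetical helper, not from the thread)."""
    prefix = ">" if byte_order == "big" else "<"
    # A BOM is just the code unit U+FEFF serialized in the same byte order.
    units = ([0xFEFF] if bom else []) + list(code_units)
    return struct.pack(f"{prefix}{len(units)}H", *units)

units = [0x1234, 0x0061, 0xD800, 0xDC00]
bomless = serialize_utf16(units)                      # big-endian, no BOM
be_bom = serialize_utf16(units, "big", bom=True)      # FE FF, then big-endian
le_bom = serialize_utf16(units, "little", bom=True)   # FF FE, then little-endian

for label, data in (("BOMless", bomless), ("BE+BOM", be_bom), ("LE+BOM", le_bom)):
    print(label, data.hex(" ").upper())
```

Treating the BOMless form as big-endian follows the first quoted line; whether that is the only permitted BOMless form is exactly what the thread is arguing about.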

Re: Default endianness of Unicode, or not

2002-04-14 Thread Mark Davis
Sent: Sunday, April 14, 2002 15:28 Subject: Re: Default endianness of Unicode, or not > Mark Davis <[EMAIL PROTECTED]> wrote: > > > Part of the problem is that the term "UTF-16" means two different > > things. Let me see if I can make it clearer. >

Re: Default endianness of Unicode, or not

2002-04-14 Thread Doug Ewell
Mark Davis <[EMAIL PROTECTED]> wrote:
> Part of the problem is that the term "UTF-16" means two different
> things. Let me see if I can make it clearer.
>
> Let "UTF-16M" refer to the in-memory form, which is a sequence of
> 16-bit code units. The byte ordering is logically immaterial, since it

Re: Default endianness of Unicode, or not

2002-04-13 Thread Mark Davis
icu/tr] http://www.macchiato.com - Original Message - From: "Doug Ewell" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: "Kenneth Whistler" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Saturday, April 13, 2002 11:42 Subject: Re: Default endiannes

Re: Default endianness of Unicode, or not

2002-04-13 Thread Doug Ewell
On Wednesday 2002-04-10, Kenneth Whistler <[EMAIL PROTECTED]> wrote: > There, feel better? Not really. I'm getting the sense on one hand that UTF-16, sans BOM, can be big-endian or little-endian depending on the platform, on the other hand that little-endian UTF-16 isn't "legal" unless it has a

RE: Default endianness of Unicode, or not

2002-04-10 Thread Yves Arrouye
> > So same semantics as before. > > Yep. The editorial committee wouldn't be doing its job right > if it were changing the semantics of the standard. Agreed! Is there any mention that the non-BOM byte sequence is most significant byte first anywhere else? You know, for the newbies? > Joshua 1.

RE: Default endianness of Unicode, or not

2002-04-10 Thread Kenneth Whistler
Yves, > So same semantics as before. Yep. The editorial committee wouldn't be doing its job right if it were changing the semantics of the standard. The intent here is to rewrite everything so that the semantics intended all along will finally be revealed to everyone! It really is a little like

RE: Default endianness of Unicode, or not

2002-04-10 Thread Yves Arrouye
And of course, I have been complaining about ICU's UTF-16 converter behavior, but glibc's makes the same assumption that "UTF-16" is in the local endianness:

gabier% echo hello | uconv -t utf-16be | iconv -f utf-16 -t ascii
iconv: illegal input sequence at position 0
gabier%

So fixing one but
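As an analogous illustration only (it says nothing about how ICU or glibc are implemented), Python's generic "utf-16" codec also falls back to the platform's native byte order when the input carries no BOM. The sketch below feeds it the same kind of BOMless big-endian data and compares that with naming the byte order explicitly.

```python
import sys

# "hello\n" serialized big-endian with no BOM, similar to the uconv output above.
data = "hello\n".encode("utf-16-be")

# With no BOM present, the generic codec decodes in the machine's native order,
# so the result depends on the platform this runs on.
native_guess = data.decode("utf-16")

# Naming the byte order explicitly removes the platform dependence.
explicit = data.decode("utf-16-be")

print(sys.byteorder, repr(native_guess), repr(explicit))
```

On a little-endian machine the first decode silently yields unrelated characters rather than an error, which is arguably worse than the iconv failure shown above.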

RE: Default endianness of Unicode, or not

2002-04-10 Thread Yves Arrouye
> "D43 UTF-16 character encoding scheme: the Unicode > CES that serializes a UTF-16 code unit sequence as a byte sequence > in either big-endian or little-endian format. > > * In UTF-16 (the CES), the UTF-16 code unit sequence > <004D 0430 4E8C D800 DF02> is serialized as > or > o
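The two serializations that the definition refers to can be computed with Python's built-in codecs. This is a sketch, not text from the standard: the byte sequences printed here are produced by the codecs, and the helper code_units is a hypothetical name used only to check that both byte orders carry the same code unit sequence.

```python
# The code unit sequence <004D 0430 4E8C D800 DF02> encodes these four characters;
# D800 DF02 is the surrogate pair for U+10302.
text = "\u004D\u0430\u4E8C\U00010302"

be = text.encode("utf-16-be")   # big-endian, no BOM
le = text.encode("utf-16-le")   # little-endian, no BOM

def code_units(data, byte_order):
    """Read bytes back as 16-bit code units (hypothetical helper)."""
    return [int.from_bytes(data[i:i + 2], byte_order) for i in range(0, len(data), 2)]

# Both serializations decode to the same code unit sequence.
assert code_units(be, "big") == code_units(le, "little")

print(be.hex(" ").upper())
print(le.hex(" ").upper())
```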

RE: Default endianness of Unicode, or not

2002-04-10 Thread Kenneth Whistler
Yves wrote, in response to Doug: > > > The last time I read the Unicode standard UTF-16 was big endian > > > unless a BOM was present, and that's what I expected from a UTF-16 > > > converter. > > > > Conformance requirement C2 (TUS 3.0, p. 37) says: > > > > "The Unicode Standard does not speci

RE: Default endianness of Unicode, or not

2002-04-10 Thread Yves Arrouye
> > The last time I read the Unicode standard UTF-16 was big endian > > unless a BOM was present, and that's what I expected from a UTF-16 > > converter. > > Conformance requirement C2 (TUS 3.0, p. 37) says: > > "The Unicode Standard does not specify any order of bytes inside a > Unicode value."
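The converter behavior expected in the first quoted paragraph (big-endian unless a BOM says otherwise) can be sketched as follows. The function name decode_utf16 and its default parameter are assumptions made for illustration; whether big-endian is the correct default is the question under discussion in this thread, so the default is left overridable.

```python
import codecs

def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, honoring a leading BOM and otherwise using `default`."""
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to the caller's default instead of the platform's byte order.
    return data.decode(default)

print(decode_utf16(b"\x00h\x00i"))            # BOMless, read as big-endian
print(decode_utf16(b"\xff\xfeh\x00i\x00"))    # little-endian with BOM
```

Passing default="utf-16-le" gives the opposite policy without touching the BOM handling.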

RE: Default endianness of Unicode, or not

2002-04-10 Thread Yves Arrouye
> > The last time I read the Unicode standard UTF-16 was big endian > > unless a BOM was present, and that's what I expected from a UTF-16 > > converter. > > Conformance requirement C2 (TUS 3.0, p. 37) says: > [And other many good references where TUS does *not* say that :)] OK, maybe in 2.0, o