Yves Arrouye <[EMAIL PROTECTED]> wrote:

> The last time I read the Unicode standard UTF-16 was big endian
> unless a BOM was present, and that's what I expected from a UTF-16
> converter.
Conformance requirement C2 (TUS 3.0, p. 37) says: "The Unicode Standard does not specify any order of bytes inside a Unicode value."

In Section 2.7, the passage on page 28 titled "Byte Order Mark (BOM)" says: "... Ideally, all implementations of the Unicode Standard would follow only one set of byte order rules, but this scheme would force one class of processors to swap the byte order on reading and writing plain text files, even when the file never leaves the system on which it was created."

Section 13.6, "Specials: U+FEFF, U+FFF0-U+FFFF," again acknowledges the potential ambiguity of byte order without indicating a preference: "... Some machine architectures use the so-called big-endian byte order, while others use the little-endian byte order. When Unicode text is serialized into bytes, the bytes can go in either order, depending on the architecture."

And Unicode Standard Annex #19, "UTF-32," Section 2, distinguishes between UTF-32BE, UTF-32LE, and UTF-32, specifically stating that the latter may be serialized "in either big-endian or little-endian format." Presumably UTF-16 would be consistent with this.

I do remember reading once, somewhere, that big-endian was a preferred default in the absence of *any* other information (including platform of origin). But I can't find anything in the Unicode Standard to back this up, so I'll assume for now that both byte orientations are considered equally legitimate.

-Doug Ewell
 Fullerton, California
 "Little-endian" user
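[Editor's note: the BOM-driven byte-order detection discussed above can be sketched in Python. The function name and the big-endian fallback are illustrative choices, not anything mandated by the standard; as the thread notes, absent a BOM the byte order must come from out-of-band information.]

```python
def decode_utf16(data: bytes, default_big_endian: bool = True) -> str:
    """Decode UTF-16 bytes, using a leading BOM to choose the byte order.

    Without a BOM, fall back to a caller-chosen default (big-endian here,
    matching the preference Yves describes; the standard itself leaves
    the no-BOM case to higher-level protocols).
    """
    if data[:2] == b"\xfe\xff":
        # U+FEFF serialized big-endian: remaining bytes are UTF-16BE.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # U+FEFF serialized little-endian: remaining bytes are UTF-16LE.
        return data[2:].decode("utf-16-le")
    # No BOM: byte order is ambiguous; apply the chosen default.
    return data.decode("utf-16-be" if default_big_endian else "utf-16-le")

# "A" (U+0041) serialized both ways round-trips to the same character:
print(decode_utf16(b"\xfe\xff\x00\x41"))  # big-endian with BOM -> A
print(decode_utf16(b"\xff\xfe\x41\x00"))  # little-endian with BOM -> A
print(decode_utf16(b"\x00\x41"))          # no BOM, big-endian default -> A
```

Note that the same byte sequence without a BOM decodes to different characters depending on the assumed order (b"\x00\x41" is U+0041 big-endian but U+4100 little-endian), which is exactly the ambiguity the quoted passages acknowledge.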

