Hi, Markus

Hardly misleading! You can, of course, view UTF-16 data in memory as an array of 16-bit code units. But you can also view it as an array of bytes. This might not be a good idea, but it is occasionally necessary.

When a UTF-16 string is treated as an array of bytes, it's supremely important to know the byte order. The OP asked about byte order, and seemed to me to be referring to data in memory. Hence my answer.

Cheers

- rick

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Markus Scherer
Sent: August 12, 2004 9:19
To: unicode
Subject: Re: Wide Characters in Windows and UTF16

Rick Cameron wrote:
> Microsoft Windows uses little-endian byte order on all platforms.
> Thus, on Windows UTF-16 code units are stored in little-endian byte order in memory.
>
> I believe that some linux systems are big-endian and some
> little-endian. I think linux follows the standard byte order of the
> CPU. Presumably UTF-16 would be big-endian or little-endian accordingly.

This is somewhat misleading. For internal processing, where we are talking about the UTF-16 encoding form (quite different from the external encoding _scheme_ of the same name), we don't have strings of bytes but strings of 16-bit units (WCHAR in Windows). Program code operating on such strings could not care less what endianness the CPU uses. Endianness is only an issue when the text gets byte-serialized, as is done for the external encoding schemes (and usually by a conversion service).

markus
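To make the distinction concrete, here is a minimal C sketch (not part of the original exchange) contrasting the in-memory encoding form, where code only ever sees 16-bit units, with a byte-level view, where endianness becomes visible. The sample string and the use of uint16_t instead of the Windows WCHAR type are illustrative assumptions.

    /* Sketch: UTF-16 encoding form (16-bit units) vs. byte-level views.
       Uses uint16_t rather than WCHAR so it also builds outside Windows. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* "Hi" followed by U+20AC EURO SIGN, as 16-bit code units.
           Code that walks this array never sees the CPU's byte order. */
        uint16_t text[] = { 0x0048, 0x0069, 0x20AC };
        size_t n = sizeof text / sizeof text[0];

        /* Reinterpreting the same buffer as bytes exposes the byte order:
           on a little-endian CPU (e.g. x86 Windows) U+20AC appears as
           AC 20; on a big-endian CPU it appears as 20 AC. */
        const unsigned char *bytes = (const unsigned char *)text;
        for (size_t i = 0; i < n * sizeof(uint16_t); ++i)
            printf("%02X ", bytes[i]);
        printf("\n");

        /* Byte-serializing explicitly as UTF-16BE (an external encoding
           scheme) is independent of the CPU: write the high byte first. */
        for (size_t i = 0; i < n; ++i)
            printf("%02X %02X ", text[i] >> 8, text[i] & 0xFF);
        printf("\n");
        return 0;
    }

On a little-endian machine the first line prints 48 00 69 00 AC 20, while the explicit UTF-16BE serialization prints 00 48 00 69 20 AC on any CPU.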

