Re: Byte Order Marks

Markus Scherer Thu, 19 Apr 2001 12:26:46 -0700

There is an RFC about UTF-16 (RFC 2781) that explains this:

If the text is labeled by the protocol as
  charset=UTF-16    then the first two bytes are the byte order mark
  charset=UTF-16BE  then it is big-endian and the first two bytes are just text (not a BOM)
  charset=UTF-16LE  then it is little-endian and the first two bytes are just text (not a BOM)

If you don't have any clue about the byte order, but you know it is UTF-16, then 
assume BE.
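
As a rough illustration of the rules above, here is a minimal Python sketch. The function name decode_utf16 and its label argument are made up for this example and do not come from the RFC; it only assumes the label arrives as a plain string from the protocol. Python's own "utf-16" codec falls back to the machine's native byte order when there is no BOM, so the sketch handles the BOM itself in order to fall back to big-endian instead.

```python
import codecs


def decode_utf16(data: bytes, label: str) -> str:
    """Decode UTF-16 bytes according to a protocol charset label.

    Hypothetical helper for illustration only.
    """
    label = label.upper()
    if label == "UTF-16BE":
        # Byte order comes from the label; a leading FE FF is ordinary text.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        # Byte order comes from the label; a leading FF FE is ordinary text.
        return data.decode("utf-16-le")
    if label == "UTF-16":
        # The first two bytes may be a byte order mark: strip it and obey it.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # No clue about the byte order: assume big-endian.
        return data.decode("utf-16-be")
    raise ValueError(f"unsupported charset label: {label!r}")
```

With this sketch, decode_utf16(b"\xfe\xff\x00A", "UTF-16") drops the mark, while the same bytes labeled UTF-16BE keep U+FEFF as the first character of the text.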

The same goes for UTF-32 and UTF-32BE/LE, where the byte order mark is four bytes.

If you don't know anything about your text, then you can try some heuristics or
reject the text...

markus

Tomas McGuinness wrote:
> A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
> it safe to assume any particular order (i.e. big- or little-endian)?
