Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
UTF16_BigEndian?
ICU does not do Unicode-signature or other encoding detection
as part of a converter. When you get text from some protocol,
you need to instantiate a converter according to what you
know about the
On Thu, Apr 19, 2001 at 06:24:47PM -0700, Markus Scherer wrote:
On the other hand, if you get a file from your platform and
it is in 16-bit Unicode, then you would appreciate the
convenience of the auto-endian alias.
But nothing should be spitting out platform-endian UTF-16! In the
Yves, we are thinking about a general API for encoding detection that could initially
just check for BOM/Unicode signatures. I believe we have a feature request for this
already. Mark and I just brainstormed about what we may want an API to look like.
The reason for doing what ICU is doing
Hi,
A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
it safe to assume any particular order (i.e. Big or Little Endian)?
I am writing a function to rearrange from Big to little endian but without a
byte order mark I'm not sure what the order is. Is there any
There is an RFC about UTF-16 that explains this:
If the text is labeled by the protocol as
charset=UTF-16 then the first two bytes are the byte order mark
charset=UTF-16BE then it is big-endian and the first two bytes are just text
charset=UTF-16LE then it is little-endian and the first two
If you don't have any clue about the byte order, but you know it is
UTF-16, then assume BE.
Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
UTF16_BigEndian? I know that was a difference between ICU and my library,
and when I asked this question a while ago I was told that despite
Date: Thu, 19 Apr 2001 12:59:43 -0700
To: Tomas McGuinness [EMAIL PROTECTED]
From: Asmus Freytag [EMAIL PROTECTED]
Subject: Re: Byte Order Marks
At 02:58 PM 4/19/01 +0200, you wrote:
If it's absent, is it safe to assume any particular order (i.e. Big or
Little Endian)?
The default order is Big
Yves Arrouye wrote:
If you don't have any clue about the byte order, but you know it is
UTF-16, then assume BE.
Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not
UTF16_BigEndian?
ICU does not do Unicode-signature or other encoding detection as part of a converter.
When you
On Thu, Apr 19, 2001 at 06:24:47PM -0700, Markus Scherer wrote:
On the other hand, if you get a file from your platform and it is in 16-bit Unicode,
then you would appreciate the convenience of the auto-endian alias.
But nothing should be spitting out platform-endian UTF-16! In the
case that
Hi,
When looking at a document would it be safe to assume that if you found any
of the following Byte Order Marks
* 0xFFFE (UCS-2 Little Endian)
* 0xFEFE (UCS-2 Big Endian)
* 0xEFBBBF (UTF-8)
That the document is encoded with that encoding format. That means that if I
found
In a message dated 2001-04-10 3:04:09 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
When looking at a document would it be safe to assume that if you found any
of the following Byte Order Marks
*0xFFFE (UCS-2 Little Endian)
*0xFEFE (UCS-2 Big Endian)
should be 0xFEFF
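
To make the rules discussed in this thread concrete, here is a minimal sketch in Python, not anyone's actual implementation. It checks for the three signatures listed above (using the corrected big-endian value FE FF) and, for data known to be UTF-16 with no BOM, falls back to big-endian as suggested earlier. The function names `sniff_bom` and `decode_utf16` are hypothetical, and UTF-32 signatures are deliberately not handled.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if none."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", 2
    return None, 0


def decode_utf16(data: bytes) -> str:
    """Decode bytes known to be UTF-16: honor a BOM, otherwise assume BE."""
    encoding, skip = sniff_bom(data)
    if encoding not in ("utf-16-be", "utf-16-le"):
        # No UTF-16 signature found: fall back to big-endian (assumption
        # taken from the "assume BE" advice in this thread).
        encoding, skip = "utf-16-be", 0
    return data[skip:].decode(encoding)
```

Note that FE FE is not a signature at all, so `sniff_bom(b"\xfe\xfe")` reports no BOM. If the protocol labels the text as UTF-16BE or UTF-16LE explicitly, the label should win and the first two bytes are ordinary text, so this sniffing step would be skipped.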