John Cowan wrote:
> 
> Now suppose we have a character sequence beginning with U+FEFF U+0020.
> This would be encoded as follows:
> 
> US-ASCII: (not possible)
> UTF-16:   0xFE 0xFF 0xFE 0xFF 0x00 0x20 ...
> UTF-16:   0xFF 0xFE 0xFF 0xFE 0x20 0x00 ...
> UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
> UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
> UTF-8N:   0xEF 0xBB 0xBF 0x20 ...
> UTF-8B:   0xEF 0xBB 0xBF 0xEF 0xBB 0xBF 0x20 ...

There is something I must have missed.

It was my understanding that U+FEFF, when received as the first character,
should be treated as a BOM rather than as a character, and handled accordingly.

So I expected:
  US-ASCII: 0x20
  UTF-16:   0xFE 0xFF 0x00 0x20 ...
  UTF-16:   0xFF 0xFE 0x20 0x00 ...
  UTF-16BE: 0xFE 0xFF 0x00 0x20 ...
  UTF-16LE: 0xFF 0xFE 0x20 0x00 ...
  UTF-8N:   0xEF 0xBB 0xBF 0x20 ...
  UTF-8B:   0xEF 0xBB 0xBF 0x20 ...
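
To make the two readings concrete, here is a minimal sketch using Python's
standard codecs. It is only an illustration under stated assumptions:
"utf-8-sig" stands in for what the thread calls UTF-8B, plain "utf-8" for
UTF-8N, and the variable names are hypothetical. It shows what happens when
U+FEFF is kept as a real character, and what a decoder that strips one
leading signature gives back.

```python
import codecs

# The sequence under discussion: U+FEFF followed by U+0020.
text = "\ufeff\u0020"

# No signature added: U+FEFF is encoded like any other character.
utf16be = text.encode("utf-16-be")   # FE FF 00 20
utf16le = text.encode("utf-16-le")   # FF FE 20 00
utf8n = text.encode("utf-8")         # EF BB BF 20

# Signature added in front of the character data (the first reading).
utf16_big = codecs.BOM_UTF16_BE + utf16be      # FE FF FE FF 00 20
utf16_little = codecs.BOM_UTF16_LE + utf16le   # FF FE FF FE 20 00
utf8b = text.encode("utf-8-sig")               # EF BB BF EF BB BF 20

# A decoder that strips exactly one leading signature recovers both
# characters from the doubled form ...
round_trip = utf16_big.decode("utf-16")

# ... but sees only U+0020 in the shorter form (the second reading),
# because the single FE FF is consumed as the byte order mark.
bom_consumed = utf16be.decode("utf-16")

# US-ASCII cannot represent U+FEFF at all.
try:
    text.encode("ascii")
    ascii_possible = True
except UnicodeEncodeError:
    ascii_possible = False
```

With these codecs, the shorter byte sequences decode to a lone space when the
decoder expects a signature, so the initial U+FEFF survives a round trip only
if it is written twice or if the encoding is declared as one that carries no
signature.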


Antoine
