Re: Corrigendum #9 clarifies noncharacter usage in Unicode

Richard Wordingham Thu, 21 Feb 2013 11:15:19 -0800

On Wed, 20 Feb 2013 12:49:39 -0800
[email protected] wrote:

> They should be supported by APIs, components, and
> applications that handle (i.e., either process or pass through) all
> Unicode strings, such as a text editor or string class. Where an
> application does make internal use of a noncharacter, it should take
> some measures to sanitize input text from unknown sources.


Does this mean that a general-purpose application written in C that uses
Microsoft's 16-bit wchar_t to handle little-endian UTF-16 input via
the fgetwc() function should be regarded as broken?  The problem is
that a return value of 0xFFFF (Microsoft's WEOF) does not mean the
noncharacter U+FFFF, but end of file!
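
One way round the collision is to avoid a 16-bit return type for the
reader altogether. The following is a minimal sketch, not anything from
the Microsoft runtime: the hypothetical helper read_utf16le_unit() reads
raw bytes with fgetc() from a binary stream and returns a long, so its
end-of-file sentinel (-1) can never be confused with U+FFFF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one UTF-16LE code unit from a binary stream.
   Returns 0x0000..0xFFFF for a code unit, or -1 at end of file (including
   a truncated final byte). Because the result is a long, the sentinel -1
   cannot collide with the noncharacter U+FFFF. */
long read_utf16le_unit(FILE *f)
{
    assert(f != NULL);
    int lo = fgetc(f);          /* fgetc() returns EOF, distinct from any byte value */
    if (lo == EOF)
        return -1;
    int hi = fgetc(f);
    if (hi == EOF)
        return -1;              /* odd trailing byte: treated here as end of input */
    return ((long)hi << 8) | (long)lo;
}
```

With this interface a stream containing the bytes FF FF yields 0xFFFF,
and only a genuinely exhausted stream yields -1. Surrogate pairing and
error reporting are deliberately left out of the sketch.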

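U+FFFE at the start of a little-endian UTF-16 file should also cause
some headaches, since it is stored as the bytes FE FF, which is the
big-endian byte-order mark!  Doesn't Microsoft Windows still interpret
this as a byte-order mark without first asking whether a byte-order
mark may be present at all?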
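
A naive byte-order-mark sniffer shows the ambiguity concretely. This is
a hypothetical sketch (sniff_utf16_bom() and enum bom_kind are invented
for illustration, not Windows' actual detection logic): looking only at
the first two bytes, U+FFFE written little-endian is indistinguishable
from U+FEFF written big-endian.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the first two bytes of a UTF-16 stream. */
enum bom_kind { BOM_NONE, BOM_UTF16LE, BOM_UTF16BE };

enum bom_kind sniff_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return BOM_NONE;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;     /* U+FEFF written little-endian */
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;     /* U+FEFF big-endian, OR U+FFFE little-endian */
    return BOM_NONE;
}
```

A text that legitimately begins with the noncharacter U+FFFE in
little-endian form is therefore classified as big-endian, and every
following code unit would be byte-swapped, unless the reader already
knows from context that no byte-order mark is expected.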

Richard.
