On Wed, 20 Feb 2013 12:49:39 -0800 [email protected] wrote:

> They should be supported by APIs, components, and
> applications that handle (i.e., either process or pass through) all
> Unicode strings, such as a text editor or string class. Where an
> application does make internal use of a noncharacter, it should take
> some measures to sanitize input text from unknown sources.

Does this mean that a general-purpose application written in C that uses Microsoft's 16-bit wchar_t and reads little-endian UTF-16 input with the fgetwc() function should be regarded as broken? The problem is that a return value of 0xFFFF is taken to mean end of file (WEOF), not the noncharacter U+FFFF!

U+FFFE at the start of a UTF-16 file should also cause some headaches, since its two bytes are indistinguishable from a byte-order mark written in the opposite byte order! Doesn't Microsoft Windows still interpret this as a byte-order mark without asking whether there may be a byte-order mark?

Richard.
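
To make the fgetwc() concern concrete, here is a minimal sketch of one way around it: read the UTF-16LE code units from the byte stream with fgetc(), whose EOF is a negative int and so cannot collide with any 16-bit value. The helper name read_utf16le_unit is hypothetical, and the sketch assumes the file was opened in binary mode. It does not decode surrogate pairs or treat a leading byte-order mark specially.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any library): read one UTF-16LE code
 * unit from a stream opened in binary mode.
 *
 * Returns 1 and stores the code unit in *out on success.
 * Returns 0 at end of file, on a read error, or if the file ends in the
 * middle of a code unit.
 *
 * End of file is reported through the return value, so 0xFFFF in *out
 * always means the noncharacter U+FFFF was actually present in the input.
 */
static int read_utf16le_unit(FILE *fp, uint16_t *out)
{
    assert(fp != NULL && out != NULL);

    int lo = fgetc(fp);            /* low byte comes first in UTF-16LE */
    if (lo == EOF) {
        return 0;
    }

    int hi = fgetc(fp);            /* then the high byte */
    if (hi == EOF) {
        return 0;                  /* truncated code unit */
    }

    *out = (uint16_t)((unsigned)lo | ((unsigned)hi << 8));
    return 1;
}
```

A caller would loop with `while (read_utf16le_unit(fp, &u)) { ... }`. If the code has to stay with fgetwc(), another approach that should work is to call feof() and ferror() whenever the return value equals WEOF, and to treat the value as U+FFFF only when neither indicator is set.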

