Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

Tim Greenwood Thu, 11 Dec 2003 09:54:42 -0800

In my interpretation of the C standard (which I am reading from 
http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf), UTF-8 is not a 
valid wchar_t encoding if your execution character set contains 
characters outside the C0 controls and Basic Latin range, and UTF-16 is 
not a valid wchar_t encoding if your execution character set has 
characters outside the BMP. The reason is that UTF-8 needs more than one 
byte for anything above U+007F, and UTF-16 needs a surrogate pair for 
anything above U+FFFF. In other words, whatever you consider to be a 
character (which may be a combining character) must be encoded in a 
single wchar_t code unit.
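
To make the one-code-unit rule concrete, here is a minimal sketch that 
counts how many code units UTF-8 and UTF-16 need per code point and 
checks whether every character in a given repertoire fits in one. The 
function names (utf8_units, utf16_units, fits_one_unit) are hypothetical, 
and it assumes the repertoire is supplied as an array of valid Unicode 
scalar values (no lone surrogates); it is an illustration of the 
argument, not anything taken from the standard itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 8-bit code units UTF-8 needs for one
   Unicode scalar value (assumed valid, i.e. <= U+10FFFF). */
static int utf8_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;  /* C0 controls and Basic Latin */
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical helper: number of 16-bit code units UTF-16 needs;
   anything outside the BMP takes a surrogate pair. */
static int utf16_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;
}

/* Hypothetical check: an encoding form can serve as the wchar_t
   encoding only if every character in the repertoire fits in exactly
   one code unit of that form. */
static bool fits_one_unit(const uint32_t *repertoire, size_t n,
                          int (*units)(uint32_t))
{
    for (size_t i = 0; i < n; i++)
        if (units(repertoire[i]) != 1)
            return false;
    return true;
}
```

For example, a repertoire containing U+00E9 fails the check for UTF-8 
(two bytes), while a repertoire containing U+1D11E passes for UTF-8 
only in the sense that it fails everywhere: it needs four UTF-8 bytes 
and a UTF-16 surrogate pair, so neither form qualifies.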


The relevant passage is:

11 A wide character constant has type wchar_t, an integer type defined 
in the <stddef.h> header. The value of a wide character constant 
containing a single multibyte character that maps to a member of the 
extended execution character set is the wide character (code) 
corresponding to that multibyte character, as defined by the mbtowc 
function, with an implementation-defined current locale. The value of a 
wide character constant containing more than one multibyte character, or 
containing a multibyte character or escape sequence not represented in 
the extended execution character set, is implementation-defined.

Tim
