Peter Kirk <peterkirk at qaya dot org> wrote:

>> The "wcslen" has nothing whatsoever to do with the Unicode standard,
>> but it has everything to do with the *C* standard. And, according to
>> the C standard, "wcslen" must simply count the number of "wchar_t"
>> array elements from the location pointed to by its argument up to the
>> first "wchar_t" element whose value is L'\0'. Full stop.
>
> OK, as a C function handling wchar_t arrays it is not expected to
> conform to Unicode. But if it is presented as a function available to
> users for handling Unicode text, for determining how many characters
> (as defined by Unicode) are in a string, it should conform to Unicode,
> including C9.
wcslen() is very definitely presented as a function for counting
_code_units_. You can't even rely on it to count Unicode characters
accurately if wchar_t is 16 bits wide, because each supplementary
character requires two code units (a high surrogate plus a low
surrogate). Programmers rely on primitive functions like wcslen() to do
what they do very rapidly, and not to change their meaning in new
versions of the language standard.

It would be very handy to have a suite of C functions that normalize
their input string to any of NFC, NFD, NFKC, or NFKD, or that compare
strings or measure their length taking normalization into account, but
those would have to be all-new functions.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

