>> No, surely not. If the wcslen() function is fully Unicode conformant, it
>> should give the same output whatever the canonically equivalent form of
>> its input. That more or less implies that it should normalise its input.
>
> No, that is not a requirement of Unicode conformance.
>
> BTW, I must confess to an inability to keep up with the level of mail on this list. There are so many things in these mails that are simply wrong, and insufficient time for knowledgeable people to correct them. I would just caution people to first consult the materials on the Unicode site (Standard, TRs, FAQs, etc.), and take much of what is on this list with a quite sizable grain of salt.

Mark, I understand your problem with the level of mail. But, in this case, I have read the appropriate section of TUS 4.0 and quote it here to prove it, from p.59:
C9 A process shall not assume that the interpretations of two canonical-equivalent character
sequences are distinct.
...
• Ideally, an implementation would always interpret two canonical-equivalent character
sequences identically. ...
Perhaps my error is that I have raised (or is it lowered?) "ideally would" to "should". So let me rephrase what I said before:
If the wcslen() function is fully Unicode conformant, ideally it would give the same output whatever the canonically equivalent form of its input.
Surely that is what C9 is saying. Or is the issue whether such a function is "a process"? I didn't say that conformance implies that a process should normalise its input (I accept that that is not true). My point was only that this particular function, which counts the length of a string, can give sensible results only if the string is normalised, or at least transformed in some other way that removes distinctions between canonically equivalent forms (e.g. normalisation with some kinds of modified data).
I am tacitly assuming at this point that the function is part of a general-purpose library for use by users who are not interested in the details of character coding etc. I can see that different considerations may apply for an internal function within a Unicode processing and rendering implementation.
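To make the length point concrete, here is a minimal sketch. It is in Python rather than C, purely because Python's standard unicodedata module provides normalisation, and the helper name normalised_length is my own invention for illustration, not part of any library. It counts code points, which is only one possible meaning of "length".

```python
import unicodedata

def normalised_length(text: str, form: str = "NFC") -> int:
    """Hypothetical helper: count code points after normalising the input,
    so that canonically equivalent strings report the same length."""
    return len(unicodedata.normalize(form, text))

precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A naive count distinguishes the two canonically equivalent forms.
print(len(precomposed), len(decomposed))

# Counting after normalisation gives the same answer for both.
print(normalised_length(precomposed), normalised_length(decomposed))
```

Which normalisation form to use is a separate choice: with NFD both strings would again agree with each other, but on a different number than with NFC.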
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

