Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

jcowan Tue, 09 Dec 2003 11:55:13 -0800

Peter Kirk scripsit:

> No, surely not. If the wcslen() function is fully Unicode conformant, it 
> should give the same output whatever the canonically equivalent form of 
> its input.


Not so.  Remember, the conformance requirement is not that a process can't
distinguish between canonically equivalent strings (otherwise a normalizer
would be impossible; it wouldn't know whether to normalize or not!) but that
a process can't assume that *other* processes will distinguish between
canonically equivalent strings.  Equally, it can't assume that the other
process will fail to distinguish them, either.

In an environment in which C wide characters are Unicode characters,
wcslen returns the number of individual characters in the literal string,
without regard to canonical equivalence.  How many characters that is
depends on how many were placed in the source file by the author and what,
if anything, has happened to the source file since.

-- 
As you read this, I don't want you to feel      John Cowan 
sorry for me, because, I believe everyone       [EMAIL PROTECTED]
will die someday.    -- From a Nigerian-type    http://www.reutershealth.com
                        scam spam I got         http://www.ccil.org/~cowan
