On 09/12/2003 10:16, [EMAIL PROTECTED] wrote:
> Peter Kirk scripsit:
>> No, surely not. If the wcslen() function is fully Unicode conformant, it
>> should give the same output whatever the canonically equivalent form of
>> its input.
> Not so. Remember, the conformance requirement is not that a process can't
> distinguish between canonically equivalent strings ...
Remembered. This is not a conformance requirement, just an "ideally".
See C9 and the posting I just made.
> ... (otherwise a normalizer
> would be impossible; it wouldn't know whether to normalize or not!) ...
Not so. Normalisation is idempotent, i.e. the result of normalising an
already normalised string (with the same normalisation form) is
identical to the string itself. So the normaliser doesn't need to know
in advance whether the string is already normalised. Now it may be more
efficient to test for normalisation first; but the conformance clause
says nothing to stop you making implementation shortcuts.
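For what it's worth, here is a minimal sketch of the idempotency argument
in C. The toy_nfc() function is purely illustrative: it composes only
e + combining acute, it is not any real library's normaliser, and it
assumes wchar_t holds Unicode code points.

#include <stdio.h>
#include <wchar.h>

/* Toy "normaliser" handling exactly one composition:
   e (U+0065) followed by combining acute (U+0301) -> e-acute (U+00E9).
   A real normaliser covers all of Unicode, but the idempotency argument
   is the same: once composed, there is nothing left to compose. */
static void toy_nfc(wchar_t *dest, const wchar_t *src)
{
    while (*src) {
        if (src[0] == 0x0065 && src[1] == 0x0301) {
            *dest++ = 0x00E9;
            src += 2;
        } else {
            *dest++ = *src++;
        }
    }
    *dest = L'\0';
}

int main(void)
{
    /* "cafe" + combining acute, i.e. a decomposed spelling of "café" */
    const wchar_t *decomposed = L"cafe\u0301";
    wchar_t once[16], twice[16];

    toy_nfc(once, decomposed);   /* normalise the input                     */
    toy_nfc(twice, once);        /* normalise the already-normalised result */

    /* Idempotency: the second pass changes nothing, so a normaliser
       need not know in advance whether its input is normalised. */
    printf("second pass changed anything? %s\n",
           wcscmp(once, twice) == 0 ? "no" : "yes");
    return 0;
}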
> ... but that
> a process can't assume that *other* processes will distinguish between
> canonically equivalent strings. Equally, it can't assume that the other
> process will fail to distinguish them, either.
> In an environment in which C wide characters are Unicode characters, then
> wcslen returns the number of distinct characters in the literal string.
> How many characters it contains depends on how many were placed in the
> source file by the author and what, if anything, has happened to the source
> file since.
This implies that wcslen is not doing what C9 says that it "ideally...
would always" do. But see the caveats in my other posting.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/