On 29/04/2004 16:56, Kenneth Whistler wrote:

> Peter Kirk wrote, in response to Ernest Cline:
>
>>> ... It simply is impossible
>>> to simulate non-zero canonical combining class characters in Unicode
>>> with anything other than a character with the appropriate canonical
>>> combining class. ...
>>
>> True. But fortunately Unicode doesn't really need to worry about normalisation of PUA data, as this is surely out of its scope.
>
> Not quite. PUA code points are subject to the Unicode normalization
> algorithm, as well as any other. Their behavior in NFC or NFD,
> for example, is rigidly defined, if trivial: a PUA code point
> normalizes to itself.

Indeed. Perhaps I should have spoken specifically of normalisation transforming PUA data, which Unicode rightly does not do.

I was actually thinking more of logical normalisation, i.e. that it is not up to Unicode to decide whether <ELMTREE SYMBOL, COMBINING CHIPMUNK, COMBINING SQUIRREL> is semantically equivalent to <ELMTREE SYMBOL, COMBINING SQUIRREL, COMBINING CHIPMUNK> or, if they are, to provide a mechanism whereby one of these is normalised to the other. If in fact they are equivalent (e.g. the squirrel is on the ground, but the chipmunk is in the tree), then it is up to the PUA user to ensure that the data is ordered consistently or to provide private non-standard ordering mechanisms. Do you agree? If this is true, then there is no point in allocating the combining PUA characters to any class other than zero.
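A minimal Python sketch of the two points above, assuming hypothetical PUA assignments (U+E000 ELMTREE SYMBOL, U+E001 COMBINING CHIPMUNK, U+E002 COMBINING SQUIRREL) and a hypothetical private class table that the parties exchanging the data would have to agree on. Standard NFD leaves PUA code points untouched (their canonical combining class is 0), so the two orders stay distinct. A private pass modelled on Unicode's canonical ordering step can make them identical. The helpers `private_class` and `private_reorder` are illustrative only, not part of any library.

```python
import unicodedata

# Hypothetical PUA assignments, for illustration only; no standard defines these.
ELMTREE = "\ue000"   # ELMTREE SYMBOL (base)
CHIPMUNK = "\ue001"  # COMBINING CHIPMUNK
SQUIRREL = "\ue002"  # COMBINING SQUIRREL

# Hypothetical private classes agreed between the parties exchanging the data.
PRIVATE_CLASSES = {CHIPMUNK: 220, SQUIRREL: 230}


def private_class(ch: str) -> int:
    """Privately agreed class for a listed PUA character, else its standard ccc."""
    return PRIVATE_CLASSES.get(ch, unicodedata.combining(ch))


def private_reorder(text: str) -> str:
    """Stable-sort each run of non-zero-class characters by class,
    mirroring the Unicode canonical ordering step but with private classes."""
    out, run = [], []
    for ch in text:
        if private_class(ch) == 0:
            out.extend(sorted(run, key=private_class))  # flush pending marks
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=private_class))
    return "".join(out)


if __name__ == "__main__":
    # Standard marks with different classes are reordered by NFD:
    # U+0301 (ccc 230) moves after U+0323 (ccc 220).
    print(unicodedata.normalize("NFD", "a\u0301\u0323") == "a\u0323\u0301")  # True

    # PUA code points have ccc 0 and normalize to themselves,
    # so the two PUA orders remain distinct after NFD.
    a = ELMTREE + CHIPMUNK + SQUIRREL
    b = ELMTREE + SQUIRREL + CHIPMUNK
    print(unicodedata.combining(CHIPMUNK))  # 0
    print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # False

    # Only the private ordering pass makes them identical.
    print(private_reorder(a) == private_reorder(b))  # True
```

The reordering lives entirely outside standard normalization, which is the point: any equivalence between the two PUA sequences is a private convention that the PUA user must enforce.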

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
