> At 17:02 -0800 2004-03-30, Mike Ayers wrote:
> > It does not seem reasonable to
> >me that *any* standard behavior could be expected of PUA code
> >points, from operating systems or applications,

and Michael Everson responded:

> Which I assume means: "it's wrong for Unicode to make ANY property
> pronouncements for ANY PUA characters, since that defines them, and
> removes the P from the Use."

The problem is that real software has function (or method)
invocations in it like:

    character.getProperty()

And if a user, via whatever indirect stack of software may be
involved, manages to accomplish:

    character.setValue(0xE000)

then, an invocation to character.getProperty() has to do something
more reasonable than result in an access violation and freeze the
computer. Or do you really think that PUA purists would prefer that
kind of behavior in their software? Hmmm?

The bidirectional algorithm depends on a partition property. Every
code point that participates in the algorithm has to have *some*
value of that partition for the algorithm to be well-defined for
all encoded characters -- and that includes PUA characters, which
are encoded characters.

The UTC could have chosen bc=ON or bc=BN or bc=R or something
completely stupid like bc=PDF instead of bc=L as the default
property for PUA characters, but "None of the Above" was not an
option. Based on their implementation experience with use of PUA
characters, bc=L made the most sense and was the choice made by
the UTC for the default.

Consider another example. The normalization algorithm has to work
for *all* Unicode code points, assigned or not, because it
guarantees stability into the future when characters are encoded
at code points which were previously unencoded. It also, then,
obviously has to work for PUA characters, as well. That implies
that two additional properties *MUST* have some default values set
for PUA characters. One of those is decomposition, which is
defaulted to the null string (no decomposition) for all PUA
characters. The other is canonical combining class, which is
defaulted to ccc=0 for all PUA characters. Doing anything else
would have just been stupid. But again, "None of the Above" was
not an option.

--Ken
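
For anyone who wants to see the defaults discussed above in a running
implementation, here is a minimal sketch using Python's standard
unicodedata module. The helper name pua_defaults and the dictionary
keys are made up for this illustration; only the unicodedata calls
are real. It assumes the interpreter's bundled Unicode Character
Database carries the usual PUA defaults (bc=L, ccc=0, no
decomposition), and it looks only at U+E000, not at unassigned code
points.

```python
import unicodedata


def pua_defaults(code_point: int) -> dict:
    """Collect the properties that bidi and normalization need.

    Hypothetical helper for illustration: every lookup returns *some*
    value for a PUA code point instead of failing.
    """
    ch = chr(code_point)
    return {
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "ccc": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        # No decomposition means normalization leaves the character alone.
        "nfd_unchanged": unicodedata.normalize("NFD", ch) == ch,
        "nfc_unchanged": unicodedata.normalize("NFC", ch) == ch,
    }


# U+E000 is the first code point of the BMP Private Use Area.
props = pua_defaults(0xE000)
for name, value in props.items():
    print(f"{name}: {value!r}")
```

The point of the sketch is only that each query is well-defined: the
bidi class lookup yields a value the bidirectional algorithm can
partition on, and the combining class and decomposition lookups yield
values the normalization algorithm can work with.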

