On 04/12/2003 08:39, Doug Ewell wrote:
...
> (2) I am NOT interested in inventing a new normalization form, or any
> variants on existing forms. Any approach that involves compatibility
> equivalences, ignores the Composition Exclusions table, or creates
> equivalences that do not exist in the Unicode Character Database (such
> as "U+1109 + U+1109 = U+110A") is NOT of interest. That amounts to
> unilaterally extending C10, which may already be too liberal to be
> applied to compression.

Surely ignoring Composition Exclusions is not unilaterally extending
C10. The excluded precomposed characters are still canonically
equivalent to the decomposed (and normalised) forms. And so composing a
text with them, for compression or any other purpose, still conforms to
C10, which explicitly allows "replacement of character sequences by
their canonical-equivalent sequences" - not only when the resulting
sequence is NFC or NFD.
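
The distinction can be checked with a small sketch, assuming Python's
standard unicodedata module and taking U+0958 DEVANAGARI LETTER QA as
one example of a character listed in the Composition Exclusions table.
The helper name canonically_equivalent is hypothetical, chosen only for
this illustration; it compares NFD forms, which is one common way to
test canonical equivalence, not the only one.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0958 is a composition-excluded precomposed character; its canonical
# decomposition is U+0915 U+093C.
excluded = "\u0958"
decomposed = "\u0915\u093C"

# The excluded character is still canonically equivalent to the sequence.
print(canonically_equivalent(excluded, decomposed))           # True

# NFC never produces the excluded character, so NFC of U+0958 is the
# decomposed sequence.
print(unicodedata.normalize("NFC", excluded) == decomposed)   # True

# By contrast, U+110A has no canonical decomposition in the Unicode
# Character Database, so U+1109 U+1109 is not equivalent to it.
print(canonically_equivalent("\u1109\u1109", "\u110A"))       # False
```

The first two results illustrate the point about excluded characters:
the text containing U+0958 is not in NFC, yet it is canonically
equivalent to the NFC text. The third shows the kind of invented
equivalence that really would go beyond C10.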
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/