On 27/10/2003 10:31, Philippe Verdy wrote:

...

The bad thing is that there's no way to say that a superfluous
CGJ character can be "safely" removed if CC(char1) <= CC(char2),
so that it will preserve the semantics of the encoded text even
though such filtered text would not be canonically equivalent.


Philippe, you have some interesting ideas here and in your previous posting.

I wonder whether it would be possible to define a character with combining class zero which is automatically removed during normalisation when it is superfluous, in the sense you define here. Of course that would mean a change to the normalisation algorithm, but one which does not cause backward compatibility issues.

I guess what is more likely to be acceptable, as it only suggests rather than requires a change to the algorithm, is a character which can optionally be removed, when superfluous, as a matter of canonical or compatibility equivalence. If we call this character CCO, we can define that a sequence <c1, CCO, c2> is canonically or compatibly equivalent to <c1, c2> if cc(c1) <= cc(c2), or if either cc(c1) or cc(c2) is 0. I am deliberately not using CGJ here, as this behaviour might destabilise the normalisation of current text using CGJ; but there would be no stability impact if this is a new character.
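To make the test concrete, here is a minimal sketch in Python of the condition just described. The code point I use for CCO is a private-use placeholder of my own; no such character has actually been allocated:

    import unicodedata

    CCO = u"\uE000"   # private-use placeholder; a real CCO would need its own code point

    def cco_is_superfluous(c1, c2):
        # True if <c1, CCO, c2> would be equivalent to <c1, c2>, i.e. if
        # cc(c1) <= cc(c2), or if either combining class is zero.
        cc1 = unicodedata.combining(c1)
        cc2 = unicodedata.combining(c2)
        return cc1 <= cc2 or cc1 == 0 or cc2 == 0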

The advantage of doing this is that a text could be generated with lots of CCOs which could then be removed automatically if they are superfluous.
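Continuing the sketch above, the automatic removal might look something like this (leaving any CCO at the very start or end of the text untouched):

    def strip_superfluous_cco(text):
        # Drop each CCO whose two neighbours satisfy the test above.
        out = []
        for i, ch in enumerate(text):
            if (ch == CCO and 0 < i < len(text) - 1
                    and cco_is_superfluous(text[i - 1], text[i + 1])):
                continue
            out.append(ch)
        return "".join(out)

    # For example, a CCO between a base letter (cc 0) and a combining acute
    # accent (cc 230) is superfluous and is removed:
    strip_superfluous_cco(u"a" + CCO + u"\u0301")   # -> u"a\u0301"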

I am half feeling that there must be some objections to this, but it's too late at night here to put my finger on them, so I will send this out and see what response it generates.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
