Philippe Verdy wrote:

The problem with this solution is that stability is not guaranteed across
earlier versions of Unicode: if a tool A implements the new version of
combining classes and normalizes its input, it will keep the relative
ordering of characters. If its output is injected into a tool B that still
uses the legacy classes, tool B may either reject the input (not normalized)
or force the normalization. Then, if the text comes back to tool A, it will
see a modified text.

If this matters in a specific situation, wouldn't it be possible to specify a Unicode version, so that no normalisation data that is only specified in later versions gets used? For example,

x = normalise("some text", 4.0);

normalises the text according to the rules specified in Unicode 4.0, or, if the software has not yet been updated with this information, according to the rules in an earlier version of Unicode, while

x = normalise("some text");

would normalise the text according to the most recent version of Unicode for which the "normalise" program has any data.

Stefan



