RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

Philippe Verdy Thu, 11 Dec 2003 07:35:59 -0800

> Thanks for the clarification. We are again talking at different levels. 
> I am still looking from the point of view of an application programmer 
> interested in a string as an abstract entity (an object or an abstract 
> data type) with a meaning or interpretation, but with no interest in the 
> exact encoding. You are looking at this at a lower level, either of a 
> systems programmer or of an application programmer who is forced to get 
> into this lower level stuff because of inadequate system support at the 
> more abstract level.


Please stop this thread, Peter. Kenneth was clear enough when he pointed out
that the "in context" meaning of the problematic sentence you quoted from
the standard is sufficient to explain what is meant by
"interpretation".

For me this relates to the interpretation of default grapheme clusters,
which is where canonical equivalence applies.

If you go down to the abstract character level, there's no such "equivalence"
rule, because normalization operates on default grapheme clusters and not at
the lower levels (abstract characters or code points, and still less the
code units of an encoding form, or the stream bytes of an encoding scheme
or transport encoding syntax).

So if an application offers an interface that claims to operate on grapheme
clusters, the conformance rule for canonical equivalence applies, and
distinct but canonically equivalent encoding forms of any string must be
treated the same.
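
As a minimal sketch of what "treated the same" could look like in practice,
here is a comparison that normalizes both strings before testing them. It
uses Python's standard unicodedata module; the helper name canonically_equal
is my own invention for illustration, and choosing NFD rather than NFC is an
arbitrary assumption (either form works for an equality test).

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are canonically equivalent.

    Both strings are normalized to NFD, so that precomposed and decomposed
    spellings of the same text compare equal.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A plain code point comparison sees two different strings.
print(precomposed == decomposed)

# A comparison that honours canonical equivalence sees the same text.
print(canonically_equal(precomposed, decomposed))
```

The first comparison is what an interface working on code points would do;
the second is what an interface claiming to work at the higher level would
be expected to do.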

If you look at XML, for example, there's no support for grapheme clusters,
because XML operates at the abstract character level (or code points). This
means that a conforming XML processor is not required to treat all
canonically equivalent strings the same way.

But for a text renderer, or for a UCA collation algorithm, supporting the
high-level grapheme clusters is required, and this is where canonical
equivalences are the most meaningful and in fact required for Unicode
conformance.

This may also be required for security-related texts (such as domain names
in IDNA), where distinct but canonically equivalent strings must be given
exactly the same meaning and resolve identically with the same
"interpretation", as these items are intended to be exposed to users who
will need to reproduce them the way they usually read or type them.

The meaning of "interpretation" is then dependent on the application using
Unicode texts. But it is directly related to the level at which the
application operates on its claimed public interface: grapheme clusters,
abstract characters/code points, code units, or stream bytes.
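
To make the lower three levels concrete, here is a rough sketch that counts
the same text at each of them. The function name levels and the choice of
UTF-16 and UTF-8 as the example encodings are assumptions for illustration
only. Grapheme cluster counting is left out because Python's standard library
has no segmentation function for it.

```python
def levels(s: str) -> dict:
    """Hypothetical helper: size of s at three of the levels named above."""
    return {
        # Abstract characters / code points: a Python str is indexed by these.
        "code_points": len(s),
        # Code units of an encoding form: UTF-16 uses 2 bytes per code unit.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # Stream bytes of an encoding scheme: UTF-8 serialization.
        "utf8_bytes": len(s.encode("utf-8")),
    }


# Two canonically equivalent spellings of the same user-perceived character.
print(levels("\u00e9"))    # precomposed e-acute
print(levels("e\u0301"))   # "e" + COMBINING ACUTE ACCENT
```

The two spellings give different counts at every one of these levels, even
though a user reads a single accented letter in both cases.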

