> Thanks for the clarification. We are again talking at different levels.
> I am still looking from the point of view of an application programmer
> interested in a string as an abstract entity (an object or an abstract
> data type) with a meaning or interpretation, but with no interest in the
> exact encoding. You are looking at this at a lower level, either of a
> systems programmer or of an application programmer who is forced to get
> into this lower level stuff because of inadequate system support at the
> more abstract level.
Please stop this thread, Peter. Kenneth was clear enough when he pointed out that, in context, the problematic sentence you quoted from the standard does explain what is meant by "interpretation". For me, this relates to the interpretation of default grapheme clusters, which is where canonical equivalence applies. At the abstract character level there is no such "equivalence" rule, because normalization operates on default grapheme clusters and not on the lower levels: abstract characters or code points, and even less on code units in an encoding form, or bytes in an encoding scheme or a transport encoding syntax. So if an application offers an interface that claims to operate on grapheme clusters, the conformance rule for canonical equivalence applies, and distinct but canonically equivalent representations of any string must be treated the same.

If you look at XML, for example, there is no support for grapheme clusters, as XML operates at the abstract character (or code point) level; this means that a conforming XML processor is not required to treat all canonically equivalent strings the same way. But for a text renderer, or for the Unicode Collation Algorithm (UCA), supporting the higher-level grapheme clusters is required, and this is where canonical equivalences are most meaningful and in fact required for Unicode conformance.

This may also be required for security-sensitive text (such as domain names in IDNA), where distinct but canonically equivalent strings must be given exactly the same meaning and resolve identically, with the same "interpretation", because these items are shown to users who will need to reproduce them the way they usually read or type them. The meaning of "interpretation" therefore depends on the application using Unicode text.
But it is directly related to the level at which the application operates on its claimed public interface: grapheme clusters, abstract characters/code points, code units, stream bytes.
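To make these levels concrete, here is a minimal Python sketch (an illustration only, not part of any API discussed in this thread; the helper names `canonically_equal` and `same_at_each_level` are hypothetical). It compares two canonically equivalent spellings of "é" as UTF-8 bytes, as UTF-16 code units (little-endian is an arbitrary choice here), as code points, and finally up to canonical equivalence, which is what an interface claiming to operate on grapheme clusters has to provide. Comparing NFD forms is one standard way to test canonical equivalence; comparing NFC forms would give the same answer.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True when a and b are canonically equivalent.

    Two strings are canonically equivalent exactly when their NFD
    normalizations are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def same_at_each_level(a: str, b: str) -> dict[str, bool]:
    """Report whether a and b compare equal at each level of representation."""
    return {
        # Encoding scheme level: serialized bytes.
        "UTF-8 bytes": a.encode("utf-8") == b.encode("utf-8"),
        # Encoding form level: 16-bit code units (serialized as UTF-16LE).
        "UTF-16 code units": a.encode("utf-16-le") == b.encode("utf-16-le"),
        # Abstract character / code point level: plain string equality.
        "code points": a == b,
        # Only at this level does canonical equivalence come into play.
        "canonical equivalence": canonically_equal(a, b),
    }


precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT (two code points)

for level, equal in same_at_each_level(precomposed, decomposed).items():
    print(f"{level}: {'same' if equal else 'different'}")
```

Only the last comparison treats the two strings as the same; a process that works purely at the code point level, like the XML case above, can legitimately report them as different.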

