On 29/10/2003 14:14, John Cowan wrote:

Peter Kirk scripsit:



Is this actually a conformance requirement? I thought I understood the following: A rendering engine which fails to render canonical equivalents identically, or fails to render certain orders sensibly, is not doing what the Unicode standard tells it that it must do. But it is not technically non-conformant because the statement that it must render canonical equivalents identically is not in a conformance clause. This implies that software producers who produce rendering engines which are deficient in this way can still claim conformance to Unicode. This is an ambiguity which, in my opinion, should be resolved in a future edition of the standard.



C9 says:


A process shall not assume that the interpretations of two canonical-equivalent
character sequences are distinct.



Yes, but this doesn't quite say that it must treat them as identical, as is clear from the following explanatory notes:

Ideally, an implementation would always interpret two canonical-equivalent character sequences identically. There are practical circumstances under which implementations may reasonably distinguish them.

Even processes that normally do not distinguish between canonical-equivalent character sequences can have reasonable exception behavior. Some examples of this behavior include graceful fallback processing by processes unable to support correct positioning of nonspacing marks;...


So a process "unable to support correct positioning of nonspacing marks" is not obliged to give the same incorrect positioning to a set of marks regardless of order. But I am not sure that this get-out clause should be applicable to a process which claims as its very essence "to support correct positioning of nonspacing marks" but actually supports only a particular arbitrary (not even canonical) order.

I would like to see this clause tightened up to say that a process which claims to interpret properly a particular sequence of marks must interpret all canonically equivalent variants of that sequence identically, with the exception of special modes to show the underlying character sequence.
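
To make the mark-order case concrete, here is a minimal Python sketch of what "canonically equivalent variants of that sequence" means in practice. It is only an illustration, not text from the standard: the helper name canonically_equivalent and the choice of Latin marks are my own assumptions, and it relies on the standard-library unicodedata module.

```python
import unicodedata

def canonically_equivalent(s: str, t: str) -> bool:
    """Return True if s and t have identical NFD forms.

    Comparing NFD forms is one common way to test canonical equivalence;
    the function name is hypothetical, chosen for this illustration.
    """
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)

# Same base letter and the same two marks, stored in opposite orders.
# U+0323 COMBINING DOT BELOW has canonical combining class 220,
# U+0301 COMBINING ACUTE ACCENT has canonical combining class 230,
# so canonical reordering puts the dot below first in both cases.
below_first = "a\u0323\u0301"
above_first = "a\u0301\u0323"

# Two marks of the same combining class (both 230) do not reorder,
# so swapping them gives a sequence that is NOT canonically equivalent.
acute_grave = "a\u0301\u0300"
grave_acute = "a\u0300\u0301"

if __name__ == "__main__":
    print(canonically_equivalent(below_first, above_first))
    print(canonically_equivalent(acute_grave, grave_acute))
```

On the tightened reading proposed above, a renderer that claims to position these marks correctly would have to display below_first and above_first identically, since the two code point sequences differ only by a canonical reordering.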

Arguably conformance clause C7 in fact states this, on the basis that canonical equivalence is a part of character semantics:

C7 A process shall interpret a coded character representation according to the character
semantics established by this standard, if that process does interpret that coded character
representation.

This clause, and clause C9 with its corollary "no process can assume that another process will make a distinction between two different, but canonical-equivalent character sequences", also preclude any process from assuming that data presented to it is already normalised. It must interpret a non-normalised variant in the same way as the normalised form. Nor can it assume that the process presenting the data distinguishes between the normalised and non-normalised forms, or that this process has not reordered the data into an arbitrary canonically equivalent form. This renders superfluous any guarantees of the stability of normalisation, because processes which require normalised data must perform their own normalisation each time they read data.
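
As a rough sketch of what "perform their own normalisation each time they read data" could look like, assuming Python and its standard-library unicodedata module (the names read_normalised and lookup, and the choice of NFC, are hypothetical and only for illustration):

```python
import unicodedata

def read_normalised(raw: str, form: str = "NFC") -> str:
    """Normalise incoming text at the point where it is read.

    The process makes no assumption that the sender already normalised
    the data; it applies the chosen form itself every time.
    """
    return unicodedata.normalize(form, raw)

def lookup(table: dict, key: str):
    """Look up a key in a table whose keys were stored via read_normalised.

    Because the key is normalised on the way in, any canonically
    equivalent spelling of it finds the same entry.
    """
    return table.get(read_normalised(key))

# Store an entry under the precomposed spelling U+00E9 ...
table = {read_normalised("\u00e9"): "entry"}

# ... and retrieve it with the decomposed spelling e + U+0301.
result = lookup(table, "e\u0301")

if __name__ == "__main__":
    print(result)
```

The point of the design is simply that normalisation happens at the boundary of the process that needs it, rather than being trusted to whoever supplied the data.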


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




