There are two types of conforming processes:

- those that produce a rendering only have to give a result that still respects the character identities, even if it differs slightly. Quality of rendering is not a conformance issue as long as we still read the result as an epsilon with tonos, even if the exact placement of the tonos is shifted a bit (or even if the tonos partly collides visually with the epsilon, when it should not and does not when rendering the canonically equivalent precomposed character).
- those that produce textual or numeric data from a source text should return the same result (or a canonically equivalent result, if the result is textual). If the process states that its result is normalized (NFC, NFD, NFKC or NFKD), then the textual result should be binary identical (same number of code points, same sequence of code point values) and conform to the standard normalization. If the process just uses a "fast" algorithm, the results may be binary different but should still be canonically equivalent.

But there is an admitted exception: sorting with UCA may change the relative order of the source strings, simply because sort stability is not always wanted (it has a cost), binary sorting of the results by code point values as an additional collation level is not always wanted either, and normalization remains optional in UCA. The result is not strictly canonically equivalent, because items in the sorted list may come out in a different order, but the comparable items should still be canonically equivalent.

More importantly, the main difference will occur when you use regular expressions to match partial clusters. There are known difficulties, for example when you search for a combining accent: which part of the source content should be returned in the match, and should a letter precomposed with that accent match or not?

2013/8/6 Jukka K. Korpela <[email protected]>

> 2013-08-05 23:46, Richard Wordingham wrote:
>
>> The requirement is that conformant processes not think they are doing
>> the right thing by treating canonically equivalent strings
>> differently. If there is latitude in a process, e.g. rendering, I
>> can't find a requirement to treat canonically equivalent strings
>> identically. Can you?
>
> The first sentence is somewhat difficult to understand. I suppose the key
> is the word "the" vs. "a" in "the right thing".
>
> As far as I can see, the standard allows canonically equivalent strings to
> be handled differently, but it says that software should not expect other
> software to do so.
>
> In particular, in rendering, a program might display U+03B5 GREEK SMALL
> LETTER EPSILON U+0384 GREEK TONOS by drawing ε and placing ΄ over it, but
> U+03AD GREEK SMALL LETTER EPSILON WITH TONOS by simply using a glyph for it
> in the font being used. This might be regarded as being of inferior
> quality, but hardly as non-conforming.
>
> Yucca
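To make the normalization point in the reply above concrete, here is a minimal sketch using Python's standard unicodedata module. The helper names (canonically_equivalent, nfc) are invented for this illustration. Note that the sketch uses U+0301 COMBINING ACUTE ACCENT for the decomposed form, because that is what the canonical decomposition of U+03AD contains; U+0384 GREEK TONOS is a spacing character and does not combine.

```python
import unicodedata

# Two spellings of "epsilon with tonos".
precomposed = "\u03AD"       # GREEK SMALL LETTER EPSILON WITH TONOS
decomposed = "\u03B5\u0301"  # GREEK SMALL LETTER EPSILON + COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical code point sequences."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def nfc(text: str) -> str:
    """Hypothetical helper for a process that promises NFC output."""
    return unicodedata.normalize("NFC", text)


# A "fast" process may hand back either spelling: binary different,
# but still canonically equivalent.
print(precomposed == decomposed)                        # False
print(canonically_equivalent(precomposed, decomposed))  # True

# A process that claims NFC output must return binary identical results
# for both inputs.
print(nfc(precomposed) == nfc(decomposed))              # True
print(len(nfc(decomposed)))                             # 1
```

A process of the second kind that advertises normalized output would be expected to pass the last two checks; one that does not advertise it only needs to pass the canonical equivalence check.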

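The difficulty with regular expressions and partial clusters mentioned in the reply can be sketched the same way. The functions find_accent and clusters_with_accent are hypothetical, and returning the whole cluster is only one possible matching policy chosen for illustration, not something the standard prescribes.

```python
import re
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

text_nfc = "\u03AD\u03BD\u03B1"                     # "ένα" with precomposed epsilon-tonos
text_nfd = unicodedata.normalize("NFD", text_nfc)  # epsilon, U+0301, nu, alpha


def find_accent(text: str) -> list[int]:
    """Naive code point search: offsets where the combining accent occurs."""
    return [m.start() for m in re.finditer(ACUTE, text)]


def clusters_with_accent(text: str, accent: str = ACUTE) -> list[str]:
    """One possible policy (an assumption of this sketch): decompose first,
    group each base character with its following combining marks, and
    return every whole cluster that contains the accent, in NFC form."""
    clusters: list[str] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0 or not clusters:
            clusters.append(ch)   # a starter begins a new cluster
        else:
            clusters[-1] += ch    # a combining mark joins the previous cluster
    return [unicodedata.normalize("NFC", c) for c in clusters if accent in c]


# The naive search treats canonically equivalent inputs differently.
print(find_accent(text_nfc))  # []
print(find_accent(text_nfd))  # [1]

# The cluster-based search gives the same answer for both spellings.
print(clusters_with_accent(text_nfc) == clusters_with_accent(text_nfd))  # True
```

The naive search answers the question "does the precomposed letter match?" with "no", and for the decomposed text it returns only the accent, not the letter it sits on. The cluster-based variant picks the other answer to both questions; which one is wanted depends on the application.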
