On Fri, 15 May 2015 22:09:13 +0200 Philippe Verdy <[email protected]> wrote:
> 2015-05-15 9:10 GMT+02:00 Richard Wordingham < > [email protected]>: > This is because you don't understand the issue ! > > Now, a program to check whether a trace matching > > {\u0323|\u0302)* matches (\u0323\u0302)* is very simple. It just > > counts the number of times \u0323 occurs and the number of times > > \u0302 occurs, and returns whether they are equal. > This is wrong. \0323\0323\0302\0302 and \0323\0302\0323\0302 would > pass your counting test (which does not work in a FSA) but they are > NOT canonically equivalent because the identical combining characters > are blocking each other (so arbitrary ordering is not possible). TUS7.0: D108 Reorderable pair: Two adjacent characters A and B in a coded character sequence <A, B> are a Reorderable Pair if and only if ccc(A) > ccc(B) > 0. Now, ccc(U+0302) = 230 > 220 = ccc(U+0323) > 0, so (U+0302, U+0303) is a reorderable pair. TUS7.0: D109 Canonical Ordering Algorithm: In a decomposed character sequence D, exchange the positions of the characters in each Reorderable Pair until the sequence contains no more Reorderable Pairs. The normalisation process on <U+0323, U+0302, U+0323, U+0302> first replaces it by <U+0323, U+0323, U+0302, U+0302>. There are then no more reorderable pairs, so that has reduced it to form NFD. Therefore <U+0323, U+0323, U+0302, U+0302> and <U+0323, U+0302, U+0323, U+0302> *are* canonically equivalent. > So the FSA implementation is perfectly possible, for canonical > equivalences only... I now vaguely follow your argument, but it depends on the erroneous claim that <U+0323, U+0323, U+0302, U+0302> and <U+0323, U+0302, U+0323, U+0302> are not canonically equivalent. > This evidently does not work if you are > performing regexp searches using looser equivalences, such as > compatibility equivalence. I completely fail to understand this remark; it makes no difference whether one uses canonical or compatibility equivalence. Richard.

