> > Anyway, John J, what code are we talking about that has to work from
> > the positions of the combining marks back to the underlying
> > representation? Are you talking about OCR?
> >
> 
> No, the issue is more how to start from a base form and work forward to
> encompass the whole series of characters which need to be treated "as
> one" in certain processes, which can include cursor movement, hit
> testing, display, line breaking, collation, normalization.
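
A minimal Python sketch of that forward scan follows. It assumes the simplest possible rule: a base character plus every following character whose general category is Mn, Mc or Me. Real cursor movement and hit testing would use extended grapheme clusters (UAX #29), which this deliberately does not implement. The names `cluster_end` and `clusters` are hypothetical.

```python
import unicodedata
from typing import List


def cluster_end(text: str, start: int) -> int:
    """Return the index just past the character at `start` and any
    combining marks (general category Mn, Mc or Me) that follow it.
    Simplification: this is NOT the UAX #29 grapheme cluster rule."""
    end = start + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def clusters(text: str) -> List[str]:
    """Split text into base-plus-marks units by repeated forward scans."""
    units = []
    i = 0
    while i < len(text):
        j = cluster_end(text, i)
        units.append(text[i:j])
        i = j
    return units


# "e" + COMBINING ACUTE ACCENT + COMBINING DOT BELOW, then "x"
print(clusters("e\u0301\u0323x"))
# THAI KO KAI + NIKHAHIT + SARA AA
print(clusters("\u0E01\u0E4D\u0E32"))
```

With the Thai input, NIKHAHIT (a nonspacing mark) attaches to the preceding consonant, and SARA AA (a letter) starts a new unit.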

Collation isn't really based on combining sequences (even though UTS #10
specifies a certain "spanning" over non-blocking (combining)
characters).
Note in particular the following entry in the CTT (the Common Template
Table of ISO/IEC 14651), which also appears, with different syntax, in
the UTS #10 tables:
<U0E4D_0E32> <S0E33>;<BASE>;<MIN>;<U0E33> % THAI CHARACTER SARA AM
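(and a similar one for Lao). This is a collation entry for a
"contraction" of a combining mark followed(!) by (formally) a
base character: U+0E4D NIKHAHIT is a nonspacing mark, U+0E32 SARA AA
is a letter, and together they form the compatibility decomposition
of U+0E33. (I'm not really sure what the true logical sequence
would be, though.)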
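
To illustrate how such an entry cuts across combining sequences, here is a toy Python sketch of greedy longest-match contraction lookup. The table and its string values are made up for illustration; they are not real CTT or DUCET weights. Real UCA matching also handles discontiguous contractions, which this skips. The sketch also uses the standard `unicodedata` module to show that U+0E33 decomposes (compatibility) to a mark followed by a letter.

```python
import unicodedata
from typing import List

# Hypothetical miniature collation table. The values are placeholder labels,
# not real CTT/DUCET weights. The two-character contraction NIKHAHIT + SARA AA
# gets the same key as U+0E33 SARA AM, mirroring the CTT entry quoted above.
TABLE = {
    "\u0E01": "KO_KAI",
    "\u0E32": "SARA_AA",
    "\u0E33": "SARA_AM",
    "\u0E4D": "NIKHAHIT",
    "\u0E4D\u0E32": "SARA_AM",
}
LONGEST = max(len(key) for key in TABLE)


def collation_keys(text: str) -> List[str]:
    """Map text to table values, preferring the longest matching entry."""
    keys = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, so the contraction wins over
        # its single-character parts.
        for n in range(min(LONGEST, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in TABLE:
                keys.append(TABLE[chunk])
                i += n
                break
        else:
            raise KeyError(f"no table entry for {text[i]!r}")
    return keys


# U+0E33 decomposes (compatibility) to a nonspacing mark followed by a letter.
decomposed = unicodedata.normalize("NFKD", "\u0E33")
print([f"U+{ord(c):04X} {unicodedata.category(c)}" for c in decomposed])
# Precomposed and decomposed spellings receive the same keys.
print(collation_keys("\u0E01\u0E33"), collation_keys("\u0E01\u0E4D\u0E32"))
```

The contraction's first character is a mark that a base-plus-marks segmentation would attach to the preceding consonant, so the collation unit and the combining sequence do not line up.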

        /kent k

