<[EMAIL PROTECTED]>
I hope that the claim of "multiple UTF-8 representations" does indeed refer
to glyphs, in the sense that Unicode contains both precomposed characters and
separable elements, halfwidth and fullwidth ASCII variants, etc.  I hope it
does *not* refer to the nonconformant practice of representing Unicode
characters with "non-shortest" UTF-8 sequences.  Instances of that are not
the fault of UTF-8.
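
To make the distinction concrete, here is a minimal Python sketch (the helper name is_valid_utf8 is made up for illustration). It shows two legitimate code point sequences for the same glyph, each with its own valid UTF-8 encoding, next to a non-shortest byte sequence that a conformant decoder should refuse.

```python
import unicodedata

precomposed = "\u00e9"    # "é" as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Two different code point sequences, so two different (both valid) encodings.
print(precomposed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# Canonical normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC0 0xAF is a non-shortest (overlong) encoding of "/" (U+002F).
overlong_slash = b"\xc0\xaf"
print(is_valid_utf8(overlong_slash))
print(is_valid_utf8(b"/"))
```

The first case is a property of Unicode itself; the second is simply malformed input, which Python's strict decoder rejects.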
<[EMAIL PROTECTED]>

        Is there an existing set of recommendations for dealing with this
problem (multiple legal compositions) in search and search-like
applications?  Specifically, if there are multiple legal ways to represent a
character, how should the character be stored, should search text be
preprocessed, etc.?  Pointers, anyone?
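
One approach that might apply, sketched here in Python purely as an illustration: run both the stored text and the incoming query through the same Unicode normalization form before comparing. The helper name search_key and the choice of NFKC plus case folding are assumptions, not an established recommendation; an application that must keep compatibility variants distinct would likely want NFC instead.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical helper: fold text into one form for matching."""
    # NFKC composes base + combining sequences and also folds compatibility
    # variants such as fullwidth ASCII; casefold() removes case differences.
    return unicodedata.normalize("NFKC", text).casefold()


# Stored as entered: decomposed "café" and fullwidth "ABC".
stored = ["cafe\u0301", "\uff21\uff22\uff23"]

# Index by the folded key, but keep the original text as the value.
index = {search_key(s): s for s in stored}

# Queries typed with precomposed and plain ASCII characters still match.
print(search_key("caf\u00e9") in index)
print(search_key("abc") in index)
```

Keeping the original string alongside the folded key means the stored text is not altered; only the comparison is.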


        TiA,

/|/|ike
