RE: Unicode Search Engines

Marco Cimarosti Wed, 20 Feb 2002 11:32:34 -0800

John Cowan wrote:
> Documents not in UTF-* are normalized by definition, unless it is
> *impossible* to convert them to normalized Unicode (typically
> because they contain characters not yet present in Unicode).


Is that true for all encodings?

E.g., ISCII 0xCF + 0xE9 (LETTER RA + SIGN NUKTA) corresponds to Unicode
U+0930 + U+093C (DEVANAGARI LETTER RA + DEVANAGARI SIGN NUKTA), which is not
in NFC: the NFC form is U+0931 (DEVANAGARI LETTER RRA).

What should the recipient do when it receives such an ISCII sequence? Refuse
it because it is not normalized (ISCII itself also contains 0xD0, LETTER
RRA), or "fix" it while converting it to Unicode?
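
To make the "fix it while converting" option concrete, here is a minimal
sketch in Python. The three-entry table and the function name iscii_to_nfc
are hypothetical and cover only the bytes mentioned above; a real ISCII
converter needs the full table and ISCII's context-dependent rules. Only
unicodedata.normalize comes from the standard library.

```python
import unicodedata

# Hypothetical three-entry excerpt of an ISCII (Devanagari) to Unicode table,
# limited to the bytes discussed in this message.
ISCII_TO_UNICODE = {
    0xCF: "\u0930",  # LETTER RA
    0xD0: "\u0931",  # LETTER RRA
    0xE9: "\u093C",  # SIGN NUKTA
}

def iscii_to_nfc(data: bytes) -> str:
    """Map each ISCII byte to Unicode, then normalize the result to NFC."""
    raw = "".join(ISCII_TO_UNICODE[b] for b in data)
    return unicodedata.normalize("NFC", raw)

composed = iscii_to_nfc(bytes([0xCF, 0xE9]))   # RA + NUKTA
precomposed = iscii_to_nfc(bytes([0xD0]))      # RRA

print([f"U+{ord(c):04X}" for c in composed])   # ['U+0931']
print(composed == precomposed)                 # True
```

With this approach both ISCII spellings end up as the same Unicode string,
so the recipient never has to refuse the document; whether that is the
right policy is exactly the question.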

_ Marco
