    Peter> - normalise both data and search string
    Peter> - delete / ignore all characters with general category Mn

That's the way we've been doing it for a long time now.  Normalization can
be a bit expensive with very large corpora, but if you have the disk space
to keep a normalized copy, it is a one-time cost.
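
For anyone who wants to try it, here is a rough sketch of the two steps in
Python, using only the standard unicodedata module.  The choice of NFD is my
assumption (the marks have to be decomposed before they can be dropped), and
the names strip_marks and mark_insensitive_find are made up for illustration.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every character with general category Mn."""
    # Assumption: NFD, so precomposed letters split into base + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def mark_insensitive_find(haystack: str, needle: str) -> int:
    """Search after applying the same normalisation to data and search string.

    The returned offset refers to the stripped text, not the original.
    """
    return strip_marks(haystack).find(strip_marks(needle))
```

With a large corpus you would run strip_marks over the data once and store
the result, so that only the search string is normalised at query time.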
-----------------------------------------------------------------------------
Mark Leisher                      Times are bad.  Children no longer obey
Computing Research Lab            their parents, and everyone is writing
New Mexico State University       a book.
Box 30001, Dept. 3CRL                -- Marcus Tullius Cicero
Las Cruces, NM  88003
