RE: search ignoring diacritics

2001-05-21, Yves Arrouye
> Peter> - normalise both data and search string
> Peter> - delete / ignore all characters with general category Mn

It worked well for us too. Someone mentioned to me once, though, that U+3099 and U+309A should be preserved in order not to change the meaning of words, and we do so. But
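A minimal sketch of the approach Peter proposes, including the exception Yves Arrouye describes, using Python's standard `unicodedata` module. The thread only says "normalise"; using NFD for that step (so precomposed letters split into base + combining mark) and NFC to recompose afterwards are assumptions, as are the function names. Only diacritics are folded here; case and other differences still matter.

```python
import unicodedata

# Nonspacing marks that are kept anyway: the combining kana voiced and
# semi-voiced sound marks, so that e.g. "が" does not collapse into "か".
PRESERVED_MARKS = {"\u3099", "\u309a"}


def fold_diacritics(text: str) -> str:
    """Remove general-category Mn characters, except U+3099 and U+309A."""
    # NFD splits precomposed characters (e.g. U+00E9) into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" or ch in PRESERVED_MARKS
    )
    # Recompose, so a kana base + U+3099 becomes the precomposed kana again.
    return unicodedata.normalize("NFC", kept)


def contains_ignoring_diacritics(haystack: str, needle: str) -> bool:
    # Fold the data and the search string the same way before comparing.
    return fold_diacritics(needle) in fold_diacritics(haystack)


print(fold_diacritics("café naïve"))                               # cafe naive
print(contains_ignoring_diacritics("Ångström units", "Angstrom"))  # True
print(fold_diacritics("が") == "か")                                # False
```

Without the `PRESERVED_MARKS` exception, the last comparison would print `True`, which is exactly the change of meaning the message warns about.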

Re: search ignoring diacritics

2001-05-21, Michael (michka) Kaplan
> From: <[EMAIL PROTECTED]>
> > If you want to specify a search option of "ignore diacritics", would there
> > be any reason not to do simply the following
> >
> > - normalise both data and search string
> > - delete / ignore all characters with general category Mn
>
> Microsoft in its FoldS

Re: search ignoring diacritics

2001-05-21, Mark Leisher
Peter> - normalise both data and search string
Peter> - delete / ignore all characters with general category Mn

That's the way we've been doing it for a long time now. Normalization is a bit expensive at times with very large corpora, but if you have the disk space, it is a one-time cos
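Mark Leisher's point is that the data side only has to be folded once, so the expensive part can be paid up front in exchange for storage. A minimal sketch of that trade-off, keeping a folded copy next to each document (in memory here; persisting it to disk is left out). `FoldedCorpus` is a hypothetical name, and the folding step (NFD, drop Mn except U+3099/U+309A, then NFC) is an assumed reading of "normalise", not something the thread pins down.

```python
import unicodedata

# Combining kana voiced/semi-voiced sound marks, kept to preserve word meaning.
PRESERVED_MARKS = {"\u3099", "\u309a"}


def fold_diacritics(text: str) -> str:
    # Decompose, drop nonspacing marks (Mn) except the preserved ones, recompose.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" or ch in PRESERVED_MARKS
    )
    return unicodedata.normalize("NFC", kept)


class FoldedCorpus:
    """Hypothetical helper: pays the folding cost once per document, at load time."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = list(documents)
        # One-time cost: extra storage for a folded copy of every document.
        self.folded = [fold_diacritics(doc) for doc in self.documents]

    def search(self, query: str) -> list[int]:
        # Per-query cost: only the (short) search string is folded.
        needle = fold_diacritics(query)
        return [i for i, text in enumerate(self.folded) if needle in text]


corpus = FoldedCorpus(["crème brûlée", "crème fraîche", "tarte tatin"])
print(corpus.search("creme"))   # [0, 1]
print(corpus.search("brulee"))  # [0]
```

Returning indices rather than folded text lets a caller display the original, accented document from `corpus.documents`.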

Re: search ignoring diacritics

2001-05-21, Michael (michka) Kaplan
From: <[EMAIL PROTECTED]>
> If you want to specify a search option of "ignore diacritics", would there
> be any reason not to do simply the following
>
> - normalise both data and search string
> - delete / ignore all characters with general category Mn

Microsoft in its FoldString implementat