> Peter> - normalise both data and search string
> Peter> - delete / ignore all characters with general category Mn
It worked well for us too. Someone once mentioned to me, though, that U+3099
and U+309A should be preserved in order not to change the meaning of words,
and we do so. But
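The approach described here (normalise, drop general category Mn, but keep U+3099 and U+309A) could be sketched roughly as follows. This is only an illustration, not anyone's actual implementation: the choice of NFD, the final NFC recomposition, and the names `KEEP`, `fold_diacritics` and `matches` are all assumptions made for the example.

```python
import unicodedata

# Assumption: the kana voicing marks are kept because removing them
# would turn one kana into another and so change the word.
KEEP = {"\u3099", "\u309A"}


def fold_diacritics(text: str) -> str:
    """Decompose text, then drop nonspacing marks (Mn) except those in KEEP."""
    # Assumption: NFD is used so that precomposed letters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        if ch in KEEP or unicodedata.category(ch) != "Mn"
    ]
    # Recompose so that any preserved marks rejoin their base characters.
    return unicodedata.normalize("NFC", "".join(kept))


def matches(needle: str, haystack: str) -> bool:
    """Substring search with both sides folded the same way."""
    return fold_diacritics(needle) in fold_diacritics(haystack)
```

With this sketch, `matches("resume", "my résumé")` is true, while U+3070 stays distinct from U+306F because its U+3099 is preserved.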
> From: <[EMAIL PROTECTED]>
>
> > If you want to specify a search option of "ignore diacritics", would there
> > be any reason not to do simply the following:
> >
> > - normalise both data and search string
> > - delete / ignore all characters with general category Mn
>
>
> Microsoft in its FoldS
Peter> - normalise both data and search string
Peter> - delete / ignore all characters with general category Mn
That's the way we've been doing it for a long time now. Normalization is a
bit expensive at times with very large corpora, but if you have the disk
space, it is a one-time cos
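One way to read the trade-off above is that the corpus is folded once and the folded copy is stored, so only the short query is folded at search time. The sketch below assumes that reading. The names `fold`, `build_index` and `search` are hypothetical, an in-memory dict stands in for whatever is kept on disk, and no characters are exempted from removal.

```python
import unicodedata


def fold(text: str) -> str:
    # Hypothetical helper: NFD-decompose, then drop general category Mn.
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )


def build_index(docs: dict[str, str]) -> dict[str, str]:
    """Fold every document once; the result could be persisted beside the originals."""
    return {doc_id: fold(body) for doc_id, body in docs.items()}


def search(index: dict[str, str], query: str) -> list[str]:
    """Return the ids of documents whose folded text contains the folded query."""
    folded_query = fold(query)  # only the query is folded per search
    return sorted(doc_id for doc_id, body in index.items() if folded_query in body)
```

The cost is the extra storage for the folded copy, which is the disk-space condition mentioned in the message.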
From: <[EMAIL PROTECTED]>
> If you want to specify a search option of "ignore diacritics", would there
> be any reason not to do simply the following:
>
> - normalise both data and search string
> - delete / ignore all characters with general category Mn
Microsoft in its FoldString implementat