Re: [CLucene-dev] Wildcard query on a Russian text is not working for me

2019-07-25 Thread Kostka Bořivoj
Hi, I’m quite sure the standard tokenizer doesn’t support Unicode combining characters. The question is how to process them. I think for Russian the best way is simply to skip this character (create the token text without it), because it is only used to show where the accent falls.
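A minimal sketch of the "skip this character" idea in plain standard C++, not CLucene code. It assumes the text has already been decoded to UTF-32 and that the only mark to drop is U+0301 (COMBINING ACUTE ACCENT), which is the usual Russian stress mark. The names isStressMark and stripStressMarks are hypothetical, made up for illustration.

```cpp
#include <string>

// Hypothetical predicate: the code points we choose to drop.
// Assumption: only U+0301 COMBINING ACUTE ACCENT marks stress in the input.
bool isStressMark(char32_t c)
{
    return c == 0x0301;
}

// Returns a copy of the text with the stress marks skipped, so the
// tokenizer sees one uninterrupted run of letters per word.
std::u32string stripStressMarks(const std::u32string &text)
{
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (!isStressMark(c)) {
            out.push_back(c);
        }
    }
    return out;
}
```

The same filter would need to run on both the indexed text and the query string, otherwise the terms will not match. Dropping only U+0301 is deliberate: in decomposed form (NFD) the letters й and ё also carry combining marks (U+0306 and U+0308), so stripping every combining mark would turn them into и and е.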

Re: [CLucene-dev] Wildcard query on a Russian text is not working for me

2019-07-25 Thread Tamás Dömők
Hi, yes, I ended up removing the accents before processing the text with CLucene.

https://unicode.org/reports/tr15/#Normalization_Forms_Table

QString unaccent(const QString &s)
{
    // Decompose first, so each accent becomes a separate combining character.
    const QString normalized = s.normalized(QString::NormalizationForm_D);
    QString out;