2010/7/27 Oleg Burlaca <o...@burlaca.com>

> Actually, the situation with Немцов is OK.
> I've just checked how Yandex handles Немцов and Немцова:
> http://nano.yandex.ru/project/inflect/
>
> I think there are two solutions:
> a) manually search for Немцов and then for Немцова
> b) use wildcard query: Немцов*
>
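
The two workarounds above come down to different query strings. Below is a minimal sketch, assuming a standard Solr select request whose query goes in the `q` parameter. The helper names are hypothetical, and nothing is sent to a server. It also assumes that wildcard terms skip the field's analyzer (as they typically did in Solr of this era), so the prefix is lowercased by hand to match a lowercased index.

```python
from urllib.parse import urlencode

# Hypothetical helpers: they only build query strings for the two options.

def either_form_query(forms):
    """Option a): match any of the listed word forms, e.g. 'Немцов OR Немцова'."""
    return " OR ".join(forms)

def prefix_query(stem):
    """Option b): wildcard/prefix query, e.g. 'немцов*'.

    Assumption: wildcard terms are not analyzed, so lowercase the prefix
    here to match an index built with a lowercase filter.
    """
    return stem.lower() + "*"

def select_params(q):
    # Assumed parameter name "q", as in a typical Solr select request.
    return urlencode({"q": q})

print(either_form_query(["Немцов", "Немцова"]))
print(prefix_query("Немцов"))
print(select_params("немцова"))
```

Option a) keeps normal analysis on each term but needs every form listed. Option b) needs no list, but it also matches any longer word that starts with the same letters.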

Well, here is one idea for a more general solution.
The problem with "protected words" is that you must have a complete list.

One idea would be to add a filter that protects from stemming any words that
match a regular expression.
In English, someone might want to leave all capitalized words unstemmed to
reduce trouble: [A-Z].*
In your case, a pattern like [A-Я].*ов might prevent problems.
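
The regex-protection idea can be sketched outside Lucene as follows. This is a minimal illustration, not a real Lucene/Solr filter. It assumes full-match semantics for the pattern. `toy_stem` is a hypothetical stand-in stemmer that strips a trailing "ов" or "а". It is NOT the Russian light stemmer, and it only shows how protection changes the outcome. Because the pattern depends on capitalization, such a filter would have to run before any lowercasing step.

```python
import re

# [А-Я].*ов written with explicit Cyrillic code points (U+0410..U+042F),
# so the range cannot be confused with a Latin "A".
PROTECT = re.compile(r"[\u0410-\u042F].*ов")

def toy_stem(token):
    """Hypothetical stemmer: strip one trailing 'ов' or 'а' (illustration only)."""
    for suffix in ("ов", "а"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def protect_and_stem(tokens, pattern=PROTECT, stem=toy_stem):
    """Pass tokens that fully match `pattern` through unchanged; stem the rest."""
    out = []
    for tok in tokens:
        if pattern.fullmatch(tok):
            out.append(tok)        # protected: treated as a keyword, not stemmed
        else:
            out.append(stem(tok))  # everything else goes to the stemmer
    return out

print([toy_stem(t) for t in ["Немцов", "Немцова"]])  # without protection
print(protect_and_stem(["Немцов", "Немцова"]))         # with protection
```

Without protection, the toy stemmer maps the two forms to different stems ("Немц" vs "Немцов"). With the pattern, "Немцов" is kept as-is, and both forms end up as "Немцов". A real setup would use the actual stemming filter in place of `toy_stem`.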


> Robert, thanks for the RussianLightStemFilterFactory info,
> I've found this page
> http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg06857.html
> that briefly describes it. Where can I read more about
> RussianLightStemFilterFactory ?
>
>
Here is the link:
http://doc.rero.ch/lm.php?url=1000,43,4,20091209094227-CA/Dolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf


> Regards,
> Oleg
>
> 2010/7/27 Oleg Burlaca <o...@burlaca.com>
>
> > A similar word is Немцов.
> > The strange thing is that searching for "Немцова" does not find documents
> > containing "Немцов".
> >
> > Немцова: 14 articles
> > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
> >
> > Немцов: 74 articles
> > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2
>



-- 
Robert Muir
rcm...@gmail.com
