RE: Boost non stemmed keywords (KStem filter)

Markus Jelsma Thu, 19 Nov 2015 12:25:23 -0800

Hello Jan - I have no code I can show, but we are using it to power our search 
servers. You are correct, you need to deal with payloads at query time as well. 
This means you need a custom similarity, and you also need to customize your 
query parser to rewrite queries to payload-supporting types. This is not very 
hard either; some ancient examples can still be found on the web. But you also 
need to copy over existing TokenFilters to emit payloads wherever you want 
them. Overriding TokenFilters is usually impossible due to crazy private 
members (I still cannot figure out why so many parts are private...).
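
To illustrate the index-time side, here is a minimal, language-neutral sketch in plain Python. It is not Lucene code and not our production filter. The payload values, the `Token` tuple, and the `toy_stem` function are all assumptions made up for the example (the stemmer in particular is a stand-in, not KStem). It only shows the idea of emitting the original and the stemmed form at the same position, each tagged with a payload.

```python
from typing import Callable, Iterator, List, NamedTuple

# Hypothetical payload values; the thread does not specify an encoding.
UNSTEMMED = 0x01
STEMMED = 0x02


class Token(NamedTuple):
    term: str
    position: int
    payload: int


def toy_stem(term: str) -> str:
    """Stand-in stemmer for illustration only; this is NOT KStem."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def emit_with_payloads(
    terms: List[str], stem: Callable[[str], str] = toy_stem
) -> Iterator[Token]:
    """Emit each original term, plus its stem at the same position if it differs."""
    for position, term in enumerate(terms):
        yield Token(term, position, UNSTEMMED)
        stemmed = stem(term)
        if stemmed != term:
            # Only emit the stem when it is a different term, so identical
            # forms are not indexed twice at the same position.
            yield Token(stemmed, position, STEMMED)


tokens = list(emit_with_payloads(["boats", "sail"]))
```

In a real setup the same tagging would have to happen inside a copied TokenFilter, as described above.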


It can be very powerful, especially if you use payloads not just to contain 
a score but to carry a WORD_TYPE, such as stemmed or unstemmed, but also 
stopwords, acronyms, compounds and subwords, headings or normal text, and 
NER types (which we don't have yet). For this to work you just need to treat 
the payload as a bitset of the different types, so you can have really tuneable 
scoring at query time via your similarity. Unfortunately, payloads can only 
carry a relatively small number of bits :)
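
The bitset idea can be sketched roughly as follows, again in plain Python rather than Lucene code. The flag layout, the one-byte payload, and the weights are hypothetical values chosen for the example; "unstemmed" is represented simply by the STEMMED bit being unset. A custom similarity would apply something like `payload_factor` at query time.

```python
from enum import IntFlag
from typing import Dict


class WordType(IntFlag):
    # Hypothetical flag layout: one bit per type, fits in a single byte.
    STEMMED = 1 << 0
    STOPWORD = 1 << 1
    ACRONYM = 1 << 2
    COMPOUND = 1 << 3
    SUBWORD = 1 << 4
    HEADING = 1 << 5


def encode(types: WordType) -> bytes:
    """Pack the flags into a one-byte payload."""
    return bytes([int(types)])


def decode(payload: bytes) -> WordType:
    """Read the flags back from the first payload byte."""
    return WordType(payload[0])


# Hypothetical query-time multipliers; tune these without reindexing.
WEIGHTS: Dict[WordType, float] = {
    WordType.STEMMED: 0.5,
    WordType.STOPWORD: 0.25,
    WordType.HEADING: 2.0,
}


def payload_factor(payload: bytes, weights: Dict[WordType, float] = WEIGHTS) -> float:
    """Multiply the weights of all flags set in the payload (default 1.0)."""
    types = decode(payload)
    factor = 1.0
    for flag, weight in weights.items():
        if types & flag:
            factor *= weight
    return factor
```

Because the types are stored as flags and not as a precomputed score, the weights can be changed at query time without touching the index.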

M.

-----Original message-----
> From:Jan Høydahl <jan....@cominvent.com>
> Sent: Thursday 19th November 2015 14:30
> To: solr-user@lucene.apache.org
> Subject: Re: Boost non stemmed keywords (KStem filter)
> 
> Do you have proof-of-concept code for this? Don’t you also have to hack your query 
> parser, e.g. dismax, to use other Query objects supporting payloads?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 18. nov. 2015 kl. 22.24 skrev Markus Jelsma <markus.jel...@openindex.io>:
> > 
> > Hi - the easiest approach is to use KeywordRepeatFilter and 
> > RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for 
> > unstemmed words which might be just enough in your case. We found it not to 
> > be enough, so we also attach payloads to signify stemmed words amongst 
> > others. This allows you to decrease score for stemmed words at query time 
> > via your similarity impl.
> > 
> > M.
> > 
> > 
> > 
> > -----Original message-----
> >> From:bbarani <bbar...@gmail.com>
> >> Sent: Wednesday 18th November 2015 22:07
> >> To: solr-user@lucene.apache.org
> >> Subject: Boost non stemmed keywords (KStem filter)
> >> 
> >> Hi,
> >> 
> >> I am using KStem factory for stemming. This stemmer converts 'france' to
> >> 'french', 'chinese' to 'china', etc. I am good with this stemming but I am
> >> trying to boost the results that contain the original term compared to the
> >> stemmed terms. Is this possible?
> >> 
> >> Thanks,
> >> Learner
> >> 
> >> 
> >> 
> >> 
> >> --
> >> View this message in context: 
> >> http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> 
> 
>
