Re: Boost non stemmed keywords (KStem filter)

Walter Underwood Thu, 19 Nov 2015 15:37:34 -0800

That is the approach I’ve been using for years. Simple and effective.

It probably makes the index bigger. Make sure that only one of the fields is 
stored, because the stored text will be exactly the same in both.
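
As a rough sketch of the two-field setup from Ahmet's message quoted below, the boosts can be passed at query time through the edismax `qf` parameter. The class and method names here are hypothetical and are not part of Solr or SolrJ; the code only builds a request query string with the Java standard library. It assumes both fields are filled from the same source text, and the sample query is made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Solr or SolrJ class): it only assembles request parameters.
class TwoFieldBoostQuery {

    // Joins field^boost pairs with spaces, the syntax the qf parameter expects.
    static String buildQf(Map<String, Double> fieldBoosts) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            qf.add(e.getKey() + "^" + e.getValue());
        }
        return qf.toString();
    }

    // URL-encodes the user query and the qf value into one query string.
    static String buildQueryString(String userQuery, Map<String, Double> fieldBoosts) {
        return "defType=edismax"
            + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
            + "&qf=" + URLEncoder.encode(buildQf(fieldBoosts), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Field names and boosts are taken from the quoted message; insertion order is kept.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("text_stem", 0.3);
        boosts.put("text_no_stem", 0.6);
        System.out.println(buildQueryString("french wine", boosts));
    }
}
```

In the schema, only one of the two fields would then need `stored="true"`, for the reason given above.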


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 19, 2015, at 1:47 PM, Ahmet Arslan <iori...@yahoo.com.INVALID> wrote:
> 
> Hi,
> 
> I wonder about using two fields (text_stem and text_no_stem) and applying 
> query time boost
> text_stem^0.3 text_no_stem^0.6
> 
> What is the advantage of the keyword repeat/payload approach compared with 
> this one?
> 
> Ahmet
> 
> 
> On Thursday, November 19, 2015 10:24 PM, Markus Jelsma 
> <markus.jel...@openindex.io> wrote:
> Hello Jan - I have no code I can show, but we are using it to power our
> search servers. You are correct, you need to deal with payloads at query
> time as well. This means you need a custom similarity, and you also need to
> customize your query parser to rewrite queries into payload-aware query
> types. This is not very hard either; some ancient examples can still be
> found on the web. But you also need to copy existing TokenFilters so that
> they emit payloads wherever you want them. Overriding TokenFilters is
> usually impossible due to crazy private members (I still cannot figure out
> why so many parts are private..)
> 
> It can be very powerful, especially if you do not use payloads to carry
> just a score but instead a WORD_TYPE, such as stemmed or unstemmed, but
> also stopwords, acronyms, compounds and subwords, headings or normal text,
> and also NER types (which we don't have yet). For this to work you just
> need to treat the payload as a bitset of the different types, so you can
> have really tunable scoring at query time via your similarity.
> Unfortunately, payloads can only carry a relatively small number of bits :)
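
The bitset idea in the paragraph above could look roughly like the following. This is a toy model under stated assumptions: it does not use Lucene's payload or Similarity APIs, the `WordType` names are only loosely based on the types listed in the message, and the multipliers in `scoreFactor` are invented for illustration.

```java
import java.util.EnumSet;

// Toy model only: no Lucene classes are involved, and all names are hypothetical.
class WordTypePayload {

    // One flag per word type; a term without STEMMED is treated as unstemmed.
    enum WordType { STEMMED, STOPWORD, ACRONYM, COMPOUND, SUBWORD, HEADING }

    // Index time: pack the set of types into a single byte, one bit per type.
    static byte encode(EnumSet<WordType> types) {
        int bits = 0;
        for (WordType t : types) {
            bits |= 1 << t.ordinal();
        }
        return (byte) bits;
    }

    // Tests whether the bit for the given type is set in the payload byte.
    static boolean has(byte payload, WordType type) {
        return (payload & (1 << type.ordinal())) != 0;
    }

    // Query time: turn the flags into a score multiplier (values are made up).
    static float scoreFactor(byte payload) {
        float factor = 1.0f;
        if (has(payload, WordType.STEMMED)) factor *= 0.5f;
        if (has(payload, WordType.STOPWORD)) factor *= 0.1f;
        if (has(payload, WordType.HEADING)) factor *= 2.0f;
        return factor;
    }

    public static void main(String[] args) {
        byte payload = encode(EnumSet.of(WordType.STEMMED, WordType.HEADING));
        System.out.println("payload bits = " + payload + ", factor = " + scoreFactor(payload));
    }
}
```

In a real setup the decoding step would sit in the custom similarity, so the multipliers could be tuned without reindexing.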
> 
> M.
> 
> -----Original message-----
>> From:Jan Høydahl <jan....@cominvent.com>
>> Sent: Thursday 19th November 2015 14:30
>> To: solr-user@lucene.apache.org
>> Subject: Re: Boost non stemmed keywords (KStem filter)
>> 
>> Do you have a concept code for this? Don’t you also have to hack your query 
>> parser, e.g. dismax, to use other Query objects supporting payloads?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 18. nov. 2015 kl. 22.24 skrev Markus Jelsma <markus.jel...@openindex.io>:
>>> 
>>> Hi - easiest approach is to use KeywordRepeatFilter and 
>>> RemoveDuplicatesTokenFilter. This creates a slightly higher IDF for 
>>> unstemmed words which might be just enough in your case. We found it not to 
>>> be enough, so we also attach payloads to signify stemmed words amongst 
>>> others. This allows you to decrease score for stemmed words at query time 
>>> via your similarity impl.
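
To illustrate what the KeywordRepeatFilter plus RemoveDuplicatesTokenFilter chain described above produces, here is a small simulation in plain Java. It is not the Lucene API: `toyStem` is a hypothetical stand-in for the stemmer with a made-up mapping, and a set per position plays the role of duplicate removal. The point is only that each position ends up holding the original term and, when it differs, its stem. Since the unstemmed form usually occurs in fewer documents than the shared stem, it tends to get the slightly higher IDF mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulation only: mimics the effect of the filter chain without using Lucene.
class KeywordRepeatSketch {

    // Hypothetical stem table; a real stemmer such as KStem would replace this.
    static final Map<String, String> TOY_STEMS = Map.of("wines", "wine", "bottles", "bottle");

    static String toyStem(String term) {
        return TOY_STEMS.getOrDefault(term, term);
    }

    // Returns, for each token position, the distinct terms indexed at that position.
    static List<Set<String>> analyze(List<String> tokens) {
        List<Set<String>> positions = new ArrayList<>();
        for (String token : tokens) {
            Set<String> terms = new LinkedHashSet<>();
            terms.add(token);          // keyword copy: kept as-is, not stemmed
            terms.add(toyStem(token)); // stemmed copy: the set drops it if identical
            positions.add(terms);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(analyze(List.of("red", "wines")));
    }
}
```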
>>> 
>>> M.
>>> 
>>> 
>>> 
>>> -----Original message-----
>>>> From:bbarani <bbar...@gmail.com>
>>>> Sent: Wednesday 18th November 2015 22:07
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Boost non stemmed keywords (KStem filter)
>>>> 
>>>> Hi,
>>>> 
>>>> I am using KStem factory for stemming. This stemmer converts 'france' to
>>>> 'french', 'chinese' to 'china', etc. I am good with this stemming but I am
>>>> trying to boost the results that contain the original term compared to the
>>>> stemmed terms. Is this possible?
>>>> 
>>>> Thanks,
>>>> Learner
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context: 
>>>> http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>> 
>> 
>>
