Fine. It can’t be done. If it were easy, Solr/Lucene would already have the 
feature, right?

Solr is a vector-space engine. Some early engines (Verity VDK) were 
probabilistic engines. Those do give an absolute estimate of the relevance of 
each hit. Unfortunately, their relevance is just not as good as that of 
vector-space engines, so probabilistic engines are mostly dead.

But “you don’t want to do it” is very good advice. Instead of trying to reduce 
bad hits, work on increasing good hits. It is really hard, sometimes impossible, 
to optimize both. Increasing the good hits makes your customers happy. Reducing 
the bad hits makes your UX team happy.

Here is a process. Start collecting the clicks on the search results page (SRP) 
for each query. Look at queries that have below-average clickthrough. See if 
those can be grouped into categories, then address each category.

Some categories that I have used:

* One word or two? “babysitter”, “baby-sitter”, and “baby sitter” are all 
valid. Use synonyms or shingles (and maybe the word delimiter filter) to match 
these; there is a rough schema sketch after this list.

* Misspellings. Expect these to be about 10% of queries. Use fuzzy matching; 
there is a query example after the list. I recommend the patch in SOLR-629.

* Alternate vocabulary. You sell a “laptop”, but people call it a “notebook”. 
People search for “kids movies”, but your movie genre is “Children and Family”. 
Use synonyms; there is a sample synonyms.txt after the list.

* Missing content. People can’t find anything about beach parking because there 
isn’t a page about that. Instead, there are scraps of info about beach parking 
in multiple other pages. Fix the content.
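
For the one-word-or-two category, here is a rough sketch of the kind of 
analysis chain I mean. The type name and attribute mix are mine; treat it as a 
starting point to test against your own data, not a drop-in config:

    <fieldType name="text_splitting" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- splits "baby-sitter" into parts and also catenates them back to "babysitter" -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" catenateWords="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- optional: glue adjacent words so "baby sitter" also produces "babysitter" -->
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
      </analyzer>
    </fieldType>

Run a few of the problem queries through the admin Analysis page to see what 
actually comes out of the chain.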
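
For the misspelling category, the quick way to experiment before applying the 
SOLR-629 patch is plain Lucene fuzzy syntax with the standard query parser. The 
field name here is just a stand-in:

    q=name:babysiter~2    (matches terms within an edit distance of 2)

Fuzzy matching is a blunt instrument, so watch whether it drags in junk hits 
along with the corrections.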
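
For the alternate-vocabulary category, the synonyms file looks something like 
this. The mappings are only illustrations based on the examples above:

    # synonyms.txt
    notebook, laptop
    kids movies => children and family

and it gets wired into the analyzer with something like:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>

Be careful with multi-word synonyms at query time; they have well-known 
problems, so test them against real queries before shipping.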

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 12, 2017, at 11:44 AM, David Kramer <david.kra...@shoebuy.com> wrote:
> 
> The idea is to not return poorly matching results, not to limit the number of 
> results returned.  One query may have hundreds of excellent matches and 
> another query may have 7. So cutting off by the number of results is trivial 
> but not useful.
> 
> Again, we are not doing this for performance reasons. We’re doing this for 
> UX reasons: we don’t want to show products that are not very relevant to the 
> search terms specified by the user.
> 
> I had hoped that the responses would have been more focused on “it can’t be 
> done” or “here’s how to do it” than “you don’t want to do it”. I’m still 
> left not knowing if it’s even possible. The one concrete answer of using 
> frange doesn’t help, as referencing score in either the q or the fq produces 
> an “undefined field” error.
> 
> Thanks.
> 
> On 4/11/17, 8:59 AM, "Dorian Hoxha" <dorian.ho...@gmail.com> wrote:
> 
>    Can't the filter be used in cases when you're paginating in a
>    sharded scenario?
>    So if you do limit=10, offset=10, each shard will return 20 docs?
>    While if you do limit=10, _score<=last_page.min_score, then each shard will
>    return 10 docs? (They will still score all docs, but merging will be
>    faster.)
> 
>    Makes sense?
> 
>    On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti <a.benede...@sease.io> wrote:
> 
>> Can I ask what the final requirement is here?
>> What are you trying to do?
>> - Just display fewer results?
>> You can easily do that at search-client time, cutting off after a certain
>> number.
>> - Make search faster by returning fewer results?
>> This is not going to work, as you need to score all of them, as Erick
>> explained.
>> 
>> A function query (as Mikhail specified) will run on a per-document basis (if
>> I am correct), so if your idea was to speed things up, this is not going
>> to work.
>> 
>> It makes much more sense to refine your system to improve relevancy if your
>> concern is to have more relevant docs.
>> If your concern is just to not show that many pages, you can limit that
>> client side.
>> 
>> -----
>> ---------------
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> 
> 
> 
