I don’t think anyone is responding because this is too narrow a use case; you may simply have to figure out an alternative on your own.
> On Feb 19, 2020, at 12:28 AM, Ashwin Ramesh <ash...@canva.com.invalid> wrote:
>
> ping on this :)
>
>> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:
>>
>> Hi,
>>
>> We are in the process of applying a scoring model to our search results.
>> In particular, we would like to add scores for documents per query and
>> user context.
>>
>> For example, we want to assign scores from 500 down to 1 to the top 500
>> documents for the query “dog” for users who speak US English.
>>
>> We believe it is infeasible to store these scores in Solr, because we
>> want to update them regularly, and the number of scores grows rapidly as
>> user attributes are added.
>>
>> One solution we explored was to store these scores in a secondary data
>> store and apply them at Solr query time with a boost function such as:
>>
>> `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
>> mul(termfreq(id,'ID-500'),1)`
>>
>> We have over a hundred thousand documents in one Solr collection, and
>> about fifty million in another. We have some queries for which roughly
>> 80% of the documents match, although this is an edge case. We wanted to
>> know the worst-case performance, so we tested with such a query. For
>> both collections we found a message similar to the following in the
>> SolrCloud logs (tested on a laptop):
>>
>> Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
>>
>> We then tried the following boost, which seemed simpler:
>>
>> `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
>>
>> We then saw the following in the SolrCloud logs:
>>
>> `The request took too long to iterate over terms.`
>>
>> All of the responses above took over 5000 milliseconds to return.
>>
>> We are considering Solr’s re-ranker, but I don’t know how we would use
>> it without pushing all of the query-context-document scores to Solr.
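The re-ranker can avoid rescoring the full result set: Solr's ReRank query parser (`rq`) applies a second query only to the top `reRankDocs` candidates, so the external scores for a single (query, user-context) pair could be sent as boosted id terms in the re-rank query rather than stored in the index. Below is a minimal sketch of building such a request; `build_rerank_params` and the score mapping are hypothetical helpers, while the `rq`/`rqq` syntax is Solr's standard ReRank parser.

```python
# Sketch: build Solr request parameters that apply externally stored
# per-(query, user-context) scores via the ReRank query parser, so only
# the top `rerank_docs` candidates are rescored.

def build_rerank_params(q, scores, rerank_docs=500, rerank_weight=1.0):
    """scores: mapping of doc id -> boost, e.g. {"ID-1": 500, ...} (hypothetical store)."""
    # One boosted term clause per known-good document id.
    rqq = " OR ".join(f'id:"{doc_id}"^{boost}' for doc_id, boost in scores.items())
    return {
        "q": q,
        # {!rerank ...} rescores only the top reRankDocs of the main query.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
              f"reRankWeight={rerank_weight}}}",
        "rqq": rqq,
    }

params = build_rerank_params("dog", {"ID-1": 500, "ID-2": 499})
```

The cost is then bounded by `reRankDocs` rather than by the number of matching documents, which is the failure mode of the `bf`/`boost` approaches above, though the per-context scores still travel with every request.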
>>
>> The alternative solution that we are currently considering involves
>> invoking multiple Solr queries.
>>
>> We would make one request to Solr to fetch the top N results (id,
>> score) for the query, e.g. q=dog, fq=featureA:foo, fq=featureB:bar,
>> limit=N.
>>
>> A second request would be made using a filter query with a set of doc
>> ids that we know are high value for the user’s query, e.g. q=*:*,
>> fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
>>
>> We would then do a re-ranking phase in our service layer.
>>
>> Do you have any suggestions for known patterns of how we can store and
>> retrieve scores per user context and query?
>>
>> Regards,
>> Ash & Spirit.
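The service-layer phase described above could be as simple as merging the two result lists and sorting by the external score, falling back to the organic Solr score. A minimal sketch, with all names illustrative (none of this is Solr API):

```python
# Sketch of the service-layer re-rank: combine organic results with the
# known-high-value docs fetched in the second query, rank by the external
# model score first and the organic Solr score second.

def rerank(organic, boosted_ids, external_scores, n):
    """organic: list of (doc_id, solr_score) from the first query;
    boosted_ids: doc ids returned by the second (filter) query;
    external_scores: doc_id -> model score from the secondary store."""
    candidates = {doc_id: score for doc_id, score in organic}
    for doc_id in boosted_ids:
        # Keep high-value docs even if they missed the organic top N.
        candidates.setdefault(doc_id, 0.0)
    ranked = sorted(
        candidates,
        key=lambda d: (external_scores.get(d, 0.0), candidates[d]),
        reverse=True,
    )
    return ranked[:n]

top = rerank([("a", 1.2), ("b", 0.8)], ["c"], {"c": 500, "a": 10}, 3)
# → ["c", "a", "b"]
```

This keeps the per-context scores entirely out of Solr at the cost of two round trips; the two queries could also be issued concurrently to hide most of the extra latency.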