I don’t think anyone is responding because this is too narrow a use case; you may simply have to figure out an alternative on your own.
> On Feb 19, 2020, at 12:28 AM, Ashwin Ramesh <ash...@canva.com.invalid> wrote:
>
> ping on this :)
>
>> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:
>>
>> Hi,
>>
>> We are in the process of applying a scoring model to our search results.
>> In particular, we would like to add scores for documents per query and
>> user context.
>>
>> For example, we want to assign scores from 500 down to 1 to the top 500
>> documents for the query “dog” for users who speak US English.
>>
>> We believe it is infeasible to store these scores in Solr, because we
>> want to update them regularly, and the number of scores grows rapidly as
>> user attributes are added.
>>
>> One solution we explored was to store these scores in a secondary data
>> store and apply them at Solr query time with a boost function such as:
>>
>> `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
>> mul(termfreq(id,'ID-500'),1)`
>>
>> We have over a hundred thousand documents in one Solr collection, and
>> about fifty million in another. We have some queries for which roughly
>> 80% of the documents match, although this is an edge case. We wanted to
>> know the worst-case performance, so we tested with such a query. For
>> both collections we found a message similar to the following in the
>> SolrCloud logs (tested on a laptop):
>>
>> Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
>>
>> We then tried the following boost, which seemed simpler:
>>
>> `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
>>
>> We then saw the following in the SolrCloud logs:
>>
>> `The request took too long to iterate over terms.`
>>
>> All of the responses above took over 5000 milliseconds to return.
>>
>> We are considering Solr’s re-ranker, but I don’t know how we would use
>> it without pushing all of the query-context-document scores to Solr.
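The re-ranker can avoid rescoring the full result set: Solr's ReRank query parser (`rq`) applies a second query only to the top `reRankDocs` candidates, so the external scores for a single (query, user-context) pair could be sent as boosted id terms in the re-rank query rather than stored in the index. Below is a minimal sketch of building such a request; `build_rerank_params` and the score mapping are hypothetical helpers, while the `rq`/`rqq` syntax is Solr's standard ReRank parser.

```python
# Sketch: build Solr request parameters that apply externally stored
# per-(query, user-context) scores via the ReRank query parser, so only
# the top `rerank_docs` candidates are rescored.

def build_rerank_params(q, scores, rerank_docs=500, rerank_weight=1.0):
    """scores: mapping of doc id -> boost, e.g. {"ID-1": 500, ...} (hypothetical store)."""
    # One boosted term clause per known-good document id.
    rqq = " OR ".join(f'id:"{doc_id}"^{boost}' for doc_id, boost in scores.items())
    return {
        "q": q,
        # {!rerank ...} rescores only the top reRankDocs of the main query.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
              f"reRankWeight={rerank_weight}}}",
        "rqq": rqq,
    }

params = build_rerank_params("dog", {"ID-1": 500, "ID-2": 499})
```

The cost is then bounded by `reRankDocs` rather than by the number of matching documents, which is the failure mode of the `bf`/`boost` approaches above, though the per-context scores still travel with every request.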
>>
>> The alternative solution that we are currently considering involves
>> invoking multiple Solr queries.
>>
>> We would make one request to Solr to fetch the top N results (id,
>> score) for the query, e.g. q=dog, fq=featureA:foo, fq=featureB:bar,
>> limit=N.
>>
>> A second request would be made using a filter query with a set of doc
>> ids that we know are high value for the user’s query, e.g. q=*:*,
>> fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
>>
>> We would then do a re-ranking phase in our service layer.
>>
>> Do you have any suggestions for known patterns of how we can store and
>> retrieve scores per user context and query?
>>
>> Regards,
>> Ash & Spirit.
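The service-layer phase described above could be as simple as merging the two result lists and sorting by the external score, falling back to the organic Solr score. A minimal sketch, with all names illustrative (none of this is Solr API):

```python
# Sketch of the service-layer re-rank: combine organic results with the
# known-high-value docs fetched in the second query, rank by the external
# model score first and the organic Solr score second.

def rerank(organic, boosted_ids, external_scores, n):
    """organic: list of (doc_id, solr_score) from the first query;
    boosted_ids: doc ids returned by the second (filter) query;
    external_scores: doc_id -> model score from the secondary store."""
    candidates = {doc_id: score for doc_id, score in organic}
    for doc_id in boosted_ids:
        # Keep high-value docs even if they missed the organic top N.
        candidates.setdefault(doc_id, 0.0)
    ranked = sorted(
        candidates,
        key=lambda d: (external_scores.get(d, 0.0), candidates[d]),
        reverse=True,
    )
    return ranked[:n]

top = rerank([("a", 1.2), ("b", 0.8)], ["c"], {"c": 500, "a": 10}, 3)
# → ["c", "a", "b"]
```

This keeps the per-context scores entirely out of Solr at the cost of two round trips; the two queries could also be issued concurrently to hide most of the extra latency.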