Management has feedback on this.
> If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field.

I have gone through the list of functions, and map is the only one that meets the requirements.
Or is there a less expensive function that I missed?
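
For reference, here is a rough Python model of how Solr's map(x,min,max,target,default) behaves, applied to the P_SupplierRanking requirement quoted at the end of this thread. It is only a sketch: solr_map and ranking_boost are illustrative names, not Solr APIs, and it assumes the field holds a single numeric value per document.

```python
def solr_map(x, lo, hi, target, default=None):
    """Rough model of Solr's map(x,min,max,target,default).

    Values of x inside [lo, hi] (inclusive) become target.
    Anything else becomes default, or stays x when no default is given.
    """
    if lo <= x <= hi:
        return target
    return x if default is None else default


def ranking_boost(ranking):
    """Sum of one single-value map per ranking, one per bf parameter.

    The ranges do not overlap, so at most one term is non-zero.
    """
    pairs = [(3, 0.3), (4, 0.6), (5, 0.9), (6, 1.2)]
    return sum(solr_map(ranking, value, value, boost, 0) for value, boost in pairs)


if __name__ == "__main__":
    for ranking in range(1, 8):
        print(ranking, ranking_boost(ranking))
```

Because each document matches at most one of the ranges, four separate bf=map(P_SupplierRanking,v,v,boost,0) parameters should add up to the same value as a single nested if(...) expression.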

By "pre-compute some number", do you mean checking the value of P_SupplierResponseRate at the preparation stage, before indexing, and, if the value is 3, specifying boost="0.4" for that field of the document?
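
One way to read the pre-compute suggestion is to add up all the boosts in the preparation stage and store the total in an ordinary numeric field. The sketch below assumes the four fields hold integers, and P_PrecomputedBoost is a hypothetical field name, not something from the existing schema. As far as I know, index-time boosts (boost="0.4" on a field) were removed in Solr 7.0, so on 7.7.2 the value would have to go into a regular field rather than a boost attribute.

```python
# Boost tables copied from the requirements quoted at the end of this thread.
RESPONSE_RATE_BOOST = {3: 0.4, 2: 0.2}
RESPONSE_TIME_BOOST = {4: 0.4, 3: 0.2}
RANKING_BOOST = {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}


def precomputed_boost(doc):
    """Return the total additive boost for one document (a plain dict)."""
    total = RESPONSE_RATE_BOOST.get(doc.get("P_SupplierResponseRate"), 0.0)
    total += RESPONSE_TIME_BOOST.get(doc.get("P_SupplierResponseTime"), 0.0)
    total += RANKING_BOOST.get(doc.get("P_SupplierRanking"), 0.0)

    score = doc.get("P_MWSScore")
    if score is not None:
        if 80 <= score <= 100:
            total += 1.6
        elif 60 <= score <= 79:
            total += 0.8
    # Round away floating-point noise before storing the value.
    return round(total, 2)


def add_boost_field(doc):
    """Return a copy of doc with the hypothetical P_PrecomputedBoost field set."""
    enriched = dict(doc)
    enriched["P_PrecomputedBoost"] = precomputed_boost(doc)
    return enriched
```

At query time the separate bf parameters could then collapse to something like bf=field(P_PrecomputedBoost), assuming that field is numeric. As Erick notes further down, changing the weights would then require re-indexing.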

> BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;)

Oh, so it reduces the score?! It does not increase the score (by multiplying or adding) when the boost is less than 1?
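
Whether a value below 1 lowers the score depends on how the boost is applied. The sketch below assumes the bf parameters go through the dismax/edismax parser, where bf is added to the relevance score, while a multiplicative boost (for example the edismax boost parameter) multiplies it. The scores are made-up numbers for illustration only.

```python
def apply_additive(score, bf_value):
    """bf-style boost: the function value is added to the score."""
    return score + bf_value


def apply_multiplicative(score, boost_value):
    """Multiplicative boost: the score is multiplied by the function value."""
    return score * boost_value


if __name__ == "__main__":
    base_score = 2.0  # made-up relevance score
    print(apply_additive(base_score, 0.4))        # higher than the base score
    print(apply_multiplicative(base_score, 0.4))  # lower than the base score
```

Under that assumption, the 0.2 to 1.6 values in the requirements raise the score when used in bf, and would only reduce it if they were moved to a multiplicative boost.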

> You use termfreq, which changes of course, but 1> if your corpus is updated often enough, the termfreqs will be relatively stable. In that case you can pre-compute them too.

We do incremental indexing every half an hour on this collection, averaging 50K-100K documents per run. The collection has 7+ million documents.
So the entire corpus does not get updated in every indexing run.

> 2> your problem statement has nothing to do with termfreq so why are you using it in the first place?

I read up on the termfreq function again. It returns the number of times the term appears in the field for that document, so it does not really fit the requirements. Thank you for pointing it out.
Should I use map instead?

Derek

On 8/12/2020 9:48 pm, Erick Erickson wrote:
Before worrying about it too much, exactly _how_ much has the performance changed? I’ve just been in too many situations where there’s no objective measure of performance before and after, just someone saying “it seems slower”, and had those performance changes disappear when a rigorous test is done. Then spent a lot of time figuring out that the person reporting the problem hadn’t had coffee yet. Or the network was slow. Or…

If it does turn out to be the boosting (and IIRC the map function can be expensive), can you pre-compute some number of the boosts? Your requirements look like they can be computed at index time, then boost by just the value of the pre-computed field.

BTW, boosts < 1.0 _reduce_ the score. I mention that just in case that’s a surprise ;)

Of course that means that to change the boosting you need to re-index.

You use termfreq, which changes of course, but
1> if your corpus is updated often enough, the termfreqs will be relatively stable. In that case you can pre-compute them too.
2> your problem statement has nothing to do with termfreq, so why are you using it in the first place?

Best,
Erick

On Dec 8, 2020, at 12:46 AM, Radu Gheorghe <radu.gheor...@sematext.com> wrote:

Hi Derek,

Ah, then my reply was completely off :) I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

On 8 Dec 2020, at 06:17, Derek Poh <d...@globalsources.com> wrote:

Hi Radu

Apologies for not making myself clear. I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements. For example, I am using the 'if', 'map' and 'termfreq' functions in the bf parameters. Is there a simpler or more efficient function that can be used instead? Or can the 'formula' be crafted in a more efficient way?

On 7/12/2020 10:05 pm, Radu Gheorghe wrote:

Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself.

I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back? Once you have enough of these judgements, you can experiment with boosts and see how the query results change. There are measures such as nDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) that can help you measure that per query, and you can average this score across all your judgements to get an overall measure of how well you’re doing.

Or even better, you can have something like Quaerite play with boost values for you: https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga

Best regards,
Radu

On 7 Dec 2020, at 10:51, Derek Poh <d...@globalsources.com.INVALID> wrote:

Hi

I have added the following boosting requirements to the search query of a page. Feedback from the monitoring team is that the overall response time of the page has increased since then. I am trying to find out whether the added boosting parameters (below) could have contributed to the increase.

The boosting is working as per requirements.

May I know if the implemented boosting parameters can be enhanced or optimized further? Hopefully this will improve the response time of the query and the page.

Requirements:
1. If P_SupplierResponseRate is:
   a. 3, boost by 0.4
   b. 2, boost by 0.2
2. If P_SupplierResponseTime is:
   a. 4, boost by 0.4
   b. 3, boost by 0.2
3. If P_MWSScore is:
   a. between 80-100, boost by 1.6
   b. between 60-79, boost by 0.8
4. If P_SupplierRanking is:
   a. 3, boost by 0.3
   b. 4, boost by 0.6
   c. 5, boost by 0.9
   d. 6, boost by 1.2

Boosting parameters implemented:
bf=map(P_SupplierResponseRate,3,3,0.4,0)
bf=map(P_SupplierResponseRate,2,2,0.2,0)
bf=map(P_SupplierResponseTime,4,4,0.4,0)
bf=map(P_SupplierResponseTime,3,3,0.2,0)
bf=map(P_MWSScore,80,100,1.6,0)
bf=map(P_MWSScore,60,79,0.8,0)
bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))

I am using Solr 7.7.2

----------------------
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.