Re: optimize boosting parameters

Derek Poh Tue, 08 Dec 2020 22:51:19 -0800

We monitor the response time (pingdom) of the page that uses these boosting parameters. Since the addition of these boosting parameters and an additional field to search on (which I will start a separate thread about on the mailing list), the page's average response time has increased by 1-2 seconds.

Management has given feedback on this.

If it does turn out to be the boosting (and IIRC the
map function can be expensive), can you pre-compute some
number of the boosts? Your requirements look
like they can be computed at index time, then boost
by just the value of the pre-computed field.

I have gone through the list of functions, and the map function is the only one that can meet the requirements.

Or is there a less expensive function that I missed?

By pre-compute some number, do you mean that before indexing, at the preparation stage, I check the value of P_SupplierResponseRate and, if the value = 3, specify 'boost="0.4"' for the field of the document?

BTW, boosts < 1.0
_reduce_ the score. I mention that just in case that’s a surprise ;)

Oh, it is to reduce the score?! Not increase (multiply or add to) the score by less than 1?

You use termfreq, which changes of course, but
1> if your corpus is updated often enough, the termfreqs will be relatively stable.
   In that case you can pre-compute them too.

We do incremental indexing every half an hour on this collection, with an average of 50K-100K documents in each indexing run. The collection has 7+ million documents.

So the entire corpus does not get updated in every indexing run.

2> your problem statement has nothing to do with termfreq so why are you
      using it in the first place?

I read up on the termfreq function again. It returns the number of times the term appears in the field for that document. It does not really fit the requirements. Thank you for pointing it out.

I should use map instead?

Derek

On 8/12/2020 9:48 pm, Erick Erickson wrote:

Before worrying about it too much, exactly _how_ much has
the performance changed?

I’ve just been in too many situations where there’s
no objective measure of performance before and after, just
someone saying “it seems slower” and had those performance
changes disappear when a rigorous test is done. Then spent
a lot of time figuring out that the person reporting the
problem hadn’t had coffee yet. Or the network was slow.
Or….
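
One way to get that objective before/after number is to run the same query repeatedly with and without the bf parameters and compare the QTime that Solr reports in the response header. The following is only a sketch: the URL and collection name are hypothetical placeholders, and the helper names (solr_qtime, median_qtime, compare) are made up for illustration.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with your own collection's /select URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def solr_qtime(params):
    """Run one query and return the QTime (ms) reported in responseHeader."""
    query = urllib.parse.urlencode(params, doseq=True)
    with urllib.request.urlopen(f"{SOLR_SELECT_URL}?{query}") as resp:
        return json.load(resp)["responseHeader"]["QTime"]


def median_qtime(params, runs=20, run_query=solr_qtime):
    """Median QTime over several runs, to smooth out cache and network noise."""
    return statistics.median(run_query(params) for _ in range(runs))


def compare(base_params, bf_list, runs=20, run_query=solr_qtime):
    """Return (median without bf, median with bf) for the same base query."""
    boosted = dict(base_params, bf=bf_list)
    return (
        median_qtime(base_params, runs, run_query),
        median_qtime(boosted, runs, run_query),
    )
```

Calling compare({"q": "led lamp", "defType": "edismax"}, [...the bf strings...]) against a test instance gives two medians to put side by side. QTime excludes network and page rendering time, so it isolates the query cost from the rest of the page.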

If it does turn out to be the boosting (and IIRC the
map function can be expensive), can you pre-compute some
number of the boosts? Your requirements look
like they can be computed at index time, then boost
by just the value of the pre-computed field. BTW, boosts < 1.0
_reduce_ the score. I mention that just in case that’s a surprise ;)
Of course that means that to change the boosting you need
to re-index.
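
As a rough illustration of the pre-compute idea, the sketch below adds up the four boosts from the requirements in this thread for one document before it is sent to Solr. It assumes the four fields hold plain integers, and the function names are hypothetical.

```python
# Lookup tables copied from the requirements listed in this thread.
RESPONSE_RATE_BOOST = {3: 0.4, 2: 0.2}
RESPONSE_TIME_BOOST = {4: 0.4, 3: 0.2}
RANKING_BOOST = {3: 0.3, 4: 0.6, 5: 0.9, 6: 1.2}


def mws_score_boost(score):
    """1.6 for 80-100, 0.8 for 60-79, otherwise 0 (mirrors the two map() calls)."""
    if score is None:
        return 0.0
    if 80 <= score <= 100:
        return 1.6
    if 60 <= score <= 79:
        return 0.8
    return 0.0


def precomputed_boost(doc):
    """Sum of all boosts for one document; a missing field contributes 0."""
    total = (
        RESPONSE_RATE_BOOST.get(doc.get("P_SupplierResponseRate"), 0.0)
        + RESPONSE_TIME_BOOST.get(doc.get("P_SupplierResponseTime"), 0.0)
        + mws_score_boost(doc.get("P_MWSScore"))
        + RANKING_BOOST.get(doc.get("P_SupplierRanking"), 0.0)
    )
    # Round away floating-point noise from adding decimal fractions.
    return round(total, 2)
```

The result could be stored in a numeric field (say a hypothetical P_PrecomputedBoost) and used as a single bf=field(P_PrecomputedBoost) in place of the seven separate bf parameters. This assumes the separate bf values simply add to the score, which is worth confirming with debugQuery on a few documents.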

You use termfreq, which changes of course, but
1> if your corpus is updated often enough, the termfreqs will be relatively stable.
   In that case you can pre-compute them too.
2> your problem statement has nothing to do with termfreq, so why are you
   using it in the first place?

Best,
Erick

On Dec 8, 2020, at 12:46 AM, Radu Gheorghe <radu.gheor...@sematext.com> wrote:

Hi Derek,

Ah, then my reply was completely off :)

I don’t really see a better way. Maybe other than changing termfreq to field, 
if the numeric field has docValues? That may be faster, but I don’t know for 
sure.

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

On 8 Dec 2020, at 06:17, Derek Poh <d...@globalsources.com> wrote:

Hi Radu

Apologies for not making myself clear.

I would like to know if there is a simpler or more efficient way to craft the 
boosting parameters based on the requirements.

For example, I am using 'if', 'map' and 'termfreq' functions in the bf 
parameters.

Is there a simpler or more efficient function that can be used instead? Or can 
the 'formula' be crafted in a more efficient way?

On 7/12/2020 10:05 pm, Radu Gheorghe wrote:

Hi Derek,

It’s hard to tell whether your boosts can be made better without knowing your 
data and what users expect of it. Which is a problem in itself.

I would suggest gathering judgements, like if a user queries for X, what doc 
IDs do you expect to get back?

Once you have enough of these judgements, you can experiment with boosts and 
see how the query results change. There are measures such as nDCG (
https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG
) that can help you measure that per query, and you can average this score 
across all your judgements to get an overall measure of how well you’re doing.
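
A minimal sketch of that measurement, assuming each judged query is represented as the list of graded relevance values (higher is better) of the documents in the order Solr returned them. The function names are illustrative, and for simplicity the ideal ordering is taken from the same returned list rather than from every judged document for the query.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG normalised by the DCG of the best possible ordering of the same list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean_ndcg(judged_queries):
    """Average nDCG across all judged queries, as one overall quality number."""
    return sum(ndcg(rels) for rels in judged_queries) / len(judged_queries)
```

Running mean_ndcg once per candidate set of boost values gives a single number to compare between experiments.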

Or even better, you can have something like Quaerite play with boost values for 
you:

https://github.com/tballison/quaerite/blob/main/quaerite-examples/README.md#genetic-algorithms-ga-runga


Best regards,
Radu
--
Sematext Cloud - Full Stack Observability -
https://sematext.com

Solr and Elasticsearch Consulting, Training and Production Support

On 7 Dec 2020, at 10:51, Derek Poh <d...@globalsources.com.INVALID>
wrote:

Hi

I have added the following boosting requirements to the search query of a page. 
Feedback from the monitoring team is that the overall response time of the page has 
increased since then.
I am trying to find out if the added boosting parameters (below) could have 
contributed to the increase.

The boosting is working as per requirements.

May I know if the implemented boosting parameters can be enhanced or optimized 
further?
Hopefully to improve on the response time of the query and the page.

Requirements:
1. If P_SupplierResponseRate is:
   a. 3, boost by 0.4
   b. 2, boost by 0.2

2. If P_SupplierResponseTime is:
   a. 4, boost by 0.4
   b. 3, boost by 0.2

3. If P_MWSScore is:
   a. between 80-100, boost by 1.6
   b. between 60-79, boost by 0.8

4. If P_SupplierRanking is:
   a. 3, boost by 0.3
   b. 4, boost by 0.6
   c. 5, boost by 0.9
   d. 6, boost by 1.2

Boosting parameters implemented:
bf=map(P_SupplierResponseRate,3,3,0.4,0)
bf=map(P_SupplierResponseRate,2,2,0.2,0)

bf=map(P_SupplierResponseTime,4,4,0.4,0)
bf=map(P_SupplierResponseTime,3,3,0.2,0)

bf=map(P_MWSScore,80,100,1.6,0)
bf=map(P_MWSScore,60,79,0.8,0)

bf=if(termfreq(P_SupplierRanking,3),0.3,if(termfreq(P_SupplierRanking,4),0.6,if(termfreq(P_SupplierRanking,5),0.9,if(termfreq(P_SupplierRanking,6),1.2,0))))


I am using Solr 7.7.2

----------------------
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or 
privileged information. If you are not the intended recipient or have received 
this e-mail in error, please inform the sender immediately and delete this 
e-mail (including any attachments) from your computer, and you must not use, 
disclose to anyone else or copy this e-mail (including any attachments), 
whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.





