Re: Edismax field boosting behavior for null values

Erick Erickson Wed, 11 May 2016 20:03:38 -0700

Fields that don't match for a particular document just don't contribute to the
score. The boost is multiplied into the score calculated for that field and
term. So if for doc1 the calculated score is 5 and you boost by 2, the result
is 10. If doc2 has a calculated score of 20 and you boost by 1, its score is
20, which is still higher than doc1's.
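A minimal Python sketch of that arithmetic for a single query term. It assumes edismax combines the per-field scores the way a DisjunctionMax query does: the highest boosted field score, plus `tie` times the sum of the others (`tie` defaults to 0 in Solr). The raw per-field scores and the `dismax_score` helper are hypothetical illustrations; the real numbers come from Lucene's similarity and appear in the debug output.

```python
from typing import Mapping


def dismax_score(field_scores: Mapping[str, float],
                 boosts: Mapping[str, float],
                 tie: float = 0.0) -> float:
    """Illustrative DisjunctionMax-style combination for one query term.

    field_scores: raw (unboosted) score of each field that matched the doc.
    boosts: the qf boosts, e.g. {"field_1": 10.1, "field_2": 9.1}.
    """
    # Only fields listed in qf are searched. A field the document lacks
    # (or that is empty) never matches, so it is simply skipped here.
    boosted = [field_scores[field] * boost
               for field, boost in boosts.items()
               if field in field_scores]
    if not boosted:
        return 0.0
    best = max(boosted)
    # The best field dominates; the others add only tie * their sum.
    return best + tie * (sum(boosted) - best)


boosts = {"field_1": 2.0, "field_2": 1.0}
doc1 = {"field_1": 5.0}   # only field_1 matched, raw score 5
doc2 = {"field_2": 20.0}  # only field_2 matched, raw score 20
print(dismax_score(doc1, boosts))  # 10.0
print(dismax_score(doc2, boosts))  # 20.0
```

Even with the lower boost, doc2 wins because its raw field score is four times larger; the boost only scales a score, it does not guarantee an ordering between fields.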


For all the messy details, try adding
&debug=all&debug.explain.structured=true
to your query.
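A small sketch of building such a request, mainly to show that there must be no spaces around the `=` signs. The host, the core name `mycore`, the `/select` handler and the `debug_query_url` helper are all hypothetical; substitute whatever handler carries your edismax defaults.

```python
from urllib.parse import urlencode


def debug_query_url(base_url: str, q: str) -> str:
    """Build a select URL asking Solr to explain how each doc was scored."""
    params = {
        "q": q,
        "debug": "all",
        # Structured explain output breaks the score down per field and term.
        "debug.explain.structured": "true",
    }
    # urlencode escapes the values, so a multi-word q stays a valid URL.
    return f"{base_url}/select?{urlencode(params)}"


print(debug_query_url("http://localhost:8983/solr/mycore", "foo"))
```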

Best,
Erick

On Wed, May 11, 2016 at 10:47 AM, Megha Bhandari <mbhanda...@sapient.com> wrote:
> Correcting a typo in the original post and making it a little clearer.
>
> Hi
>
> Can someone help us understand how null values affect boosting?
>
> Say we have field_1 (with boost ^10.1) and field_2 (with boost ^9.1).
> We search for foo.
> Document A: field_1 does not exist; field_2 matches the search term.
> Document B: field_1 matches the search term; field_2 is an empty string.
> As per our understanding, the result should be Document B, Document A.
> However, what we are getting is Document A, Document B.
>
> Below is a detailed description of the above problem with our business use 
> case and configurations.
>
> Use case: promote documents according to the following field priority, i.e.
> keywords > meta description > title > H1 > H2 > H3 > body content.
>
> For this we have indexed the above fields as:
> <field name="metatag.description" type="text_general" multiValued="false" indexed="true" stored="true"/>
> <field name="metatag.keywords" type="text_general" multiValued="false" indexed="true" stored="true"/>
> <field name="title" type="text_general" multiValued="false" indexed="true" stored="true"/>
> <field name="h1" type="text_general" multiValued="true" indexed="true" stored="true"/>
> <field name="h2" type="text_general" multiValued="true" indexed="true" stored="true"/>
> <field name="h3" type="text_general" multiValued="true" indexed="true" stored="true"/>
>
> and used the eDisMax query parser with these boosts:
> <str name="defType">edismax</str>
> <str name="qf">
>   metatag.keywords^100.1 metatag.description^50.1 title^20.1 h1^4.7
>   h2^3.6 h3^2.5 h4^1.4 id^0.01 _text_^0.001
> </str>
>
> The above works fine for documents that have an entry for every field.
> For example, all pages have keywords, meta description and so on, even
> though the entry might just be an empty string. So if the results contain
> only pages, the ordering is as expected.
>
> However, for documents that don't have keywords (e.g. PDFs, which only have
> meta description, title and _text_), the results are skewed. PDFs come right
> at the top even though we have a page with the search term in its keywords
> field.
>
> To fix this anomaly we came up with the following boosts (note the very
> large values):
> <str name="defType">edismax</str>
> <str name="qf">
>   metatag.keywords^100000.1 metatag.description^7500.1 title^500.1
>   h1^40.7 h2^25.6 h3^15.1 h4^5.4 h5^1.3 h6^1.2 _text_^1.0
> </str>
>
> I can provide the query debug results for both configurations if required.
>
> Thanks for any help in understanding this.
>
>
> -----Original Message-----
> From: Megha Bhandari [mailto:mbhanda...@sapient.com]
> Sent: Wednesday, May 11, 2016 11:10 PM
> To: solr-user@lucene.apache.org
> Subject: Edismax field boosting behavior for null values
>
> Hi
>
> Can someone help us understand how null values affect boosting?
>
> Say we have field_1 (with boost ^10.1) and field_2 (with boost ^9.1).
> We search for foo. Document A has field_1 (foo match) and field_2 (empty),
> and Document B has field_2 (foo match) but no field_1.
> As per our understanding, the result should be Document A, Document B.
> However, what we are getting is Document B, Document A.
>
