Hi,

Can someone help us understand how null values affect boosting?

Say we have field_1 (boost ^10.1) and field_2 (boost ^9.1), and we search
for foo. Document A has field_1 (matches foo) and field_2 (empty).
Document B has field_2 (matches foo) but no field_1 at all.
As per our understanding, the result order should be Document A, Document B.
However, what we actually get is Document B, Document A.
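
To make the simplified case concrete, here is a minimal sketch of the two
documents in Solr's XML update format. The id values are hypothetical and
only for illustration; field_1 and field_2 are the placeholder names used
above, and the qf line simply restates the boosts from the example.
<add>
  <doc>
    <field name="id">A</field>
    <field name="field_1">foo</field>
    <field name="field_2"></field>
  </doc>
  <doc>
    <field name="id">B</field>
    <field name="field_2">foo</field>
  </doc>
</add>

<str name="qf">field_1^10.1 field_2^9.1</str>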

Below is a detailed description of the above problem with our business use case 
and configurations.

Use case: promote documents according to the following field priority, i.e.
keywords > meta description > title > H1 > H2 > H3 > body content.

For this we have indexed the above fields as follows:
<field name="metatag.description" type="text_general" multiValued="false" indexed="true" stored="true"/>
<field name="metatag.keywords" type="text_general" multiValued="false" indexed="true" stored="true"/>
<field name="title" type="text_general" multiValued="false" indexed="true" stored="true"/>
<field name="h1" type="text_general" multiValued="true" indexed="true" stored="true"/>
<field name="h2" type="text_general" multiValued="true" indexed="true" stored="true"/>
<field name="h3" type="text_general" multiValued="true" indexed="true" stored="true"/>

We then used the eDisMax query parser and set the boosts as follows:
<str name="defType">edismax</str>
<str name="qf">
  metatag.keywords^100.1 metatag.description^50.1 title^20.1 h1^4.7 h2^3.6 h3^2.5 h4^1.4 id^0.01 _text_^0.001
</str>

The above works fine for documents that have an entry for every field. E.g.
all pages have keywords, meta description and so on, even though the entry
might just be an empty string. So if the result set contains only pages, the
results are ordered as expected.

However, for documents that don't have keywords (e.g. all PDFs have only meta
description, title and _text_), results are skewed. PDFs come right at the
top even when we have a page with the search term in its keywords field.

To work around this anomaly we came up with the following boosts (notice the
very large boost values):
<str name="defType">edismax</str>
<str name="qf">
  metatag.keywords^100000.1 metatag.description^7500.1 title^500.1 h1^40.7 h2^25.6 h3^15.1 h4^5.4 h5^1.3 h6^1.2 _text_^1.0
</str>

I can provide the query debug results for both configurations if required.
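
For reference, a minimal sketch of how that debug output can be captured,
assuming the search goes through a standard search handler. The handler name
/select is an assumption about our setup; debugQuery is the standard Solr
debug parameter and can equally be passed on the request as debugQuery=true.
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>
The explain section of the response then shows the per-field score
contributions for each returned document.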

Thanks for any help in understanding this.
