Re: Possible issue in edismax?

Felipe Lahti Wed, 30 Jan 2013 07:53:50 -0800

Hi Sandeep,

The short answer is that the boosts you define in your requestHandler are
not the only input to each document's score. Other factors contribute to
the score calculation as well; see
http://wiki.apache.org/solr/SolrRelevancyFAQ for details. You can also add
debugQuery=true to the request to see the score calculation for each
document returned.
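
To make both points concrete, here is a minimal Python sketch. It is only
an illustration, not your actual setup: the handler URL in the usage note
is hypothetical, and classic_idf assumes the default Lucene TF-IDF
similarity. dismax_score shows how a DisjunctionMaxQuery combines the
per-field scores using your tie value, and classic_idf shows why a term
that is very common in a field contributes less there.

```python
import math
from urllib.parse import urlencode


def build_debug_url(handler_url, query):
    """Build a request URL that asks Solr to explain each document's score.

    handler_url is whatever URL your request handler answers on
    (hypothetical here); debugQuery and fl are standard Solr parameters.
    """
    params = {"q": query, "fl": "*,score", "debugQuery": "true"}
    return handler_url + "?" + urlencode(params)


def classic_idf(num_docs, doc_freq):
    """Inverse document frequency, assuming the classic TF-IDF similarity.

    The more documents contain the term in a given field, the lower the idf.
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def dismax_score(field_scores, tie):
    """Combine per-field scores the way a disjunction-max query does.

    The best field wins; the other fields add only tie * their score.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)
```

For example, build_debug_url("http://localhost:8983/solr/collection1/querydismax", "news")
gives a URL you can open in a browser; the "explain" entries in the debug
section of the response show, per document, which field produced the
winning score and how boost, tf, idf and fieldNorm were multiplied
together.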


Let me know if you need anything else.



On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry <sanmes...@gmail.com> wrote:

> Hi All,
>
> I'm facing an issue with relevancy calculation in the dismax query parser.
> The boost factor does not work as expected in certain cases when the
> keyword is generic. By generic I mean that the keyword appears many times
> in the document as well as in the index.
>
> I have parser configuration as below:
>
> <requestHandler name="querydismax" class="solr.SearchHandler" >
>         <lst name="defaults">
>             <str name="defType">edismax</str>
>             <str name="echoParams">explicit</str>
>             <float name="tie">0.01</float>
>             <str name="qf">series_title^500 title^100 description^15
> contribution</str>
>             <str name="pf">series_title^200</str>
>             <int name="ps">0</int>
>             <str name="q.alt">*:*</str>
>         </lst>
> </requestHandler>
>
> As you can see above, I'd expect documents containing matches in the
> series title to rank higher than the ones matching in contribution.
>
> This works well if I type in a query like 'wonderworld', which is a rarely
> occurring term: the series titles rank higher. But if I type in a keyword
> like 'news', which is the most common term in the index, I get hits in
> contributions even though I have lots of documents with the word 'news'
> in the series title.
>
> The field definition is as below:
>
> <field name="series_title" type="text_wc" indexed="true" stored="true"
> multiValued="false" />
> <field name="title" type="text_wc" indexed="true" stored="true"
> multiValued="false" />
> <field name="description" type="text_wc" indexed="true" stored="true"
> multiValued="false" />
> <field name="contribution" type="text" indexed="true" stored="true"
> multiValued="true" />
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> compressThreshold="10">
>             <analyzer type="index">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>             <analyzer type="query">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>         </fieldType>
>
> <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100"
> >
>             <analyzer type="index">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>             <analyzer type="query">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>                 <filter class="solr.LowerCaseFilterFactory"/>
>             </analyzer>
>  </fieldType>
>
> I have tried debugging, and when I use the query term news, I see that
> matches for contributions are ranked higher than series title. The parsed
> queries look like below:
> (Note that I have edited the query: in reality I have a lot of searchable
> fields, and I have only mentioned the ones containing text data; the rest
> all contain uuids.)
>
> <str name="parsedquery">
> (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
> () () () () () () () () () () () () () () () () () () () ())/no_coord
> </str>
> <str name="parsedquery_toString">
> +(description:news^15 | title:news^100.0 | contributions:news |
> series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
> () () () () () () () () () () () () ()
>
>
> Could you guide me in right direction please?
>
> Many Thanks,
> Sandeep
>



-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
