[ 
https://issues.apache.org/jira/browse/SOLR-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707793#action_12707793
 ] 

David Smiley commented on SOLR-1158:
------------------------------------

I just realized that not only would numDocs be affected, but so would docFreq.

I have a feeling that it may not be possible to implement this improvement in 
Solr because of performance constraints, but I haven't looked deeply enough 
to know yet.  I'm curious what other Lucene/Solr experts think.

> Scoring, "numDocs" should be number after applying filters, not entire index
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1158
>                 URL: https://issues.apache.org/jira/browse/SOLR-1158
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.4
>            Reporter: David Smiley
>            Priority: Minor
>
> I'd like to put different types of things to search for in my Solr index.  I 
> use a "type" field to discriminate between these types of things, and my "id" 
> primary key field incorporates the type (ex: "FooType:53") to ensure 
> uniqueness.  A problem I see with this approach is that the idf (inverse 
> document frequency) component of the score is based on the entire index and 
> not the type that I'm querying.  In particular "numDocs" given to the 
> Similarity.java implementation is the total number of documents in the index. 
> I think it would be more accurate for numDocs to be the filtered number of 
> docs, that is, the number of docs after the filter queries are applied.
> The only issue I see with this, which may or may not be a problem, is that the 
> scores (and thus potentially result ordering if sorting by score) would 
> change depending on which filters are applied.  That could be 
> counter-intuitive in a faceting UI.  Perhaps only a certain filter or filters 
> could be marked as lowering numDocs for scoring.  Such a configuration choice 
> strikes me as belonging in the schema.
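
To illustrate the effect described above, here is a minimal sketch, assuming 
the classic idf formula from Lucene's DefaultSimilarity, 
log(numDocs / (docFreq + 1)) + 1.  The class name FilteredIdfSketch and the 
document counts mentioned afterwards are hypothetical, and this is not how 
Solr computes scores internally; it only shows how the idf moves when numDocs 
and docFreq are taken from the filtered set instead of the whole index.

```java
// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class FilteredIdfSketch {
    private FilteredIdfSketch() {}

    /**
     * Classic DefaultSimilarity-style idf (assumed formula):
     * log(numDocs / (docFreq + 1)) + 1.
     *
     * @param docFreq number of docs containing the term
     * @param numDocs number of docs the frequency is measured against
     */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * Difference between the index-wide idf and the idf computed over a
     * filtered subset. Both numDocs and docFreq must be recounted for the
     * subset, which is why docFreq is affected as well as numDocs.
     */
    static float idfShift(int globalDocFreq, int globalNumDocs,
                          int filteredDocFreq, int filteredNumDocs) {
        return idf(globalDocFreq, globalNumDocs)
                - idf(filteredDocFreq, filteredNumDocs);
    }
}
```

For example, take a made-up index of 1,000,000 docs in which a term occurs in 
5,000 docs, 4,000 of which belong to a 10,000-doc type.  Then 
idf(5000, 1000000) is much larger than idf(4000, 10000): the term looks rare 
across the index but is common within the type being queried.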

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
