[ https://issues.apache.org/jira/browse/SOLR-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707793#action_12707793 ]
David Smiley commented on SOLR-1158:
------------------------------------
I just realized that numDocs is not the only statistic affected: docFreq
would also have to be computed against the filtered document set. I suspect
it may not be feasible to implement this improvement in Solr because of
performance constraints, but I haven't looked deeply enough to know yet.
I'm curious what other Lucene/Solr experts think.
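
To make the effect concrete, here is a minimal sketch of how the idf term
shifts when both statistics are scoped to a filter. It assumes the classic
Lucene idf formula as I recall it from DefaultSimilarity,
1 + ln(numDocs / (docFreq + 1)). The class name IdfSketch and every document
count are hypothetical, chosen only for illustration, and nothing here calls
the real Similarity API.

```java
// Sketch only: the classic Lucene idf formula, evaluated once with
// index-wide statistics and once with filter-scoped statistics.
// The class name and all counts are made up for illustration.
final class IdfSketch {

    // Classic Lucene idf (assumed): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical index: 1,000,000 docs in total, 10,000 of type "FooType".
        long indexNumDocs = 1_000_000L;
        long filteredNumDocs = 10_000L;

        // Hypothetical term: in 5,000 docs overall, 4,000 of which are "FooType".
        long indexDocFreq = 5_000L;
        long filteredDocFreq = 4_000L;

        double globalIdf = idf(indexDocFreq, indexNumDocs);
        double filteredIdf = idf(filteredDocFreq, filteredNumDocs);

        System.out.printf("idf over whole index:  %.3f%n", globalIdf);
        System.out.printf("idf within the filter: %.3f%n", filteredIdf);
    }
}
```

With these made-up numbers the term looks rare across the whole index but
common within its own type, so the index-wide idf overstates how
discriminating the term is for a type-restricted query. Note that the
filtered docFreq is a per-term, per-filter count, which is where I would
expect the performance cost to come from.
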
> Scoring, "numDocs" should be number after applying filters, not entire index
> ----------------------------------------------------------------------------
>
> Key: SOLR-1158
> URL: https://issues.apache.org/jira/browse/SOLR-1158
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 1.4
> Reporter: David Smiley
> Priority: Minor
>
> I'd like to store several different types of searchable things in one Solr
> index. I use a "type" field to discriminate between them, and my "id"
> primary key field incorporates the type (e.g. "FooType:53") to ensure
> uniqueness. A problem I see with this approach is that the idf (inverse
> document frequency) component of the score is based on the entire index and
> not on the type that I'm querying. In particular, the "numDocs" given to the
> Similarity.java implementation is the total number of documents in the index.
> I think it would be more accurate for numDocs to be the filtered number of
> docs, that is, the number of docs left after the filter queries are applied.
> The only issue I see with this, which may or may not be a problem, is that
> the scores (and thus potentially the result ordering, if sorting by score)
> would change depending on which filters are applied. That could be
> counter-intuitive in a faceting UI. Perhaps only a certain filter or filters
> could be marked as lowering numDocs for scoring. Such a configuration choice
> strikes me as belonging in the schema.