[ https://issues.apache.org/jira/browse/LUCENE-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Elschot resolved LUCENE-6894.
----------------------------------
       Resolution: Not A Problem
    Lucene Fields: New,Patch Available  (was: New)

> Improve DISI.cost() by assuming independence for match probabilities
> --------------------------------------------------------------------
>
>                 Key: LUCENE-6894
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6894
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6894.patch, LUCENE-6894.patch
>
>
> The DocIdSetIterator.cost() method returns an estimation of the number of
> matching docs. Currently conjunctions use the minimum cost, and disjunctions
> use the sum of the costs, and both are too high.
> The probability of a match is estimated by dividing available cost() by the
> number of docs in a segment.
> The conjunction probability is then the product of the inputs, and the
> disjunction probability follows from De Morgan's rule:
> "not (A and B)" is the same as "(not A) or (not B)"
> with the probability for "not" computed as 1 minus the input probability.
> The independence that is assumed is normally not there. However, the cost()
> results are only used to order the input DISIs/Scorers for optimization, and
> for that I expect this assumption to work nicely.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
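
For readers of the archive, the estimate described in the issue can be sketched roughly as follows. This is only an illustration of the arithmetic under the stated independence assumption, not the code from the attached patches. The class name CostEstimateSketch, the method names, the clamping of probabilities to 1.0, and the rounding back to a doc count are all assumptions made here for the example.

```java
// Hypothetical sketch; none of these names come from Lucene or from LUCENE-6894.patch.
final class CostEstimateSketch {

    // Match probability of one input: its cost() divided by the number of docs
    // in the segment, clamped to 1.0 (the clamp is an assumption of this sketch).
    static double matchProbability(long cost, int maxDoc) {
        if (maxDoc <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) cost / maxDoc);
    }

    // Conjunction: product of the input probabilities, scaled back to a doc count.
    static long conjunctionCost(int maxDoc, long... costs) {
        double p = 1.0;
        for (long c : costs) {
            p *= matchProbability(c, maxDoc);
        }
        return Math.round(p * maxDoc);
    }

    // Disjunction via De Morgan: P(A or B) = 1 - (1 - P(A)) * (1 - P(B)).
    static long disjunctionCost(int maxDoc, long... costs) {
        double pNoMatch = 1.0;
        for (long c : costs) {
            pNoMatch *= 1.0 - matchProbability(c, maxDoc);
        }
        return Math.round((1.0 - pNoMatch) * maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1000; // docs in the segment
        // Two inputs with cost() of 100 and 200.
        System.out.println(conjunctionCost(maxDoc, 100, 200)); // minimum cost would give 100
        System.out.println(disjunctionCost(maxDoc, 100, 200)); // sum of costs would give 300
    }
}
```

With a segment of 1000 docs and input costs of 100 and 200, the sketch gives 20 for the conjunction and 280 for the disjunction, both lower than the minimum (100) and the sum (300) that the issue describes as the current behaviour.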