[ https://issues.apache.org/jira/browse/OAK-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256665#comment-16256665 ]
Thomas Mueller commented on OAK-5449:
-------------------------------------
Actually, costPerEntry shouldn't be used for this at all (it should always be
the same). I consider the current "trick" of reducing costPerEntry depending on
the matching constraints a bug. Instead, the estimated *number of entries* in
the index should be reduced. Currently the difference doesn't matter all that
much, but costPerEntry has nothing to do with how many constraints match (see
the Javadocs of costPerEntry for what it is).
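To illustrate the distinction, here is a minimal sketch of a simplified cost
model in which matching constraints shrink the entry-count estimate while
costPerEntry stays constant. The class name, the method names and the
per-constraint selectivity of 0.5 are assumptions made for the example; this
is not Oak's IndexPlanner code.

```java
final class PlanCostSketch {

    // Hypothetical simplified model: a fixed cost per query execution
    // plus a constant cost for each entry read from the index.
    public static double cost(double costPerExecution, double costPerEntry,
                              long estimatedEntryCount) {
        return costPerExecution + costPerEntry * estimatedEntryCount;
    }

    // Each matching constraint narrows the estimated number of entries.
    // The selectivity value is an illustrative assumption, not a measured one.
    public static long estimateEntries(long indexedEntries, int matchingConstraints,
                                       double selectivity) {
        double estimate = indexedEntries;
        for (int i = 0; i < matchingConstraints; i++) {
            estimate *= selectivity;
        }
        // Never estimate fewer than one entry.
        return Math.max(1, Math.round(estimate));
    }

    public static void main(String[] args) {
        long noMatch = estimateEntries(1000, 0, 0.5);
        long twoMatches = estimateEntries(1000, 2, 0.5);
        // costPerEntry is the same (1.0) in both cases; only the estimate differs.
        System.out.println(cost(1.0, 1.0, noMatch));
        System.out.println(cost(1.0, 1.0, twoMatches));
    }
}
```

With this shape, an index that matches more constraints still ends up cheaper,
but the reason is visible in the estimate rather than hidden in a per-entry
factor.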
An index that returns entries in correctly sorted order should be preferred
over indexes that don't do that. However, this should be dealt with at the
query engine level I think, and indexes shouldn't artificially return a lower
cost. There is also another trade-off:
* index a returns few entries, but not sorted at all
* index b returns many entries, but sorted correctly
If a limit is used, then the query engine internally reduces the cost of index
b (done in OAK-4887).
However, if no limit is set, we currently don't know which index should be
preferred. The query engine currently picks index a. This is also described
in OAK-4887 ("fastfirstrow" / "option (fast <n>)" in MS SQL Server).
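The trade-off between index a and index b can be sketched as follows. This is
an illustrative model only, assuming a cost of the form "cost per execution
plus cost per entry times entries read" and assuming that a limit caps the
entries read only when the index already returns them in the requested order.
The class, the constants and the entry counts are made up for the example and
are not taken from Oak.

```java
final class LimitCostSketch {

    static final double COST_PER_EXECUTION = 1.0; // assumed constant
    static final double COST_PER_ENTRY = 1.0;     // assumed constant

    // estimatedEntries: entries the index expects to return
    // sortedByIndex:    whether the index returns them in the requested order
    // limit:            query limit, or Long.MAX_VALUE if none is set
    public static double cost(long estimatedEntries, boolean sortedByIndex, long limit) {
        long entriesRead = estimatedEntries;
        if (sortedByIndex) {
            // A sorted index can stop after "limit" entries. An unsorted one
            // has to read everything before the result can be ordered.
            entriesRead = Math.min(limit, estimatedEntries);
        }
        return COST_PER_EXECUTION + COST_PER_ENTRY * entriesRead;
    }

    public static void main(String[] args) {
        long limit = 10;
        long noLimit = Long.MAX_VALUE;
        // index a: few entries, unsorted; index b: many entries, sorted
        System.out.println("limit 10: a=" + cost(100, false, limit)
                + " b=" + cost(10_000, true, limit));
        System.out.println("no limit: a=" + cost(100, false, noLimit)
                + " b=" + cost(10_000, true, noLimit));
    }
}
```

Under these assumptions index b wins when a limit is set and index a wins when
it is not, which matches the behaviour described above.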
> Cost calculation for one matching property restriction/sorting results in
> selection of wrong index
> --------------------------------------------------------------------------------------------------
>
> Key: OAK-5449
> URL: https://issues.apache.org/jira/browse/OAK-5449
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Affects Versions: 1.4.10
> Reporter: Volker Schmidt
> Assignee: Chetan Mehrotra
>
> The method IndexPlanner.getPlanBuilder() for Lucene indexes contains at the
> end an algorithm that calculates a costPerEntryFactor. If there is no
> matching restriction property or sort property, the factor is the same as
> for one matching restriction property or sort property.
> If there are two indexes for which the cost is calculated, the cost must not
> be the same. E.g. if there is a large result set that can be sorted with one
> index but not with the other index, the index that supports sorting should be
> used.
> The following code snippet:
>     if (costPerEntryFactor == 0) {
>         costPerEntryFactor = 1;
>     }
> should be changed to something like this (assuming costPerEntryFactor will be
> changed to double value and will be rounded after division at the end of the
> method):
>     if (costPerEntryFactor == 1.0) {
>         // one matching restriction or sort property
>         costPerEntryFactor = 1.5;
>     } else if (costPerEntryFactor == 0.0) {
>         // no matching restriction or sort property
>         costPerEntryFactor = 1.0;
>     }
> Furthermore, since the found indexes are stored in a hashed collection, the
> order of index evaluation, and therefore the index that is selected when
> more than one Lucene-based index has the same cost, is nondeterministic.
> This aggravates the problem with the code above.
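
One way to make the choice reproducible when costs tie is to break ties on a
stable key. The sketch below assumes the index path as that key; the Plan
class and the paths are hypothetical and do not correspond to Oak's API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

final class DeterministicPlanChoice {

    // Hypothetical stand-in for an index plan: a path and an estimated cost.
    static final class Plan {
        final String indexPath;
        final double cost;

        Plan(String indexPath, double cost) {
            this.indexPath = indexPath;
            this.cost = cost;
        }
    }

    // Lowest cost first; equal costs are ordered by index path, so the result
    // does not depend on the iteration order of the collection.
    static final Comparator<Plan> BY_COST_THEN_PATH =
            Comparator.comparingDouble((Plan p) -> p.cost)
                      .thenComparing(p -> p.indexPath);

    public static Plan cheapest(Collection<Plan> plans) {
        return Collections.min(plans, BY_COST_THEN_PATH);
    }

    public static void main(String[] args) {
        Set<Plan> plans = new HashSet<>();
        plans.add(new Plan("/oak:index/lucene-b", 10.0));
        plans.add(new Plan("/oak:index/lucene-a", 10.0));
        System.out.println(cheapest(plans).indexPath);
    }
}
```

This only makes the selection stable. It does not decide which of two equally
priced indexes is actually the better one.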