[ https://issues.apache.org/jira/browse/OAK-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256665#comment-16256665 ]

Thomas Mueller commented on OAK-5449:
-------------------------------------

Actually, costPerEntry shouldn't be used for this at all (it should always be
the same). I consider the current "trick" of reducing costPerEntry depending on
the matching constraints a bug. Instead, the estimated *number of entries* in
the index should be reduced. Currently it doesn't matter all that much, but
costPerEntry is not related to that (see the Javadocs of costPerEntry for what
it is).
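A minimal sketch of the difference, under assumptions: it models a plan cost as costPerExecution + costPerEntry * estimatedEntryCount, and the class EntryCountCost, the constants and the per-constraint selectivities are all hypothetical, not Oak's API. The first method mimics the "trick" of dividing costPerEntry by the number of matching constraints; the second keeps costPerEntry fixed and shrinks the estimated number of entries instead.

```java
// Hypothetical cost model for illustration only; names and values are
// assumptions, not Oak's QueryIndex API.
class EntryCountCost {

    // Assumed constant: the same costPerEntry for every plan of an index.
    static final double COST_PER_ENTRY = 1.0;
    static final double COST_PER_EXECUTION = 1.0;

    // The "trick": divide costPerEntry by the number of matching constraints.
    static double costByDividingCostPerEntry(long indexedEntries, int matchingConstraints) {
        int factor = Math.max(1, matchingConstraints);
        return COST_PER_EXECUTION + (COST_PER_ENTRY / factor) * indexedEntries;
    }

    // Alternative: keep costPerEntry fixed and reduce the estimated entry
    // count, using a hypothetical selectivity (0..1) per matching constraint.
    static double costByReducingEntryCount(long indexedEntries, double[] selectivities) {
        double estimatedEntries = indexedEntries;
        for (double s : selectivities) {
            estimatedEntries *= s; // assumes the constraints are independent
        }
        return COST_PER_EXECUTION + COST_PER_ENTRY * estimatedEntries;
    }

    public static void main(String[] args) {
        long entries = 10_000;
        System.out.println(costByDividingCostPerEntry(entries, 2));
        System.out.println(costByReducingEntryCount(entries, new double[] {0.1, 0.5}));
    }
}
```

In the second variant the reduction is expressed through the entry estimate, so costPerEntry keeps the meaning its Javadocs give it.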

An index that returns entries in correctly sorted order should be preferred
over indexes that don't. However, I think this should be dealt with at the
query engine level, and indexes shouldn't artificially return a lower cost.
There is also another trade-off:

* index a returns few entries, but not sorted at all
* index b returns many entries, but sorted correctly

If a limit is used, then the query engine internally reduces the cost of index 
b (done in OAK-4887).
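The trade-off between index a and index b can be sketched with a toy cost model. Everything in it is an assumption for illustration: the Plan record, the n * log2(n) cost charged for sorting unsorted results in memory, and the way a limit caps the number of rows read from the sorted index. It is not the formula OAK-4887 actually uses.

```java
import java.util.OptionalLong;

// Toy model of the "few unsorted entries vs. many sorted entries" trade-off.
// The cost formulas are illustrative assumptions, not Oak's query engine.
class SortedIndexTradeOff {

    record Plan(String name, long estimatedEntries, boolean sortedAsRequested) {}

    static double cost(Plan p, OptionalLong limit) {
        double n = p.estimatedEntries();
        if (p.sortedAsRequested()) {
            // Rows already arrive in order, so reading can stop after `limit` rows.
            return limit.isPresent() ? Math.min(n, limit.getAsLong()) : n;
        }
        // Unsorted: every entry is read and sorted before the first row is
        // returned (sort cost modeled as n * log2(n), an assumption).
        double sortCost = n > 1 ? n * (Math.log(n) / Math.log(2)) : 0;
        return n + sortCost;
    }

    // Picks the cheaper plan; on a tie the first argument wins.
    static Plan choose(Plan a, Plan b, OptionalLong limit) {
        return cost(a, limit) <= cost(b, limit) ? a : b;
    }

    public static void main(String[] args) {
        Plan a = new Plan("a", 1_000, false);  // few entries, not sorted
        Plan b = new Plan("b", 100_000, true); // many entries, sorted correctly
        System.out.println(choose(a, b, OptionalLong.empty()).name());
        System.out.println(choose(a, b, OptionalLong.of(10)).name());
    }
}
```

With these made-up numbers, index a is cheaper when no limit is given (about 11,000 versus 100,000), while a limit of 10 makes index b cheaper.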

However, if no limit is set, we don't know which index should be preferred;
right now the query engine picks index a. This is also described in OAK-4887
("fastfirstrow" / "option (fast <n>)" in MS SQL Server).

> Cost calculation for one matching property restriction/sorting results in 
> selection of wrong index
> --------------------------------------------------------------------------------------------------
>
>                 Key: OAK-5449
>                 URL: https://issues.apache.org/jira/browse/OAK-5449
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>    Affects Versions: 1.4.10
>            Reporter: Volker Schmidt
>            Assignee: Chetan Mehrotra
>
> The method IndexPlanner.getPlanBuilder() for Lucene indexes contains, at the
> end, an algorithm that calculates a costPerEntryFactor. If there is no
> restriction property or sort property, the factor will be the same as for
> one restriction property or sort property.
> If the cost is calculated for two indexes, the costs must not be the same.
> E.g. if there is a large result set that can be sorted with one index but not
> with the other, the index that supports sorting should be used.
> The following code snippet:
> if (costPerEntryFactor == 0) {
>   costPerEntryFactor = 1;
> }
> should be changed to something like this (assuming costPerEntryFactor is
> changed to a double value and is rounded after the division at the end of
> the method):
> if (costPerEntryFactor == 1.0) {
>   // one matching restriction or sort property
>   costPerEntryFactor = 1.5;
> } else if (costPerEntryFactor == 0.0) {
>   // no matching restriction or sort property
>   costPerEntryFactor = 1.0;
> }
> Furthermore, since the found indexes are stored in a hashed collection, the
> order in which they are evaluated, and therefore the selected index (when the
> cost is the same for more than one Lucene-based index), is nondeterministic.
> This makes the problem with the code above worse.
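The proposal quoted above, together with a deterministic tie-break, could be sketched as follows. Assumptions: the factor is the number of matching restriction and sort properties, costPerEntry is baseCostPerEntry divided by that factor, and ties between equally priced indexes are broken by index name; CostPerEntryFactorSketch, Candidate and the index names are hypothetical, not Oak code. The comment at the top argues against adjusting costPerEntry at all, so this only shows what the quoted proposal would compute.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposal in the issue description; not the
// actual IndexPlanner code.
class CostPerEntryFactorSketch {

    record Candidate(String indexName, double cost) {}

    // Behavior described in the issue: 0 is bumped to 1, so zero and one
    // matching properties end up with the same factor.
    static long currentCostPerEntry(long baseCostPerEntry, int matchingProperties) {
        int factor = matchingProperties;
        if (factor == 0) {
            factor = 1;
        }
        return baseCostPerEntry / factor; // integer division
    }

    // Proposed behavior: a double factor, rounded after the division.
    static long proposedCostPerEntry(long baseCostPerEntry, int matchingProperties) {
        double factor = matchingProperties;
        if (factor == 1.0) {
            factor = 1.5; // one matching restriction or sort property
        } else if (factor == 0.0) {
            factor = 1.0; // no matching restriction or sort property
        }
        return Math.round(baseCostPerEntry / factor);
    }

    // Deterministic choice: lowest cost first, ties broken by index name, so
    // the result does not depend on hash-based iteration order.
    static Candidate cheapest(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(Candidate::cost)
                        .thenComparing(Candidate::indexName))
                .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(currentCostPerEntry(10, 0) + " " + currentCostPerEntry(10, 1));
        System.out.println(proposedCostPerEntry(10, 0) + " " + proposedCostPerEntry(10, 1));
        System.out.println(cheapest(List.of(
                new Candidate("lucene-b", 7.0),
                new Candidate("lucene-a", 7.0))).indexName());
    }
}
```

With a base cost of 10, the current rule yields 10 for both zero and one matching property, while the proposed rule yields 10 and 7; ordering by (cost, name) returns "lucene-a" regardless of the input order.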



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
