Thomas Mueller created OAK-4323:
-----------------------------------

             Summary: Query engine: index cost formula incorrect when using 
"limit"
                 Key: OAK-4323
                 URL: https://issues.apache.org/jira/browse/OAK-4323
             Project: Jackrabbit Oak
          Issue Type: Improvement
            Reporter: Thomas Mueller


As described in OAK-2081, the cost formula currently used in the query engine 
is not correct if "limit" is used, because it doesn't account for false 
positives. 

Example: Let's say there are two indexes:

* color: 10000 nodes with color=red, but a bit slower (lets say a remote 
index), cost per entry is 1.5.
* size: 20000 nodes with size=M, but a bit faster (lets say a local index), 
cost per entry is 1.

Without limit, the index for "color" should be used as 10000 * 1.5 = 15000 is 
lower than 20000 * 1 = 20000.

With limit=100, then we could calculate as follows: there are at most 10000 
entries (according to index "color"), so the false positive rate of the "size" 
index is at least 50%. So cost of "color" is 100 * 1.5 = 150. Cost of "size" is 
100 * 1 = 100, but with false positive rate of 50%, so cost is actually 200. 
Therefor, still the index "color" should be used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to