Thomas Mueller created OAK-4323:
-----------------------------------
Summary: Query engine: index cost formula incorrect when using "limit"
Key: OAK-4323
URL: https://issues.apache.org/jira/browse/OAK-4323
Project: Jackrabbit Oak
Issue Type: Improvement
Reporter: Thomas Mueller
As described in OAK-2081, the cost formula currently used in the query engine
is not correct if "limit" is used, because it doesn't account for false
positives.
Example: Let's say there are two indexes:
* color: 10000 nodes with color=red, but a bit slower (let's say a remote
index), cost per entry is 1.5.
* size: 20000 nodes with size=M, but a bit faster (let's say a local index),
cost per entry is 1.
Without limit, the index for "color" should be used as 10000 * 1.5 = 15000 is
lower than 20000 * 1 = 20000.
With limit=100, we could calculate as follows: at most 10000 entries can match
the query (according to index "color"), so the false positive rate of the
"size" index is at least 50%. The cost of "color" is 100 * 1.5 = 150. The cost
of "size" would be 100 * 1 = 100, but with a false positive rate of 50%, about
200 entries have to be read to get 100 results, so the cost is actually 200.
Therefore, the index "color" should still be used.
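
The arithmetic above can be written down as a small sketch. This is only an illustration of the reasoning in this issue, not the formula Oak actually implements. The function names, the parameter names, and the idea of taking the smallest index estimate as an upper bound on the number of real matches are assumptions made for the example.

```python
def cost_without_limit(entries, cost_per_entry):
    """Cost of reading every entry the index returns."""
    return entries * cost_per_entry


def cost_with_limit(entries, cost_per_entry, limit, max_matches):
    """Cost of reading enough entries to collect `limit` real results.

    `max_matches` is an upper bound on the number of nodes that can match
    the whole query (assumed here: the smallest estimate of any index).
    """
    if entries == 0:
        return 0.0
    # Fraction of index entries that can be real results (1 - false positive rate).
    hit_rate = min(1.0, max_matches / entries)
    if hit_rate == 0:
        # Nothing can match, so the whole index would be scanned.
        return entries * cost_per_entry
    # Entries to read for `limit` real results, never more than the index holds.
    entries_to_read = min(entries, limit / hit_rate)
    return entries_to_read * cost_per_entry


# Hypothetical index descriptions, using the numbers from the example above.
indexes = {
    "color": {"entries": 10000, "cost_per_entry": 1.5},
    "size": {"entries": 20000, "cost_per_entry": 1.0},
}
max_matches = min(i["entries"] for i in indexes.values())

unlimited = {
    name: cost_without_limit(i["entries"], i["cost_per_entry"])
    for name, i in indexes.items()
}
limited = {
    name: cost_with_limit(i["entries"], i["cost_per_entry"], 100, max_matches)
    for name, i in indexes.items()
}
best_unlimited = min(unlimited, key=unlimited.get)
best_limited = min(limited, key=limited.get)
```

With these numbers, "color" has the lowest cost both without a limit (15000 versus 20000) and with limit=100 (150 versus 200). Ignoring the false positive rate would instead give "size" a cost of 100 and make it look cheaper.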