[
https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044666#comment-14044666
]
Davide Giannella commented on OAK-1907:
---------------------------------------
Based on my comment above I amended slightly the provided POC by
passing a new Random instance at every insert/delete and got the same
result as before so it seems there's not really any difference and my
concerns above are not valid.
Execution with class-scope Random: real: 3330840 approx: 6144000.
Execution with new instance every run: real: 3329182 approx: 6144000.
Amended version for the records:
{noformat}
$ diff ApproxCount.java ApproxCount_001.java
19,21c19,20
< // Random random = new Random(1);
< Random rnd2 = new Random(1);
<
---
> Random random = new Random(1);
>
39c38
< if (count == 0 || rnd2.nextInt(3) > 0) {
---
> if (count == 0 || random.nextInt(3) > 0) {
92c91
< if (new Random().nextInt(x) == 0) {
---
> if (random.nextInt(x) == 0) {
104c103
< if (new Random().nextInt(x) == 0) {
---
> if (random.nextInt(x) == 0) {
{noformat}
> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
> Key: OAK-1907
> URL: https://issues.apache.org/jira/browse/OAK-1907
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.0, 1.0.1, 1.0.2
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 1.1
>
> Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index
> don't take the number of nodes into account, if there are more than about 100
> nodes. This is problematic because in many cases, the wrong index is used
> (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes
> below a given path is needed.
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden,
> randomly named property (for example called ":count-xyz" where xyz is a uuid,
> value 100'000) to the parents of that node, so that we know there are
> probably more than 100'000 nodes below a given path. When removing a node,
> with the same algorithm add a hidden property (":count-xyz", value -100'000).
> That should result in a slowdown of less than 0.01%, but should allow us much
> better cost estimates. Those properties could be consolidated asynchronously
> if needed.
--
This message was sent by Atlassian JIRA
(v6.2#6252)