[ 
https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044666#comment-14044666
 ] 

Davide Giannella commented on OAK-1907:
---------------------------------------

Based on my comment above I amended slightly the provided POC by
passing a new Random instance at every insert/delete and got the same
result as before so it seems there's not really any difference and my
concerns above are not valid.

Execution with class-scope Random: real: 3330840 approx: 6144000.

Execution with new instance every run: real: 3329182 approx: 6144000.

Amended version for the records:

{noformat}
$ diff ApproxCount.java ApproxCount_001.java 
19,21c19,20
< //    Random random = new Random(1);
<     Random rnd2 = new Random(1);
<     
---
>     Random random = new Random(1);
> 
39c38
<             if (count == 0 || rnd2.nextInt(3) > 0) {
---
>             if (count == 0 || random.nextInt(3) > 0) {
92c91
<         if (new Random().nextInt(x) == 0) {
---
>         if (random.nextInt(x) == 0) {
104c103
<         if (new Random().nextInt(x) == 0) {
---
>         if (random.nextInt(x) == 0) {
{noformat}


> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.1
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index 
> don't take the number of nodes into account, if there are more than about 100 
> nodes. This is problematic because in many cases, the wrong index is used 
> (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes 
> below a given path is needed. 
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, 
> randomly named property (for example called ":count-xyz" where xyz is a uuid, 
> value 100'000) to the parents of that node, so that we know there are 
> probably more than 100'000 nodes below a given path. When removing a node, 
> with the same algorithm add a hidden property (":count-xyz", value -100'000). 
> That should result in a slowdown of less than 0.01%, but should allow us much 
> better cost estimates. Those properties could be consolidated asynchronously 
> if needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to