[ https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231580#comment-14231580 ]

Thomas Mueller commented on OAK-1907:
-------------------------------------

For property indexes, the approximate count estimation can be disabled by 
setting the entryCount property. If it is negative, the old mechanism is used 
(simply counting the nodes). If it is 0 or higher, the entry count is used as 
before. So, to use the approximation, the entryCount property needs to be 
removed. Existing index data will not have the correct estimation; to update 
the statistics, the index could be rebuilt. Otherwise, either the old 
mechanism is used (until around 100 entries have been added), or a relatively 
low count is used (100, for example). This is no different from the behavior 
without the approximation, so I don't expect it to cause problems.
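
The entryCount rules above can be summarized as a small decision function. 
This is a minimal sketch, assuming the setting is read as an optional long 
(empty when the property is absent); EntryCountPolicy, Mode and choose() are 
hypothetical names, not part of Oak's API.

```java
import java.util.OptionalLong;

// Hypothetical sketch: how a property index could choose its cost-estimation
// mode from the optional "entryCount" setting. EntryCountPolicy, Mode and
// choose() are illustrative names, not Oak's actual API.
class EntryCountPolicy {

    enum Mode { COUNT_NODES, FIXED_ENTRY_COUNT, APPROXIMATION }

    // entryCount is empty when the property is absent from the index definition.
    static Mode choose(OptionalLong entryCount) {
        if (!entryCount.isPresent()) {
            // Property removed: use the approximate count estimation.
            return Mode.APPROXIMATION;
        }
        if (entryCount.getAsLong() < 0) {
            // Negative: old mechanism, simply count the nodes.
            return Mode.COUNT_NODES;
        }
        // 0 or higher: use the configured entry count as before.
        return Mode.FIXED_ENTRY_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(choose(OptionalLong.empty()));  // APPROXIMATION
        System.out.println(choose(OptionalLong.of(-1)));   // COUNT_NODES
        System.out.println(choose(OptionalLong.of(5000))); // FIXED_ENTRY_COUNT
    }
}
```

Modelling "absent" as an empty OptionalLong keeps the three cases explicit, 
since removing the property is what enables the approximation.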

For the traversal index, the estimation resolution is relatively low (1000 / 
10000). The estimated count is at least 1000, so in most cases a property index 
will be used if one is available (except for property indexes that have many 
entries). When only the direct child nodes are traversed, the estimated count 
divided by 10 is used.
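
A rough sketch of the traversal estimate described above. It assumes that the 
minimum of 1000 is applied before the division by 10, and simplifies plan cost 
to "the estimated count"; TraversalCostSketch, estimate() and cheaper() are 
hypothetical names, and this is not the actual Oak implementation.

```java
// Hypothetical sketch of the traversal cost rule. The names and the order of
// operations (floor at 1000, then divide by 10 for direct-children-only
// traversal) are assumptions, not Oak's actual code.
class TraversalCostSketch {

    // The estimated count for traversal is at least 1000.
    static final long MIN_ESTIMATE = 1000;

    // approxNodes: rough number of nodes below the path (for example from an
    // approximate counter); directChildrenOnly: the query only traverses the
    // direct child nodes of the path.
    static long estimate(long approxNodes, boolean directChildrenOnly) {
        long count = Math.max(MIN_ESTIMATE, approxNodes);
        return directChildrenOnly ? count / 10 : count;
    }

    // Simplification: treat cost as the estimated count and pick the lower one.
    static String cheaper(long propertyIndexCost, long traversalCost) {
        return propertyIndexCost <= traversalCost ? "property index" : "traversal";
    }

    public static void main(String[] args) {
        System.out.println(estimate(0, false));                // 1000
        System.out.println(estimate(250_000, false));          // 250000
        System.out.println(estimate(250_000, true));           // 25000
        System.out.println(cheaper(300, estimate(0, false)));  // property index
    }
}
```

Under these assumptions, a property index whose estimate is below 1000 is 
always preferred over a full-subtree traversal, which matches the remark that 
a property index is used in most cases.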

> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.2
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index 
> don't take the number of nodes into account if there are more than about 100 
> nodes. This is problematic because in many cases the wrong index is used 
> (because of an incorrect cost estimate).
> To get a better estimate, a very rough estimate of the number of child nodes 
> below a given path is needed. 
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, 
> randomly named property (for example ":count-xyz", where xyz is a UUID, with 
> value 100'000) to the parents of that node, so that we know there are 
> probably more than 100'000 nodes below a given path. When removing a node, 
> use the same algorithm to add a hidden property (":count-xyz", value 
> -100'000). That should result in a slowdown of less than 0.01%, but should 
> allow much better cost estimates. Those properties could be consolidated 
> asynchronously if needed.
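
The sampling idea in the description can be illustrated with an in-memory 
sketch. It assumes that "parents" means every ancestor of the changed node, 
and it uses a plain map instead of real hidden repository properties. 
ApproxChildCount and its methods are hypothetical names; this is neither the 
attached ApproxCount.java nor Oak code. The Random instance is injected so a 
deterministic one can be used for testing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

// Hypothetical sketch of probabilistic subtree counting: with probability
// 1/100'000 per change, record +/-100'000 on all ancestors of the changed node.
class ApproxChildCount {
    static final double PROBABILITY = 0.00001; // 1 in 100'000
    static final long WEIGHT = 100_000;        // value of each ":count-<uuid>"

    // path -> hidden properties (":count-<uuid>" -> +WEIGHT or -WEIGHT)
    final Map<String, Map<String, Long>> hidden = new HashMap<>();
    final Random random;

    ApproxChildCount(Random random) { this.random = random; }

    void nodeAdded(String path) { maybeRecord(path, WEIGHT); }

    void nodeRemoved(String path) { maybeRecord(path, -WEIGHT); }

    private void maybeRecord(String path, long delta) {
        if (random.nextDouble() >= PROBABILITY) {
            return; // common case: no extra write at all
        }
        String name = ":count-" + UUID.randomUUID();
        // Assumption: the property is added to every ancestor of the node.
        for (String p = parent(path); p != null; p = parent(p)) {
            hidden.computeIfAbsent(p, k -> new HashMap<>()).put(name, delta);
        }
    }

    // Rough number of nodes below path: the sum of its ":count-*" values.
    long estimate(String path) {
        long sum = 0;
        for (long v : hidden.getOrDefault(path, Map.of()).values()) {
            sum += v;
        }
        return sum;
    }

    // Parent of an absolute path, or null for the root "/".
    static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    public static void main(String[] args) {
        ApproxChildCount counter = new ApproxChildCount(new Random(42));
        // Five million adds below /content; on average about 50 of them
        // create a hidden ":count-*" property.
        for (int i = 0; i < 5_000_000; i++) {
            counter.nodeAdded("/content/n" + i);
        }
        System.out.println("estimate below /content: " + counter.estimate("/content"));
    }
}
```

Each recorded property stands for about 100'000 changes, so the sum is an 
unbiased but very coarse estimate: small subtrees usually read as 0, while 
large ones are distinguished by order of magnitude. Asynchronous 
consolidation of the properties is not shown.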



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
