[ 
https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229803#comment-14229803
 ] 

Thomas Mueller commented on OAK-1907:
-------------------------------------

There is now a mechanism to count nodes, in the form of an index editor. The 
index is of type "count". There is a new index (name "counter") of that type. 
The resolution (granularity) is 1000, and if there are many nodes, 10'000. That 
means for every 1000 (or 10'000) added or removed nodes on average, the node 
count (hidden property ":count") is incremented or decremented by the 
resolution. Therefore, the overhead should not be measurable. The resolution is 
configurable using the "resolution" property. The data can be rebuilt if needed 
(for example with a lower resolution), using the "reindex" feature.

Property index editors also use the approximate counter mechanism, to maintain 
the estimated count per index, and per index key (this only for non-unique 
indexes). The resolution is 100 / 1000. 

There is a JMX bean "NodeCounter" that allows to read the hidden data. The 
String parameter is the path (which node contains how many descendent nodes), 
and the int parameter is the depth (to get the node counts of a tree of nodes).

The approximate count is not yet used by the index cost mechanisms; this is the 
next step.

> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.2
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index 
> don't take the number of nodes into account, if there are more than about 100 
> nodes. This is problematic because in many cases, the wrong index is used 
> (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes 
> below a given path is needed. 
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, 
> randomly named property (for example called ":count-xyz" where xyz is a uuid, 
> value 100'000) to the parents of that node, so that we know there are 
> probably more than 100'000 nodes below a given path. When removing a node, 
> with the same algorithm add a hidden property (":count-xyz", value -100'000). 
> That should result in a slowdown of less than 0.01%, but should allow us much 
> better cost estimates. Those properties could be consolidated asynchronously 
> if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to