[ 
https://issues.apache.org/jira/browse/OAK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-1907:
----------------------------------

    Attachment: OAK-1907.diff

Attaching a patch file with a proposed adaptation of the algorithm
(OAK-1907.diff).

It implements a utility with static methods to be invoked on
NodeBuilder/NodeState. The unit test included should show how to use
it and what to expect.

I have some doubts though. It's based on pseudo-random approach where
we expect a "marker" every x calls of the method and a Random has to
be provided. However to have a proper distribution the user should
keep a live random instance for each node that is tracking otherwise
if a new Random is instantiated at every call I doubt we could have a
proper distribution.

Another approach could be to have a "business" oriented random having
for example each index implementation to keep a live instance of
Random but that will generate only meaningful number for the index
root and won't be valuable for more in-depth analysis like range
queries where we're not returning the whole index but only a subset of
it.

However if the class is fine I will commit to trunk for being used.



> Better cost estimates for traversal, property, and ordered indexes
> ------------------------------------------------------------------
>
>                 Key: OAK-1907
>                 URL: https://issues.apache.org/jira/browse/OAK-1907
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.0, 1.0.1, 1.0.2
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.1
>
>         Attachments: ApproxCount.java, OAK-1907.diff
>
>
> Currently, cost estimates of traversal, property index, and ordered index 
> don't take the number of nodes into account, if there are more than about 100 
> nodes. This is problematic because in many cases, the wrong index is used 
> (because of incorrect cost estimate).
> To get a better estimate, a very rough estimate on the number of child nodes 
> below a given path is needed. 
> One idea is: when adding a node, if Math.random() < 0.00001, add a hidden, 
> randomly named property (for example called ":count-xyz" where xyz is a uuid, 
> value 100'000) to the parents of that node, so that we know there are 
> probably more than 100'000 nodes below a given path. When removing a node, 
> with the same algorithm add a hidden property (":count-xyz", value -100'000). 
> That should result in a slowdown of less than 0.01%, but should allow us much 
> better cost estimates. Those properties could be consolidated asynchronously 
> if needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to