[ 
https://issues.apache.org/jira/browse/OAK-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577546#comment-13577546
 ] 

Marcel Reutegger commented on OAK-591:
--------------------------------------

I might be missing something, but to me it looks like the id lookup is a dead 
end or requires major changes to the MicroKernel API. Let me explain and recap 
where this issue originates from...

The primary addressing scheme into the multi-version tree is by revision (of 
the root node) and the path of the node we want to read. Later a secondary 
scheme was introduced with :hash and :id values instead of the revision+path 
combo. However, it was only introduced in the getNodes() method and in none of 
the other methods that take a path parameter.

The cache in oak-core uses the primary addressing scheme for the keys and 
therefore faces immediate expiration of the complete cache when there is a 
change. See also the initial description of this issue. The idea then was to 
leverage the :id or :hash properties, but as it turns out this doesn't play 
well with the current MicroKernel API.
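
To illustrate the expiration problem, here is a minimal sketch of a cache keyed 
by revision+path. The class and method names are hypothetical and the string 
value only stands in for a node read from the MicroKernel; this is not the 
oak-core implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a revision+path keyed cache, not oak-core code.
class RevisionPathCache {
    private final Map<String, String> cache = new HashMap<>();
    int misses = 0;

    // The key combines root revision and path, so a new root revision
    // changes the key of every node, including unchanged ones.
    String read(String revision, String path) {
        String key = revision + path;
        String node = cache.get(key);
        if (node == null) {
            misses++;
            node = "node at " + path; // stands in for a MicroKernel read
            cache.put(key, node);
        }
        return node;
    }
}
```

Reading the same path twice at one revision misses only once, but reading it 
again at the next revision misses again, even if the node itself is unchanged.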

As mentioned, we will have to tighten the contract of getNodes() to always 
return :id or :hash properties for a subtree once an implementation provides 
either of these properties for a node. However, this is not sufficient, because 
we also have {{KernelNodeState.getChildNode(String name)}}. Given a node state 
that was retrieved with a :hash or :id, if the child node with the requested 
name is not yet in the cache (or does not exist), the node will have to 
construct an identifier like {{[:hash|:id]name}}. This is basically what 
[Jukka|#comment-13576645] proposed earlier.
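
A rough sketch of what constructing such an identifier could look like. The 
helper name and the "/" separator are assumptions for illustration only; the 
actual format would be up to the MicroKernel contract.

```java
// Hypothetical helper, not part of oak-core or the MicroKernel API.
final class ChildIdentifiers {
    private ChildIdentifiers() {
    }

    // Builds an identifier for a child from the parent's :hash or :id value
    // and the child name. The "/" separator is an assumption.
    static String childIdentifier(String parentHashOrId, String childName) {
        return parentHashOrId + "/" + childName;
    }
}
```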

Alternatively oak-core could maintain an additional lookup into the cache based 
on :hash or :id. To leverage this lookup, oak-core needs to evaluate a possible 
:hash or :id property *after* it has read a node state and try to find out if 
there already exists a node state in the cache with the same :hash or :id. Instead 
of using the node state just read from the MicroKernel, it can then return the 
one from the cache with the same :hash or :id.
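
A minimal sketch of such an additional lookup, assuming a simplified node type 
with a single field for the :hash or :id value. All names here are hypothetical 
and the real KernelNodeState looks different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a secondary lookup by :hash or :id, not oak-core code.
class HashOrIdLookup {
    // Minimal stand-in for a node state read from the MicroKernel.
    static final class Node {
        final String hashOrId; // value of :hash or :id, null if not provided
        final String data;

        Node(String hashOrId, String data) {
            this.hashOrId = hashOrId;
            this.data = data;
        }
    }

    private final Map<String, Node> byHashOrId = new HashMap<>();

    // Called after a node was read. Returns the already cached instance
    // with the same :hash or :id if there is one, else the node just read.
    Node resolve(Node justRead) {
        if (justRead.hashOrId == null) {
            return justRead; // nothing to look up
        }
        Node cached = byHashOrId.putIfAbsent(justRead.hashOrId, justRead);
        return cached != null ? cached : justRead;
    }
}
```

Note that the read from the MicroKernel has already happened by the time 
resolve() is called; the lookup only avoids keeping duplicate instances.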

This is quite ugly IMO, because even then we'd read nodes that didn't change. 
Ideally this kind of logic should be located closer to where the data is read. 
This brings us back to the MicroKernel API, where access to nodes is primarily 
defined by the revision of the root node and the path of the node to access. In 
this context the higher level NodeStore API provides a nicer abstraction IMO. 
The exact way the tree is versioned is completely hidden, and how children are 
linked to the parent and vice versa is left open to the implementation. Of 
course the MicroKernel API provides some choices as well, 
but only to a certain degree.
                
> Improve KernelNodeStore cache efficiency
> ----------------------------------------
>
>                 Key: OAK-591
>                 URL: https://issues.apache.org/jira/browse/OAK-591
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.6
>            Reporter: Marcel Reutegger
>         Attachments: mk.log.gz, OAK-591.patch
>
>
> The cache in KernelNodeStore references entries with a path+revision combo. 
> This mapping quickly becomes inefficient when there are writes on the 
> repository. Whenever something is changed, the complete cache basically 
> becomes invalid and oak-core needs to re-fetch nodes again, even though they 
> didn't change. The attached test shows this behaviour. The test initially 
> creates 10 nodes and lets a thread read those nodes repeatedly. To make the 
> test somewhat realistic the reader acquires a new session in every run 
> through the loop. This is to simulate e.g. a request which acquires a new 
> session every time (Apache Sling does it that way). At the same time writes 
> occur but in a separate part of the repository. As can be seen in the logs, 
> the nodes are read from the MicroKernel whenever something changes anywhere 
> in the repository. Obviously this is not limited to the test nodes. The log 
> also shows repeated reads to node type, user and index nodes. None of them 
> change while the test runs.

