[
https://issues.apache.org/jira/browse/OAK-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577546#comment-13577546
]
Marcel Reutegger commented on OAK-591:
--------------------------------------
I might be missing something, but to me it looks like the id lookup is a dead
end or requires major changes to the MicroKernel API. Let me explain and recap
where this issue originates from...
The primary addressing scheme into the multi-version tree is by revision (of
the root node) and the path of the node we want to read. Later a secondary
scheme was introduced with :hash and :id values instead of the revision+path
combo. However, it was only introduced in the getNodes() method and not in any
of the other methods that take a path parameter.
The cache in oak-core uses the primary addressing scheme for the keys and
therefore faces immediate expiration of the complete cache whenever there is a
change. See also the initial description of this issue. The idea then was to
leverage the :id or :hash properties, but as it turns out this doesn't play
well with the current MicroKernel API.
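To make the expiration problem concrete, here is a minimal sketch of a cache keyed by the revision+path combo. All class and method names are hypothetical and not taken from oak-core, and the backend read is a stand-in for a MicroKernel call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration; none of these names come from oak-core.
final class RevisionPathCacheSketch {

    // Cache key in the primary addressing scheme: root revision + node path.
    static final class Key {
        final String revision;
        final String path;

        Key(String revision, String path) {
            this.revision = revision;
            this.path = path;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return revision.equals(other.revision) && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(revision, path);
        }
    }

    private final Map<Key, String> cache = new HashMap<>();
    private int backendReads = 0;

    // Returns the cached node, or "reads" it from the backend on a miss.
    String read(String revision, String path) {
        return cache.computeIfAbsent(new Key(revision, path), k -> {
            backendReads++;
            return "node at " + k.path; // stand-in for a MicroKernel read
        });
    }

    int backendReads() {
        return backendReads;
    }
}
```

Reading the same path twice at revision "r1" costs one backend read. A write anywhere in the repository produces a new root revision, say "r2", and the next read of the same unchanged path misses the cache because its key differs.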
As mentioned, we will have to tighten the contract of getNodes() to always
return :id or :hash properties for a sub tree once an implementation provides
either of these properties for a node. However, this is not sufficient, because
we also have {{KernelNodeState.getChildNode(String name)}}. Given a node state
that was retrieved with a :hash or :id, when the child node with the requested
name is not yet in the cache (or does not exist), the node state will have to
construct an identifier like {{[:hash|:id]name}}. This is basically what
[Jukka|#comment-13576645] proposed earlier.
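A rough sketch of how such a child identifier could be built. The helper name, its parameters and the "/" separator are assumptions for illustration only; the actual format would be up to the MicroKernel contract.

```java
// Hypothetical helper; the "/" separator and all names are assumptions.
final class ChildAddressSketch {

    // Builds the address used to look up or fetch a child node.
    // parentIdOrHash is the parent's :id or :hash value, or null if the
    // implementation provided neither.
    static String childAddress(String parentIdOrHash, String parentPath, String name) {
        if (parentIdOrHash != null) {
            // Secondary scheme: independent of the root revision.
            return parentIdOrHash + "/" + name;
        }
        // Primary scheme: plain path below the parent. The caller still has
        // to pair it with a root revision.
        return "/".equals(parentPath) ? "/" + name : parentPath + "/" + name;
    }
}
```

With an identifier of the first kind, a cache entry for the child would stay valid across revisions as long as the parent's :id or :hash does not change.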
Alternatively, oak-core could maintain an additional lookup into the cache
based on :hash or :id. To leverage this lookup, oak-core needs to evaluate a
possible :hash or :id property *after* it has read a node state and check
whether the cache already contains a node state with the same :hash or :id.
Instead of using the node state just read from the MicroKernel, it can then
return the one from the cache with the same :hash or :id.
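The additional lookup could look roughly like the following sketch. The class, its NodeState stand-in and the canonicalize() method are hypothetical names, not oak-core code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a secondary lookup keyed by :id or :hash.
final class IdentityLookupSketch {

    // Minimal stand-in for a node state read from the MicroKernel.
    static final class NodeState {
        final String idOrHash; // value of :id or :hash, null if not provided
        final String content;

        NodeState(String idOrHash, String content) {
            this.idOrHash = idOrHash;
            this.content = content;
        }
    }

    private final Map<String, NodeState> byIdOrHash = new HashMap<>();

    // Called *after* a node state has been read. Returns the instance that
    // is already known for the same :id or :hash, if there is one.
    NodeState canonicalize(NodeState justRead) {
        if (justRead.idOrHash == null) {
            return justRead; // nothing to match on
        }
        NodeState existing = byIdOrHash.putIfAbsent(justRead.idOrHash, justRead);
        return existing != null ? existing : justRead;
    }
}
```

Note that the read from the MicroKernel has already happened by the time canonicalize() is called, so this only saves memory and downstream work, not the read itself.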
This is quite ugly IMO, because even then we'd read nodes that didn't change.
Ideally this kind of logic should be located closer to where the data is read.
This brings us back to the MicroKernel API, where access to nodes is primarily
defined by the revision of the root node and the path of the node to access. In
this context the higher level NodeStore API provides a nicer abstraction IMO.
The exact way the tree is versioned is completely hidden, and how children are
linked to the parent and vice versa is left open and up to the implementation.
Of course the MicroKernel API provides some choices as well, but only to a
certain degree.
> Improve KernelNodeStore cache efficiency
> ----------------------------------------
>
> Key: OAK-591
> URL: https://issues.apache.org/jira/browse/OAK-591
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.6
> Reporter: Marcel Reutegger
> Attachments: mk.log.gz, OAK-591.patch
>
>
> The cache in KernelNodeStore references entries with a path+revision combo.
> This mapping quickly becomes inefficient when there are writes on the
> repository. Whenever something is changed, the complete cache basically
> becomes invalid and oak-core needs to re-fetch nodes again, even though they
> didn't change. The attached test shows this behaviour. The test initially
> creates 10 nodes and lets a thread read those nodes repeatedly. To make the
> test somewhat realistic the reader acquires a new session in every run
> through the loop. This is to simulate e.g. a request which acquires a new
> session every time (Apache Sling does it that way). At the same time writes
> occur but in a separate part of the repository. As can be seen in the logs,
> the nodes are read from the MicroKernel whenever something changes anywhere
> in the repository. Obviously this is not limited to the test nodes. The log
> also shows repeated reads to node type, user and index nodes. None of them
> change while the test runs.