[
https://issues.apache.org/jira/browse/OAK-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329334#comment-15329334
]
Chetan Mehrotra edited comment on OAK-1312 at 6/15/16 5:30 AM:
---------------------------------------------------------------
Had a discussion on this with [~mreutegg] today. Some food for thought:
* Make use of NodeType and mixin (as suggested above) to determine which nodes
can be collapsed
* Make use of the approach we took in oak-lucene, where support for relative
properties was implemented by adding the relative property path as a field name
in the parent Lucene Document. So if for app:Asset we need to index
jcr:content/metadata/format, then the Lucene document created for the app:Asset
node gets a field named jcr:content/metadata/format.
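A minimal sketch of that relative-path naming, assuming a toy in-memory node model. SimpleNode and RelativePathFlattener are hypothetical names, not Oak classes. Each property of the subtree is keyed by its path relative to the subtree root, which is also how a bundled document could name the properties it collapses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: flattens a small node tree into a single map whose keys
 * are property paths relative to the root, e.g. "jcr:content/metadata/format".
 * SimpleNode is a toy stand-in, not an Oak class.
 */
public class RelativePathFlattener {

    /** Minimal in-memory node: its own properties plus named child nodes. */
    public static final class SimpleNode {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final Map<String, SimpleNode> children = new LinkedHashMap<>();

        public SimpleNode prop(String name, Object value) {
            properties.put(name, value);
            return this;
        }

        /** Returns the named child, creating it if it does not exist yet. */
        public SimpleNode child(String name) {
            return children.computeIfAbsent(name, n -> new SimpleNode());
        }
    }

    /** Collects every property of the subtree under its relative path. */
    public static Map<String, Object> flatten(SimpleNode root) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(SimpleNode node, String prefix, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.properties.entrySet()) {
            out.put(prefix + e.getKey(), e.getValue());
        }
        for (Map.Entry<String, SimpleNode> c : node.children.entrySet()) {
            collect(c.getValue(), prefix + c.getKey() + "/", out);
        }
    }

    public static void main(String[] args) {
        // "image/jpeg" is an illustrative value only
        SimpleNode asset = new SimpleNode().prop("jcr:primaryType", "app:Asset");
        asset.child("jcr:content").child("metadata").prop("format", "image/jpeg");
        // prints {jcr:primaryType=app:Asset, jcr:content/metadata/format=image/jpeg}
        System.out.println(flatten(asset));
    }
}
```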
So let's take nt:file as an example:
{noformat}
/content/image1/original (nt:file)
+ jcr:content
- jcr:data = ...
{noformat}
# *Write flow* - UpdateOp would create property names based on the relative path
of each property being collapsed. For the above case those would be
jcr:content/jcr:data, jcr:content/jcr:primaryType and jcr:primaryType. So when a
new node is detected that is eligible for bundled storage by its mixin, the
diff would read all child nodes and bundle their properties into the host node.
This would result in a single NodeDocument with id /content/image1/original
holding the relative properties listed above
# *Read Flow* - In all cases reads would happen via traversal, i.e. a read at an
arbitrary path should not happen. So when the original DocumentNodeState is read
and code asks for a specific child node or for all children, it can create
further NodeStates based on the embedded relative properties
# *Update Flow* - Say jcr:data is updated: as the commit diff traverses and
attempts to create an UpdateOp, it would have to detect whether the property is
part of a bundled node tree or not
# *Observation Flow* - TBD
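To make the write, read and update flows concrete for the nt:file example, here is a rough sketch that models the bundled NodeDocument as a plain map keyed by relative property path. BundledDocument and its methods are hypothetical and do not correspond to Oak's NodeDocument or UpdateOp APIs. It also assumes every bundled node carries jcr:primaryType, so the presence of that key marks a child node as bundled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a bundled document: relative property path -> value.
 * Not Oak's NodeDocument/UpdateOp; for illustration only.
 */
public class BundledDocument {

    private final String hostPath;
    private final Map<String, Object> props = new LinkedHashMap<>();

    public BundledDocument(String hostPath) {
        this.hostPath = hostPath;
    }

    /** Write flow: store a property under its path relative to the host node. */
    public void set(String relPath, Object value) {
        props.put(relPath, value);
    }

    /** Read flow: names of the direct children of the node at relNodePath ("" = host). */
    public Set<String> childNames(String relNodePath) {
        String prefix = relNodePath.isEmpty() ? "" : relNodePath + "/";
        Set<String> names = new TreeSet<>();
        for (String key : props.keySet()) {
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());
                int slash = rest.indexOf('/');
                if (slash > 0) {
                    names.add(rest.substring(0, slash));
                }
            }
        }
        return names;
    }

    /** Read flow: the properties that belong directly to the node at relNodePath. */
    public Map<String, Object> properties(String relNodePath) {
        String prefix = relNodePath.isEmpty() ? "" : relNodePath + "/";
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : props.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());
                if (rest.indexOf('/') < 0) {
                    result.put(rest, e.getValue());
                }
            }
        }
        return result;
    }

    /**
     * Update flow: map the absolute path of a changed property to its key in this
     * document, or return null if the property is not part of the bundled tree.
     * Assumes every bundled node has a jcr:primaryType entry in the document.
     */
    public String bundledKeyFor(String absPropertyPath) {
        if (!absPropertyPath.startsWith(hostPath + "/")) {
            return null;
        }
        String rel = absPropertyPath.substring(hostPath.length() + 1);
        int slash = rel.lastIndexOf('/');
        if (slash < 0) {
            return rel; // property of the host node itself
        }
        String parent = rel.substring(0, slash);
        return props.containsKey(parent + "/jcr:primaryType") ? rel : null;
    }

    public static void main(String[] args) {
        BundledDocument doc = new BundledDocument("/content/image1/original");
        doc.set("jcr:primaryType", "nt:file");
        doc.set("jcr:content/jcr:primaryType", "nt:resource"); // example value
        doc.set("jcr:content/jcr:data", "...");
        System.out.println(doc.childNames(""));                       // [jcr:content]
        System.out.println(doc.properties("jcr:content").keySet());   // [jcr:primaryType, jcr:data]
        System.out.println(doc.bundledKeyFor("/content/image1/original/jcr:content/jcr:data"));
    }
}
```

The last call prints jcr:content/jcr:data, i.e. the key the UpdateOp would have to modify in the host document instead of addressing a separate document for jcr:content.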
*Bundling Approach*
To start with, we can implement an approach where for any marked node (marked
for bundling) the complete subtree under it is bundled. This would make the
implementation simpler.
Later we can also try an approach similar to the one taken for index time
aggregation. The application can provide, via config, the node paths that need
to be bundled, and while creating nodes we then use that config to determine the
host node into which they need to be bundled.
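A rough sketch of the config-driven variant, assuming a hypothetical config that maps a node type to relative path patterns: "*" matches one path segment and a trailing "**" matches the whole subtree, so the simpler "bundle the complete subtree" approach is just the pattern "**". BundlingConfig, the pattern syntax, the app:Asset entry and the rule of picking the nearest matching ancestor as host are all assumptions for illustration.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of config-driven bundling: a node type maps to the
 * relative path patterns that get bundled into a node of that type.
 */
public class BundlingConfig {

    /**
     * Matches a relative path against a pattern, segment by segment.
     * "*" matches exactly one segment; a trailing "**" matches one or more.
     */
    public static boolean matches(String pattern, String relPath) {
        String[] p = pattern.split("/");
        String[] r = relPath.split("/");
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("**")) {
                return i < r.length; // rest of the subtree
            }
            if (i >= r.length || !(p[i].equals("*") || p[i].equals(r[i]))) {
                return false;
            }
        }
        return p.length == r.length;
    }

    /**
     * Walks up from the parent of path and returns the nearest ancestor whose
     * node type has a pattern matching the relative path, or null if none.
     */
    public static String findHost(String path, Map<String, String> typeByPath,
                                  Map<String, List<String>> config) {
        String ancestor = path;
        while (true) {
            int slash = ancestor.lastIndexOf('/');
            if (slash <= 0) {
                return null; // reached the root without finding a host
            }
            ancestor = ancestor.substring(0, slash);
            String type = typeByPath.get(ancestor);
            List<String> patterns = type == null ? null : config.get(type);
            if (patterns == null) {
                continue; // ancestor is not a configured host type
            }
            String rel = path.substring(ancestor.length() + 1);
            for (String pattern : patterns) {
                if (matches(pattern, rel)) {
                    return ancestor;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Example config; the app:Asset entry is illustrative only.
        Map<String, List<String>> config = Map.of(
                "nt:file", List.of("jcr:content"),
                "app:Asset", List.of("jcr:content", "jcr:content/metadata"));
        Map<String, String> types = Map.of(
                "/content/image1/original", "nt:file",
                "/content/asset1", "app:Asset");
        System.out.println(findHost("/content/image1/original/jcr:content", types, config));
        System.out.println(findHost("/content/asset1/jcr:content/metadata", types, config));
        System.out.println(findHost("/content/asset1/jcr:content/other", types, config));
    }
}
```

Here the first two lookups resolve to /content/image1/original and /content/asset1, while jcr:content/other is not covered by any pattern and so stays in its own document (null host).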
> Bundle nodes into a document
> ----------------------------
>
> Key: OAK-1312
> URL: https://issues.apache.org/jira/browse/OAK-1312
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, documentmk
> Reporter: Marcel Reutegger
> Assignee: Chetan Mehrotra
> Labels: performance
> Fix For: 1.6
>
>
> For very fine grained content with many nodes and only few properties per
> node it would be more efficient to bundle multiple nodes into a single
> MongoDB document. Mostly reading would benefit because there are fewer
> roundtrips to the backend. At the same time storage footprint would be lower
> because metadata overhead is per document.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)