[jira] [Comment Edited] (OAK-1312) Bundle nodes into a document

Chetan Mehrotra (JIRA) Tue, 14 Jun 2016 22:31:22 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329334#comment-15329334
 ]


Chetan Mehrotra edited comment on OAK-1312 at 6/15/16 5:30 AM:
---------------------------------------------------------------

Had a discussion on this with [~mreutegg] today. Some food for thought:

* Make use of the NodeType and mixins (as suggested above) to determine which 
nodes can be collapsed.
* Reuse the approach we took in oak-lucene, where support for relative 
properties was implemented by adding the relative property path as a field 
name in the parent Lucene Document. So if for app:Asset we need to index 
jcr:content/metadata/format, then the Lucene document created for the 
app:Asset node gets a field named jcr:content/metadata/format.

So let's take nt:file as an example:
{noformat}
/content/image1/original (nt:file)
        + jcr:content
          - jcr:data = ...
{noformat}


# *Write Flow* - UpdateOp would create property names based on the relative 
path of the property being collapsed. For the above case those would be 
jcr:content/jcr:data, jcr:content/jcr:primaryType and jcr:primaryType. So if a 
new node is detected which, by its mixin, is eligible for bundled storage, then 
the diff would read all child nodes and bundle their properties into the host 
node. This would result in a single NodeDocument with id 
/content/image1/original holding the given relative properties.
# *Read Flow* - In all cases reads would happen via traversal, i.e. a read at 
an arbitrary path should not happen. So when the original DocumentNodeState is 
read and the code asks for a specific child node or for all children, it can 
create further NodeStates based on the embedded relative properties.
# *Update Flow* - Let's say jcr:data is updated. As the commit diff traverses 
and attempts to create an UpdateOp, it would have to detect whether the 
property is part of a bundled node tree or not.
# *Observation Flow* - TBD
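
To make the flows above a bit more concrete, here is a minimal sketch using plain maps. It is not Oak code: BundlingSketch, Node, bundle, readChild and isBundledNode are hypothetical names, and nt:resource as the type of jcr:content is only an illustrative value. It shows the host's subtree being flattened into relative property names (write), a child being rebuilt from those names (read), and the check an update would need before creating an UpdateOp.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class BundlingSketch {

    /** Hypothetical in-memory node: properties plus named children. */
    static class Node {
        final Map<String, String> props = new LinkedHashMap<>();
        final Map<String, Node> children = new LinkedHashMap<>();

        Node prop(String name, String value) { props.put(name, value); return this; }
        Node child(String name, Node node) { children.put(name, node); return this; }
    }

    /** Write flow: flatten the host's subtree into relative property names. */
    static Map<String, String> bundle(Node host) {
        Map<String, String> doc = new TreeMap<>();
        collect(host, "", doc);
        return doc;
    }

    private static void collect(Node node, String prefix, Map<String, String> doc) {
        for (Map.Entry<String, String> p : node.props.entrySet()) {
            doc.put(prefix + p.getKey(), p.getValue());
        }
        for (Map.Entry<String, Node> c : node.children.entrySet()) {
            collect(c.getValue(), prefix + c.getKey() + "/", doc);
        }
    }

    /** Read flow: rebuild the direct properties of a bundled child node. */
    static Map<String, String> readChild(Map<String, String> doc, String relPath) {
        String prefix = relPath + "/";
        Map<String, String> props = new TreeMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                String rest = e.getKey().substring(prefix.length());
                if (rest.indexOf('/') < 0) {   // skip properties of deeper descendants
                    props.put(rest, e.getValue());
                }
            }
        }
        return props;
    }

    /** Update flow: does the node at relPath live inside the bundled document? */
    static boolean isBundledNode(Map<String, String> doc, String relPath) {
        String prefix = relPath + "/";
        for (String key : doc.keySet()) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // /content/image1/original (nt:file) with its jcr:content child
        Node original = new Node()
                .prop("jcr:primaryType", "nt:file")
                .child("jcr:content", new Node()
                        .prop("jcr:primaryType", "nt:resource") // illustrative value
                        .prop("jcr:data", "..."));

        Map<String, String> doc = bundle(original);
        System.out.println(doc.keySet());
        System.out.println(readChild(doc, "jcr:content"));
        System.out.println(isBundledNode(doc, "jcr:content"));
    }
}
```

In a real implementation the flat map would be the NodeDocument for /content/image1/original, and revisions, deletions and conflict handling would all need to be accounted for; none of that is covered here.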

*Bundling Approach* 

To start with, we can implement an approach where, for any node marked for 
bundling, the complete subtree under it is bundled. This would keep the 
implementation simpler.

Later we can also try an approach similar to the one taken for index time 
aggregation. The application can provide, via config, the node paths which need 
to be bundled, and while creating nodes we use that config to determine the 
host node into which they should be bundled.
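
As a rough illustration of the config-driven variant, the sketch below walks up from a newly created node and picks the nearest ancestor whose node type lists the node's relative path as bundled. Everything in it is an assumption made for illustration: the class HostResolverSketch, the findHost method, and the shape of the config (node type mapped to a list of relative paths) are hypothetical and not an existing Oak API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class HostResolverSketch {

    /**
     * Returns the path of the host node for the given path, if any.
     *
     * @param path       absolute path of the node being created
     * @param typeByPath hypothetical lookup: ancestor path -> primary node type
     * @param config     hypothetical config: node type -> relative paths to bundle
     */
    static Optional<String> findHost(String path,
                                     Map<String, String> typeByPath,
                                     Map<String, List<String>> config) {
        // Try each ancestor, nearest first; the root ("") is never a host here.
        for (int idx = path.lastIndexOf('/'); idx > 0; idx = path.lastIndexOf('/', idx - 1)) {
            String ancestor = path.substring(0, idx);
            String relative = path.substring(idx + 1);
            String type = typeByPath.get(ancestor);
            List<String> bundled = type == null
                    ? Collections.<String>emptyList()
                    : config.getOrDefault(type, Collections.<String>emptyList());
            if (bundled.contains(relative)) {
                return Optional.of(ancestor);
            }
        }
        return Optional.empty();
    }
}
```

With a config entry such as nt:file -> [jcr:content], creating /content/image1/original/jcr:content would resolve to the host /content/image1/original, while a node whose ancestors match no configured pattern would get its own document as today.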


> Bundle nodes into a document
> ----------------------------
>
>                 Key: OAK-1312
>                 URL: https://issues.apache.org/jira/browse/OAK-1312
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, documentmk
>            Reporter: Marcel Reutegger
>            Assignee: Chetan Mehrotra
>              Labels: performance
>             Fix For: 1.6
>
>
> For very fine-grained content with many nodes and only a few properties per 
> node, it would be more efficient to bundle multiple nodes into a single 
> MongoDB document. Mostly reading would benefit, because there are fewer 
> roundtrips to the backend. At the same time, the storage footprint would be 
> lower because metadata overhead is per document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
