[ 
https://issues.apache.org/jira/browse/OAK-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344106#comment-15344106
 ] 

Chetan Mehrotra commented on OAK-4471:
--------------------------------------

bq. Compressing will obfuscate readability which will make it harder for 
deployers to fix issues directly. 

Ack

bq. Any benefits in compressing individual documents might be minimal. 

Yes, for storage cost. However, we would still benefit from a smaller size for 
the data that goes over the wire. As mentioned, I am collecting more stats on 
size, and then we need to see how much benefit a compact format brings. If the 
gains are not very high then we should definitely avoid this. One of the drivers 
for me was the Dynamo limit of 4k as 1 read unit. 
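
To make the wire-size and read-unit argument concrete, here is a rough Python 
sketch. The document shape, the property names and the `read_units` helper are 
made up for illustration; only the 4k-per-read-unit figure comes from the 
comment above, and zlib merely stands in for whatever compact format might 
eventually be chosen.

```python
import json
import math
import zlib

READ_UNIT_BYTES = 4 * 1024  # 4k counted as 1 read unit, as mentioned above


def read_units(size_bytes, unit=READ_UNIT_BYTES):
    """Number of read units needed to fetch an item of the given size."""
    return max(1, math.ceil(size_bytes / unit))


def sample_document(revisions=200):
    """Build a hypothetical document: one property with many revisions."""
    values = {
        "r{:x}-0-1".format(0x13f3875b5d1 + i): '"value-{}"'.format(i)
        for i in range(revisions)
    }
    return {"_id": "2:/content/page", "jcr:title": values}


raw = json.dumps(sample_document()).encode("utf-8")
packed = zlib.compress(raw)  # stand-in for a more compact format

# Compare size and read units before and after compression.
print(len(raw), read_units(len(raw)))
print(len(packed), read_units(len(packed)))
```

The actual numbers depend entirely on real document shapes, which is why the 
size stats need to be collected first.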

bq. Minimising ratios between Application Domain Object -> JCR Node -> Oak 
Document will have the largest impact.

Ack, and here we should see much improvement with 
* OAK-1312 - Bundle multiple nodes in single Document
* OAK-4412 - Hybrid Lucene index, which should allow us to remove quite a few 
property indexes, which IMHO constitute the major portion of the Documents 
we create in Mongo


> More compact storage format for Documents
> -----------------------------------------
>
>                 Key: OAK-4471
>                 URL: https://issues.apache.org/jira/browse/OAK-4471
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: performance
>             Fix For: 1.6
>
>         Attachments: node-doc-size2.png
>
>
> The aim of this task is to evaluate the storage cost of the current approach 
> for various Documents in DocumentNodeStore, and then evaluate possible 
> alternatives to see if we can get a significant reduction in storage size.
> Possible areas of improvement
> # NodeDocument
> ## Use binary encoding for property values - Currently property values are 
> stored in JSON encoding, i.e. arrays and single values are encoded in JSON 
> along with their type
> ## Use binary encoding for Revision values - In a given document Revision 
> instances are a major part of storage size. A binary encoding might provide 
> more compact storage
> # Journal - The journal entries can be stored in compressed form
> Any new approach should support working with existing setups i.e. provide 
> gradual change in storage format. 
> *Possible Benefits*
> More compact storage would help in following ways
> # Low memory footprint of Document in Mongo and RDB
> # Low memory footprint for in-memory NodeDocument instances - e.g. 
> property values stored in binary format would consume less memory
> # Reduction in IO over wire - That should reduce the latency in say 
> distributed deployments where Oak has to talk to remote primary
> Note that before doing any such change we must analyze the gains. Any change 
> in encoding would make interpreting stored data harder and also represents 
> significant change in stored data where we need to be careful to not 
> introduce any bug!
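
As an illustration of the Revision encoding idea in the description above, here 
is a small Python sketch. It assumes revisions are rendered as 
`r<timestamp>-<counter>-<clusterId>` with hex fields, and the 10-byte layout 
(6-byte timestamp, 2-byte counter, 2-byte clusterId) is purely hypothetical, 
not a proposed Oak format. Branch revisions are ignored.

```python
import struct


def encode_revision(text):
    """Pack a revision string such as 'r13f3875b5d1-0-1' into 10 bytes."""
    # Hypothetical layout: 6-byte timestamp, 2-byte counter, 2-byte clusterId.
    timestamp, counter, cluster_id = (int(p, 16) for p in text[1:].split("-"))
    return timestamp.to_bytes(6, "big") + struct.pack(">HH", counter, cluster_id)


def decode_revision(data):
    """Inverse of encode_revision: rebuild the string form from 10 bytes."""
    timestamp = int.from_bytes(data[:6], "big")
    counter, cluster_id = struct.unpack(">HH", data[6:])
    return "r{:x}-{:x}-{:x}".format(timestamp, counter, cluster_id)


rev = "r13f3875b5d1-0-1"
packed_rev = encode_revision(rev)
print(len(rev), len(packed_rev))
```

Even in this toy layout the saving per revision is only a handful of bytes, 
so whether it matters depends on how many revisions a typical document holds.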



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
