[ 
https://issues.apache.org/jira/browse/OAK-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4471:
---------------------------------
    Attachment: node-doc-size2.png

*Node Document Size*

!node-doc-size2.png!

The histogram above shows the size distribution of NodeDocuments, excluding 
documents under the property index, almost all of which have a size of ~525.

{noformat}
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    231     788    1354    1223    1581    2855 
{noformat}

> More compact storage format for Documents
> -----------------------------------------
>
>                 Key: OAK-4471
>                 URL: https://issues.apache.org/jira/browse/OAK-4471
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: performance
>             Fix For: 1.6
>
>         Attachments: node-doc-size2.png
>
>
> The aim of this task is to evaluate the storage cost of the current approach 
> for the various Documents in DocumentNodeStore, and then to evaluate possible 
> alternatives to see if we can get a significant reduction in storage size.
> Possible areas of improvement:
> # NodeDocument
> ## Use binary encoding for property values - Currently property values are 
> stored in JSON encoding, i.e. arrays and single values are encoded in JSON 
> along with their type
> ## Use binary encoding for Revision values - In a given document Revision 
> instances are a major part of storage size. A binary encoding might provide 
> more compact storage
> # Journal - The journal entries can be stored in compressed form
> Any new approach should keep working with existing setups, i.e. allow a 
> gradual change in storage format.
> *Possible Benefits*
> More compact storage would help in the following ways:
> # Lower memory footprint of Documents in Mongo and RDB
> # Lower memory footprint for in-memory NodeDocument instances - e.g. 
> property values stored in binary format would consume less memory
> # Reduction in IO over the wire - That should reduce latency in, say, 
> distributed deployments where Oak has to talk to a remote primary
> Note that before making any such change we must analyze the gains. Any change 
> in encoding would make interpreting stored data harder and also represents a 
> significant change in stored data, where we need to be careful not to 
> introduce any bugs!
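
To make the two encoding ideas in the description more concrete, here is a rough sketch, not a proposal for the actual format. It assumes that a Revision is written as text in the form [b]r<hex timestamp>-<hex counter>-<hex clusterId> (the issue text does not spell this out) and packs the three numbers as variable-length integers. It also gzips a journal entry using only the JDK. The class name, the method names and the sample journal string are all hypothetical and are not part of Oak.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not Oak code.
final class CompactEncodingSketch {

    // Assumed text form: [b]r<hexTimestamp>-<hexCounter>-<hexClusterId>
    static byte[] encodeRevision(String rev) {
        boolean branch = rev.startsWith("b");
        String body = branch ? rev.substring(1) : rev;
        String[] parts = body.startsWith("r") ? body.substring(1).split("-") : new String[0];
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a revision: " + rev);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarLong(out, Long.parseLong(parts[0], 16));   // timestamp
        writeVarLong(out, Long.parseLong(parts[1], 16));   // counter
        // Fold the branch flag into the lowest bit of the cluster id.
        writeVarLong(out, (Long.parseLong(parts[2], 16) << 1) | (branch ? 1 : 0));
        return out.toByteArray();
    }

    static String decodeRevision(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        long timestamp = readVarLong(in);
        long counter = readVarLong(in);
        long last = readVarLong(in);
        return ((last & 1) == 1 ? "b" : "") + "r" + Long.toHexString(timestamp)
                + "-" + Long.toHexString(counter) + "-" + Long.toHexString(last >>> 1);
    }

    // Unsigned varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVarLong(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Compresses a journal entry (any text) with gzip.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static String gunzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String rev = "r13f38818ab6-0-1"; // made-up sample value
        byte[] packed = encodeRevision(rev);
        System.out.println(rev.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + rev.equals(decodeRevision(packed)));
    }
}
```

For the made-up sample value the text form is 16 characters and the packed form is 8 bytes. The real gain would have to be measured on actual documents, as the description says. A variable-length layout is used here because a naive fixed-width layout (8 + 4 + 4 bytes plus a flag) would be no smaller than the hex text.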



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
