[
https://issues.apache.org/jira/browse/OAK-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-4471:
---------------------------------
Attachment: node-doc-size2.png
*Node Document Size*
!node-doc-size2.png!
The histogram above shows the size distribution of NodeDocuments, excluding
documents under the property index, which are almost all of size ~525.
{noformat}
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    231     788    1354    1223    1581    2855
{noformat}
> More compact storage format for Documents
> -----------------------------------------
>
> Key: OAK-4471
> URL: https://issues.apache.org/jira/browse/OAK-4471
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: documentmk
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Labels: performance
> Fix For: 1.6
>
> Attachments: node-doc-size2.png
>
>
> The aim of this task is to evaluate the storage cost of the current approach
> for the various Documents in DocumentNodeStore, and then to evaluate possible
> alternatives to see if we can get a significant reduction in storage size.
> Possible areas of improvement
> # NodeDocument
> ## Use binary encoding for property values - Currently property values are
> stored in JSON encoding, i.e. arrays and single values are encoded in JSON
> along with their type
> ## Use binary encoding for Revision values - In a given document Revision
> instances are a major part of storage size. A binary encoding might provide
> more compact storage
> # Journal - The journal entries can be stored in compressed form
> Any new approach should support working with existing setups, i.e. provide a
> gradual change in storage format.
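
To make the Revision idea above more concrete, here is a minimal sketch of a variable-length binary encoding. It assumes the string form `r<timestamp>-<counter>-<clusterId>` with hex fields and an optional `b` prefix for branch revisions; `RevisionCodec` is a hypothetical helper written for illustration, not an existing Oak class, and the byte layout is just one possible choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical codec (not part of Oak): packs a revision string into a few bytes. */
final class RevisionCodec {

    /** Encodes a string such as "r13f3875b5d1-0-1" or "br13f3875b5d1-0-1" (assumed format). */
    static byte[] encode(String rev) {
        boolean branch = rev.startsWith("b");
        int start = branch ? 1 : 0;
        if (rev.length() <= start || rev.charAt(start) != 'r') {
            throw new IllegalArgumentException("Not a revision: " + rev);
        }
        String[] parts = rev.substring(start + 1).split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a revision: " + rev);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(branch ? 1 : 0);                        // 1 flag byte
        writeVarLong(out, Long.parseLong(parts[0], 16));  // timestamp
        writeVarLong(out, Long.parseLong(parts[1], 16));  // counter
        writeVarLong(out, Long.parseLong(parts[2], 16));  // clusterId
        return out.toByteArray();
    }

    /** Restores the string form from bytes produced by encode(). */
    static String decode(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        boolean branch = in.get() == 1;
        long timestamp = readVarLong(in);
        long counter = readVarLong(in);
        long clusterId = readVarLong(in);
        return (branch ? "br" : "r") + Long.toHexString(timestamp)
                + "-" + Long.toHexString(counter)
                + "-" + Long.toHexString(clusterId);
    }

    /** Writes 7 bits per byte, low bits first; the high bit marks "more bytes follow". */
    private static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    private static long readVarLong(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        String rev = "r13f3875b5d1-0-1";
        byte[] encoded = encode(rev);
        System.out.println(rev.length() + " chars -> " + encoded.length + " bytes");
        System.out.println(decode(encoded));
    }
}
```

Whether a scheme like this pays off in practice depends on how revisions are actually stored (map keys vs. values, for example), so it would need to be measured against real documents.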
> *Possible Benefits*
> More compact storage would help in the following ways
> # Lower memory footprint of Documents in Mongo and RDB
> # Lower memory footprint for in-memory NodeDocument instances - e.g.
> property values stored in binary format would consume less memory
> # Reduction in IO over the wire - This should reduce latency in, say,
> distributed deployments where Oak has to talk to a remote primary
> Note that before making any such change we must analyze the gains. Any change
> in encoding would make interpreting stored data harder and also represents a
> significant change in stored data, where we need to be careful not to
> introduce any bugs!
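
As one way to analyze the gains for the journal case mentioned above, the following sketch compresses a payload with GZIP from the JDK and compares sizes. `JournalCompression` is a hypothetical helper, and the sample payload in `main` is made up for illustration; it is not a real journal entry, so the numbers it prints say nothing about actual deployments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not part of Oak): GZIP round trip for a text payload. */
final class JournalCompression {

    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return bytes.toByteArray();
    }

    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up, repetitive path list standing in for a journal payload.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sample.append("/content/site/page").append(i).append("/jcr:content\n");
        }
        byte[] raw = sample.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(sample.toString());
        System.out.println("raw=" + raw.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```

Running the same comparison over entries exported from an existing setup would give a more realistic picture than synthetic data, and the CPU cost of compressing and decompressing would need to be weighed against the size reduction.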