[ 
https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492715#comment-15492715
 ] 

Francesco Mari commented on OAK-4631:
-------------------------------------

It's also interesting to take the average of the values above, because it helps 
put this information in perspective.

- source, Oak 1.0
{noformat}
135  KB   per data segment
52   byte per map
0.28 byte per list
7    byte per template
5    byte per node
{noformat}
- upgraded instance, pre OAK-4631
{noformat}
33 KB   per data segment
46 byte per map
12 byte per list
7  byte per template
4  byte per node
{noformat}
- upgraded instance, post OAK-4631
{noformat}
251 KB   per data segment
182 byte per map
58  byte per list
22  byte per template
35  byte per node
{noformat}
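
To make the comparison between the two upgraded instances explicit, the averages above can be divided pairwise. This is only a rough sketch: the dictionary and function names are made up for illustration, and the numbers are copied from the lists above.

```python
# Averages copied from the lists above. Record sizes are in bytes,
# the data segment size is in KB. All names here are illustrative only.
pre = {"data segment (KB)": 33, "map": 46, "list": 12, "template": 7, "node": 4}
post = {"data segment (KB)": 251, "map": 182, "list": 58, "template": 22, "node": 35}


def growth_factors(before, after):
    """Return after/before for every key, rounded to two decimals."""
    return {key: round(after[key] / before[key], 2) for key in before}


factors = growth_factors(pre, post)
for key, factor in factors.items():
    print(f"{key:<18} x{factor}")
```

With these averages, a node record is 8.75 times larger after OAK-4631, while the average data segment is about 7.6 times larger.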

Records got bigger, that's undeniable. But as a consequence of this change, 
records are easier to parse, segments are better utilised, and 54% fewer 
segments are needed to store the same data. Fewer segments mean smaller 
book-keeping data structures throughout the Segment Store, especially 
when it comes to compaction. This change traded space for simplicity, and I 
think there is some value in that.

> Simplify the format of segments and serialized records
> ------------------------------------------------------
>
>                 Key: OAK-4631
>                 URL: https://issues.apache.org/jira/browse/OAK-4631
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.10
>
>         Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, 
> OAK-4631-04.patch
>
>
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it 
> might be beneficial to simplify both the format of the segments and the way 
> record IDs are serialised. A new strategy needs to be investigated to reach 
> the right compromise between performance, disk space utilization and 
> simplicity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
