[ https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492599#comment-15492599 ]

Alex Parvulescu commented on OAK-4631:
--------------------------------------

I think one important aspect of this patch's impact was not fully tested, 
namely _disk space utilization_. I'm currently running some upgrade tests 
against the latest trunk and have some interesting results to share (I'm using 
{{oak-run debug}} to collect the data):

 - source repository (Oak 1.0): 8.7GB
{noformat}
Total size:
7 GB in  54137 data segments
768 KB in      3 bulk segments
1 GB in maps (20650196 leaf and branch records)
113 MB in lists (3714097 list and bucket records)
3 GB in values (value and block records of 73489693 properties, 3432/378779/0/1214488 small/medium/long/external blobs, 51059734/3318006/159 small/medium/long strings)
120 MB in templates (16786491 template records)
1 GB in nodes (221232040 node records)
{noformat}

 - upgraded instance before OAK-4631 (based on rev 1757389): 11GB
{noformat}
Total size:
10 GB in 321341 data segments
768 KB in      3 bulk segments
2 GB in maps (46451304 leaf and branch records)
619 MB in lists (55468842 list and bucket records)
3 GB in values (value and block records of 70764647 properties, 3429/378684/0/1214419 small/medium/long/external blobs, 46258634/1862224/159 small/medium/long strings)
113 MB in templates (16772763 template records)
1 GB in nodes (251592041 node records)
{noformat}

 - upgraded instance after OAK-4631: 37GB
{noformat}
Total size:
36 GB in 150205 data segments
768 KB in      3 bulk segments
6 GB in maps (35228936 leaf and branch records)
3 GB in lists (55508867 list and bucket records)
4 GB in values (value and block records of 75853352 properties, 3742/380719/0/1216770 small/medium/long/external blobs, 76087785/4765208/159 small/medium/long strings)
712 MB in templates (33716018 template records)
13 GB in nodes (390207210 node records)
{noformat}

The size delta is large: the upgraded repository grows from {{11GB}} before OAK-4631 to {{37GB}} after it, more than three times the size.
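To see where the extra space goes, the per-category growth can be computed from the headline lines of the two upgraded-instance reports. The Python sketch below is not part of oak-run: {{parse_summary}} and {{growth}} are hypothetical helpers, and it assumes the "N <unit> in ..." line format shown above and that oak-run's KB/MB/GB are 1024-based units. Because oak-run rounds each figure to whole units, the ratios are only approximate.

```python
import re

# Headline lines copied verbatim from the two upgraded-instance reports above.
# Assumption: only the first line of each category is needed for the size.
PRE = """\
10 GB in 321341 data segments
768 KB in      3 bulk segments
2 GB in maps (46451304 leaf and branch records)
619 MB in lists (55468842 list and bucket records)
3 GB in values (value and block records of 70764647 properties,
113 MB in templates (16772763 template records)
1 GB in nodes (251592041 node records)
"""

POST = """\
36 GB in 150205 data segments
768 KB in      3 bulk segments
6 GB in maps (35228936 leaf and branch records)
3 GB in lists (55508867 list and bucket records)
4 GB in values (value and block records of 75853352 properties,
712 MB in templates (33716018 template records)
13 GB in nodes (390207210 node records)
"""

# Assumption: binary units, everything normalised to GB.
UNITS = {"KB": 1 / (1024 * 1024), "MB": 1 / 1024, "GB": 1.0}

# "<amount> <unit> in [<count>] <category> [(...)]"
LINE = re.compile(r"^(\d+)\s+(KB|MB|GB)\s+in\s+(?:\d+\s+)?([a-z][a-z ]*?)\s*(?:\(|$)")


def parse_summary(text: str) -> dict[str, float]:
    """Map each category name to its size in GB (hypothetical helper)."""
    sizes: dict[str, float] = {}
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            continue  # skip lines that are not category headlines
        amount, unit, category = m.groups()
        sizes[category] = int(amount) * UNITS[unit]
    return sizes


def growth(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Ratio after/before for every category present in both reports."""
    return {k: after[k] / before[k] for k in before if k in after and before[k] > 0}


if __name__ == "__main__":
    ratios = growth(parse_summary(PRE), parse_summary(POST))
    for name, ratio in ratios.items():
        print(f"{name:15s} x{ratio:.2f}")
```

On these numbers the node records (1 GB to 13 GB) and the templates (113 MB to 712 MB) grow the most in relative terms, while values only go from 3 GB to 4 GB.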


> Simplify the format of segments and serialized records
> ------------------------------------------------------
>
>                 Key: OAK-4631
>                 URL: https://issues.apache.org/jira/browse/OAK-4631
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.10
>
>         Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, OAK-4631-04.patch
>
>
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it 
> might be beneficial to simplify both the format of the segments and the way 
> record IDs are serialised. A new strategy needs to be investigated to reach 
> the right compromise between performance, disk space utilization and 
> simplicity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
