[
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072707#comment-16072707
]
Tommaso Teofili commented on OAK-5192:
--------------------------------------
I tried the setup suggested by [~chetanm] and got very different results
with the FDS (FileDataStore) configured with MinRecordLength set to 4000.
||Codec||Repo size||Time taken||
|oakCodec|578.4 MB|8 mins|
|Lucene46|578.0 MB|12 mins|
|customCodec|577.8 MB|17 mins|
So from this data it would seem that the codec optimization is not worth the
effort.
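To put rough numbers on that, here is a minimal sketch that works only from the three rows in the table above. The helper name `compare` and the choice of oakCodec as the baseline are assumptions made for illustration; they are not part of Oak or of the benchmark setup.

```python
# Figures copied from the table above: (repo size in MB, time taken in minutes).
results = {
    "oakCodec": (578.4, 8),
    "Lucene46": (578.0, 12),
    "customCodec": (577.8, 17),
}


def compare(baseline, candidate):
    """Return (MB saved, percent of baseline size saved, time ratio).

    Hypothetical helper: it compares two rows of the table and nothing else.
    """
    base_size, base_time = results[baseline]
    cand_size, cand_time = results[candidate]
    saved_mb = base_size - cand_size
    saved_pct = 100.0 * saved_mb / base_size
    time_ratio = cand_time / base_time
    return saved_mb, saved_pct, time_ratio


# Assumption: oakCodec is taken as the baseline because it was the fastest run.
for codec in ("Lucene46", "customCodec"):
    saved_mb, saved_pct, ratio = compare("oakCodec", codec)
    print(f"{codec}: saves {saved_mb:.1f} MB ({saved_pct:.2f}%), "
          f"takes {ratio:.1f}x as long")
```

On these figures customCodec saves about 0.6 MB, roughly 0.1% of the repository size, while taking about twice as long as oakCodec.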
Perhaps we should also look at the FileDS size, especially for large repos,
where this might be important.
> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, segment-tar
> Reporter: Michael Dürig
> Assignee: Tommaso Teofili
> Labels: perfomance, scalability
> Fix For: 1.8, 1.7.8
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt,
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen
> Shot 2017-07-03 at 16.50.00.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth.
> While the size of the index itself is well inside reasonable bounds, the
> overall turnover of data being written and removed again can be as much as
> 99%.
> In the case of the TarMK this negatively impacts overall system performance
> due to a fast-growing number of tar files / segments, poor locality of
> reference, cache misses/thrashing when looking up segments, and vastly
> prolonged garbage collection cycles.