[ https://issues.apache.org/jira/browse/OAK-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335736#comment-15335736 ]
Thomas Mueller commented on OAK-3536:
-------------------------------------
I think we should either try to create a reproducible test case (with
real-world data), or close this issue as "can't reproduce". What do you think?
> Indexing with Lucene and copy-on-read generate too much garbage in the
> BlobStore
> --------------------------------------------------------------------------------
>
> Key: OAK-3536
> URL: https://issues.apache.org/jira/browse/OAK-3536
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Affects Versions: 1.3.9
> Reporter: Francesco Mari
> Priority: Critical
> Fix For: 1.6
>
>
> The copy-on-read strategy when using Lucene indexing performs too many copies
> of the index files from the filesystem to the repository. Every copy discards
> the previously stored binary, which then sits there as garbage until the
> binary garbage collection kicks in. When the load on the system is
> particularly intense, this behaviour makes the repository grow at an
> unreasonably high pace.
> I spotted this on a system where some content is generated every day at a
> specific time. The content generation process creates approx. 6 million new
> nodes, where each node contains 5 properties with small, random string
> values. Nodes were saved in batches of 1000 nodes each. At the end of the
> content generation process, the nodes are deleted to deliberately generate
> garbage in the Segment Store. This is part of a testing effort to assess the
> efficiency of the online compaction.
> I was never able to complete the tests because the system ran out of disk
> space due to a lot of unused binary values. When debugging the system, on a
> 400 GB (full) disk, the segments containing nodes and property values
> occupied approx. 3 GB. The rest of the space was occupied by binary values in
> the form of bulk segments.
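
To make the numbers in the description easier to reason about, here is a rough back-of-the-envelope model of the reported behaviour. It is only a sketch, not Oak code: the function name `simulate_copies` and its parameters are hypothetical, and it assumes that each copy of an index file orphans exactly one previously stored binary, which stays on disk until a garbage collection pass runs.

```python
def simulate_copies(file_size_bytes, copies, gc_every=None):
    """Model repeated copies of one index file into the repository.

    Hypothetical model, not Oak's implementation: every copy stores a new
    binary and orphans the previous one; orphaned binaries are reclaimed
    only when a GC pass runs (every `gc_every` copies, or never if None).
    Returns (live_bytes, peak_garbage_bytes).
    """
    live = 0
    garbage = 0
    peak_garbage = 0
    for i in range(1, copies + 1):
        garbage += live          # the previously stored binary becomes garbage
        live = file_size_bytes   # the new copy is the only reachable binary
        peak_garbage = max(peak_garbage, garbage)
        if gc_every and i % gc_every == 0:
            garbage = 0          # a GC pass reclaims all unreachable binaries
    return live, peak_garbage


# Figures taken from the report (all approximate there as well).
TOTAL_NODES = 6_000_000
BATCH_SIZE = 1_000
DISK_GB = 400
NODE_SEGMENTS_GB = 3

saves = TOTAL_NODES // BATCH_SIZE          # number of batched saves
binary_gb = DISK_GB - NODE_SEGMENTS_GB     # space held by binary values
binary_share = binary_gb / DISK_GB         # fraction of the disk

if __name__ == "__main__":
    print(f"saves per run: {saves}")
    print(f"binary data: {binary_gb} GB ({binary_share:.2%} of the disk)")
    print(simulate_copies(file_size_bytes=100, copies=5))
    print(simulate_copies(file_size_bytes=100, copies=5, gc_every=2))
```

In this model the garbage grows linearly with the number of copies between two GC passes, so the frequency of copies relative to GC runs is what decides how fast the disk fills up. How often a copy is actually triggered per save is not stated in the report and is left as an input here.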