[
https://issues.apache.org/jira/browse/OAK-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Francesco Mari updated OAK-3536:
--------------------------------
Fix Version/s: 1.3.9
> Indexing with Lucene and copy-on-read generates too much garbage in the
> BlobStore
> --------------------------------------------------------------------------------
>
> Key: OAK-3536
> URL: https://issues.apache.org/jira/browse/OAK-3536
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.3.9
> Reporter: Francesco Mari
> Priority: Critical
> Fix For: 1.4
>
>
> The copy-on-read strategy when using Lucene indexing performs too many copies
> of the index files from the filesystem to the repository. Every copy discards
> the previously stored binary, which sits there as garbage until binary
> garbage collection kicks in. When the load on the system is particularly
> intense, this behaviour makes the repository grow at an unreasonably high
> pace.
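> As an illustration only (the method and names below are hypothetical, not
> Oak's actual code paths), the problematic pattern boils down to repeatedly
> overwriting an index binary, leaving the previous blob unreferenced:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
>
> import org.apache.jackrabbit.oak.api.Blob;
> import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
> import org.apache.jackrabbit.oak.spi.state.NodeStore;
>
> // Hypothetical sketch: each index sync writes a fresh blob for an index
> // file and overwrites the old reference, so the old blob sits in the
> // BlobStore as garbage until binary garbage collection runs.
> void syncIndexFile(NodeStore store, NodeBuilder indexData, String name,
>         File localFile) throws IOException {
>     try (FileInputStream in = new FileInputStream(localFile)) {
>         Blob fresh = store.createBlob(in);
>         indexData.setProperty(name, fresh); // previous blob is now garbage
>     }
> }
> {code}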
> I spotted this on a system where some content is generated every day at a
> specific time. The content generation process creates approx. 6 million new
> nodes, where each node contains 5 properties holding small, random string
> values. Nodes are saved in batches of 1000. At the end of the content
> generation process, the nodes are deleted to deliberately generate garbage
> in the Segment Store. This is part of a testing effort to assess the
> efficiency of online compaction.
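> A minimal sketch of that workload, assuming a plain JCR session (all names
> are made up for illustration):
> {code:java}
> import java.util.UUID;
>
> import javax.jcr.Node;
> import javax.jcr.RepositoryException;
> import javax.jcr.Session;
>
> // Hypothetical reproduction of the described load: approx. 6 million nodes,
> // each with 5 small random string properties, saved in batches of 1000.
> void generateContent(Session session) throws RepositoryException {
>     Node parent = session.getRootNode().addNode("generated");
>     for (int i = 0; i < 6_000_000; i++) {
>         Node node = parent.addNode("node-" + i);
>         for (int p = 0; p < 5; p++) {
>             node.setProperty("prop-" + p, UUID.randomUUID().toString());
>         }
>         if ((i + 1) % 1000 == 0) {
>             session.save(); // one batch of 1000 nodes
>         }
>     }
>     session.save();
>     // The nodes are later removed to deliberately create garbage:
>     parent.remove();
>     session.save();
> }
> {code}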
> I was never able to complete the tests because the system ran out of disk
> space due to a large number of unused binary values. When debugging the
> system, on a 400 GB (full) disk, the segments containing nodes and property
> values occupied approx. 3 GB. The rest of the space was occupied by binary
> values in the form of bulk segments.