[
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Reschke closed OAK-7105.
-------------------------------
> Implement a traverse with sort strategy for DocumentStoreIndexer
> ----------------------------------------------------------------
>
> Key: OAK-7105
> URL: https://issues.apache.org/jira/browse/OAK-7105
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8.0
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy, in which
> it first dumps all node states to a JSON file, then sorts them in batches, and
> finally merges the sorted files. Within the whole indexing run, the sorting
> phase takes a significant amount of time (40 minutes of a 3-hour run).
> Further, this approach can hit an OOM when ExternalSort creates in-memory
> batches whose actual size considerably exceeds the estimated size. So we need
> to constantly tweak "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).
> As an improvement, we can make the following changes:
> # Implement a traverse with sort strategy - Here, instead of first dumping all
> node states into a single big JSON file, we add them to an in-memory buffer
> and then, at some stage, sort the batch and save it to a file
> # Use better memory checks - Use the approach implemented in GCBarrier,
> i.e. monitor the current memory usage and, if the available memory goes below
> a certain threshold, trigger the batch sort
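
The two proposed changes can be illustrated with a minimal Java sketch. It is not the Oak implementation: the class TraverseWithSortSketch, its method names, the 20% free-heap threshold, and the use of plain text lines in place of serialized node states are all assumptions made for illustration. The memory check uses simple Runtime polling as a stand-in for the GCBarrier-style monitoring mentioned in the description.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a traverse-with-sort strategy; not Oak code. */
final class TraverseWithSortSketch {

    /** Assumed threshold: spill when less than 20% of the max heap is still available. */
    static final double MIN_FREE_FRACTION = 0.2;

    /** Returns true when the heap still available to the JVM is below the threshold. */
    static boolean memoryLow() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;
        return available < MIN_FREE_FRACTION * rt.maxMemory();
    }

    /**
     * Consumes entries during traversal, spills sorted batches to temp files,
     * then merges those files into 'out'. Entries must not contain newlines.
     */
    static void sortToFile(Iterator<String> entries, int maxBatchSize, Path out)
            throws IOException {
        List<Path> sortedFiles = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        while (entries.hasNext()) {
            buffer.add(entries.next());
            // Spill when the batch is full or when memory is running low.
            if (buffer.size() >= maxBatchSize || memoryLow()) {
                sortedFiles.add(flush(buffer));
            }
        }
        if (!buffer.isEmpty()) {
            sortedFiles.add(flush(buffer));
        }
        merge(sortedFiles, out);
    }

    /** Sorts the in-memory batch, writes it to a temp file, and clears the buffer. */
    static Path flush(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path file = Files.createTempFile("sorted-batch", ".txt");
        Files.write(file, buffer);
        buffer.clear();
        return file;
    }

    /** Pairs the current line of a batch file with the reader it came from. */
    static final class Cursor {
        final String line;
        final BufferedReader reader;

        Cursor(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    /** K-way merge of the sorted batch files; deletes the batch files afterwards. */
    static void merge(List<Path> files, Path out) throws IOException {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            for (Path file : files) {
                BufferedReader reader = Files.newBufferedReader(file);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    queue.add(new Cursor(first, reader));
                }
            }
            while (!queue.isEmpty()) {
                Cursor smallest = queue.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.reader.readLine();
                if (next != null) {
                    queue.add(new Cursor(next, smallest.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
            for (Path file : files) {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

In this sketch the traversal never materializes the full unsorted dump: each batch is sorted while still in memory, so only the final merge pass reads the data back from disk. A real implementation would likely also enforce a minimum batch size, because a persistently low-memory JVM would otherwise spill one entry per file.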
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)