[ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299800#comment-16299800
 ] 

Chetan Mehrotra commented on OAK-7105:
--------------------------------------

Implemented the above flow with 1818896

> Implement a traverse with sort strategy for DocumentStoreIndexer
> ----------------------------------------------------------------
>
>                 Key: OAK-7105
>                 URL: https://issues.apache.org/jira/browse/OAK-7105
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8, 1.7.15
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy, in which
> it first dumps all node states to a JSON file, then sorts them in batches, and
> finally merges the sorted files. Within the whole indexing run the sorting
> phase takes a significant amount of time (40 minutes out of a 3 hour run).
> Furthermore, this approach can hit an OOM while ExternalSort creates in-memory
> batches, when the actual size of a batch considerably exceeds its estimated
> size. So we need to constantly tweak "oak.indexer.maxSortMemoryInGB"
> (currently set to 2 GB).
> As an improvement we can make the following changes:
> # Implement a traverse with sort strategy - Instead of first dumping all
> node states into a single big JSON file, add them to an in-memory buffer and
> then, at some stage, sort the batch and save it to a file.
> # Use better memory checks - Use the approach implemented in GCBarrier,
> i.e. monitor current memory usage and, if available memory goes below a
> certain threshold, trigger the batch sort.
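
The two proposed changes can be sketched roughly as below. This is a minimal
illustration, not the code committed for this issue: the class name
TraverseWithSortSketch, its methods, and the minFreeBytes and maxBatchEntries
parameters are all hypothetical. It assumes entries are plain path strings
sorted lexically, and it polls java.lang.Runtime as a simple stand-in for the
GCBarrier-style memory monitoring mentioned in the description.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: buffer entries during traversal, and sort and spill a
// batch to disk when the buffer is full or available heap runs low.
final class TraverseWithSortSketch {
    private final Path workDir;
    private final long minFreeBytes;    // assumed threshold for "memory is low"
    private final int maxBatchEntries;  // assumed upper bound on batch size
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> sortedFiles = new ArrayList<>();

    TraverseWithSortSketch(Path workDir, long minFreeBytes, int maxBatchEntries) {
        this.workDir = workDir;
        this.minFreeBytes = minFreeBytes;
        this.maxBatchEntries = maxBatchEntries;
    }

    // Called once per traversed entry.
    void add(String entry) throws IOException {
        buffer.add(entry);
        if (buffer.size() >= maxBatchEntries || isMemoryLow()) {
            flush();
        }
    }

    // Heap still available to the JVM: max heap minus what is currently used.
    boolean isMemoryLow() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used < minFreeBytes;
    }

    // Sort the in-memory batch and write it to its own file.
    void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        Collections.sort(buffer);
        Path file = Files.createTempFile(workDir, "sorted-batch-", ".txt");
        Files.write(file, buffer, StandardCharsets.UTF_8);
        sortedFiles.add(file);
        buffer.clear();
    }

    // Flush whatever is left and return the sorted batch files for merging.
    List<Path> finish() throws IOException {
        flush();
        return sortedFiles;
    }
}
```

A caller would invoke add() for each entry seen during traversal and then
finish() to obtain the sorted batch files, which still need a merge step. Note
that a sketch like this would write many small files if memory stayed low for
a long time, so a real implementation would likely need a minimum batch size
or some back-off.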



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
