[jira] [Commented] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-21 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299858#comment-16299858
 ] 

Chetan Mehrotra commented on OAK-7105:
--

Switched the default with 1818900

> Implement a traverse with sort strategy for DocumentStoreIndexer
> 
>
> Key: OAK-7105
> URL: https://issues.apache.org/jira/browse/OAK-7105
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.15
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy, in which
> it first dumps all node states to a JSON file, then sorts them in batches, and
> finally merges the sorted files. The sorting phase takes a significant share
> of the total indexing time (40 mins out of a 3 hr run).
> Further, this approach is prone to OOM when ExternalSort creates
> in-memory batches whose actual size considerably exceeds the estimated
> size. So we need to constantly tweak
> "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).
> As an improvement we can make the following changes:
> # Implement a traverse with sort strategy - Instead of first dumping all
> node states into a single big JSON file, add them to an in-memory buffer and
> then at some stage sort the batch and save it to a file
> # Use better memory checks - Use the approach implemented in GCBarrier,
> i.e. monitor the current memory usage and, if available memory drops below a
> certain threshold, trigger the batch sort
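
The two proposed changes can be illustrated with a rough sketch. This is not the Oak implementation: the class TraverseWithSortSketch, its methods, and the freeHeapBelow helper are hypothetical names made up for illustration. It assumes entries are single-line strings sorted in plain string order, and it uses a simple Runtime-based free-heap check as a stand-in for the GCBarrier approach mentioned above. Entries are buffered in memory; when the memory check fires, the batch is sorted and written to a temporary file, and at the end all sorted files are merged in one streaming pass.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical sketch of a traverse-with-sort flow; not Oak code. */
class TraverseWithSortSketch {
    /** Current head line of one sorted batch file, used during the merge. */
    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }

    private final List<String> buffer = new ArrayList<>();
    private final List<Path> sortedFiles = new ArrayList<>();
    private final BooleanSupplier memoryLow;

    TraverseWithSortSketch(BooleanSupplier memoryLow) { this.memoryLow = memoryLow; }

    /** Assumed stand-in memory check: true once free heap drops below minFreeBytes. */
    static BooleanSupplier freeHeapBelow(long minFreeBytes) {
        Runtime rt = Runtime.getRuntime();
        return () -> rt.maxMemory() - (rt.totalMemory() - rt.freeMemory()) < minFreeBytes;
    }

    /** Buffers one entry (must not contain a newline); flushes when memory is low. */
    void add(String entry) {
        buffer.add(entry);
        if (memoryLow.getAsBoolean()) {
            flush();
        }
    }

    /** Sorts the in-memory batch and saves it to its own temporary file. */
    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        Collections.sort(buffer);
        try {
            Path file = Files.createTempFile("sorted-batch", ".txt");
            Files.write(file, buffer);
            sortedFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    /** Flushes the last batch, then streams all entries to sink in sorted order. */
    void finish(Consumer<String> sink) {
        flush();
        PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        try {
            for (Path file : sortedFiles) {
                advance(heads, Files.newBufferedReader(file));
            }
            while (!heads.isEmpty()) {
                Head smallest = heads.poll();
                sink.accept(smallest.line);
                advance(heads, smallest.reader);
            }
            for (Path file : sortedFiles) {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the next line of a batch file into the heap, or closes the file at its end. */
    private static void advance(PriorityQueue<Head> heads, BufferedReader reader) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            reader.close();
        } else {
            heads.add(new Head(line, reader));
        }
    }
}
```

A caller would construct it with something like TraverseWithSortSketch.freeHeapBelow(512L * 1024 * 1024), call add() for each node state during traversal, and pass a consumer to finish(). Passing the memory check in as a BooleanSupplier keeps the batching trigger separate from the sort and merge logic, so the threshold policy can be swapped without touching the rest. Error handling is minimal: readers are not closed if the merge fails part way.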



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-7105) Implement a traverse with sort strategy for DocumentStoreIndexer

2017-12-21 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299800#comment-16299800
 ] 

Chetan Mehrotra commented on OAK-7105:
--

Implemented the above flow with 1818896



