[jira] [Commented] (OAK-9434) MongoDB indexing: implement parallel chunk download


    [ 
https://issues.apache.org/jira/browse/OAK-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388524#comment-17388524
 ]


Amrit Verma commented on OAK-9434:
----------------------------------

Configurations added -

*Sort strategy type* - 
[https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileNodeStoreBuilder.java#L53]
Example test - 
[https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/test/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileStoreTest.java#L102]

 

*Thread pool size for parallel download* - 
[https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/MultithreadedTraverseWithSortStrategy.java#L326]

 

*Existing data dump dir (to resume from where previous download stopped)* - 
[https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/IndexOptions.java#L106-L108]
 - This option, if specified, should point to the flat file store directory in 
the indexing work dir. See the example test case - 
[https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/test/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileStoreTest.java#L175]
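
To illustrate how a thread pool size and an existing data dump dir can work together, here is a minimal sketch. It is not the Oak implementation: the class name, method names, and the ".done" marker files are all hypothetical, and the "download" just writes a placeholder file. It only shows the general idea of fetching chunks in parallel and skipping chunks that a previous run already completed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from Oak.
public class ResumableChunkDownload {

    // Downloads one chunk into dir. Returns false if an earlier run
    // already completed this chunk, so the work is not repeated.
    public static boolean downloadChunk(Path dir, int chunk) throws IOException {
        Path done = dir.resolve("chunk-" + chunk + ".done");
        if (Files.exists(done)) {
            return false;
        }
        // Write to a temp file first, then rename, so a crash mid-write
        // never leaves a half-written file that looks complete.
        Path tmp = dir.resolve("chunk-" + chunk + ".tmp");
        Files.write(tmp, ("data for chunk " + chunk).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }

    // Downloads all chunks using a fixed-size thread pool and returns
    // how many chunks were actually downloaded in this run.
    public static int downloadAll(Path dir, int chunkCount, int threadPoolSize) throws Exception {
        Files.createDirectories(dir);
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < chunkCount; i++) {
                final int chunk = i;
                results.add(pool.submit(() -> downloadChunk(dir, chunk)));
            }
            int downloaded = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    downloaded++;
                }
            }
            return downloaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("flat-file-store");
        System.out.println(downloadAll(dir, 8, 4)); // prints 8 (fresh directory)
        System.out.println(downloadAll(dir, 8, 4)); // prints 0 (everything resumed)
    }
}
```

Pointing the second call at the same directory plays the role of the existing data dump dir option: completed chunks are detected and skipped.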

 

> MongoDB indexing: implement parallel chunk download
> ---------------------------------------------------
>
>                 Key: OAK-9434
>                 URL: https://issues.apache.org/jira/browse/OAK-9434
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>    Affects Versions: 1.38.0
>            Reporter: Amrit Verma
>            Assignee: Amrit Verma
>            Priority: Major
>
> For large indexes, indexing takes a long time. With the MongoDB 
> document store, it is currently a two-step process: download the data from 
> MongoDB, then create the index from that data.
> If something fails during this process, indexing has to be restarted from 
> the beginning of the download step. We should make the indexing process 
> resumable from the point where it stopped. 
> Since downloading the data from MongoDB seems to take longer than indexing 
> itself, we focus on the download part first. 
> This Jira issue is for implementing resumable/parallel download.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
