[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297924#comment-16297924 ] Chetan Mehrotra commented on OAK-6353: -- Some performance numbers for reindexing a repository with 255M Mongo documents, 66M nodes under /content and 4.2M assets:

# Normal NodeStore traversal - 13.66 h

*Document Traversal*

A - Default setup
# Total time - 3.469 h
## Time in dumping - 2.405 h
## Time in sorting - 39.87 min
### Batch sorting - 19.13 min
### Merging - 20.71 min
## Indexing - 24 min
# Space consumed
#* dumped JSON - 43.6 GB
#* chunked files - 43.6 GB
#* index size - 2.5 GB

{noformat}
2017-12-15 16:48:34 Proceeding to index [/oak:index/damAssetLucene2] upto checkpoint head {}
2017-12-15 19:12:55 Dumped 65472172 nodestates in json format in 2.405 h
2017-12-15 19:12:55 Compression enabled while sorting : false (oak.indexer.useZip)
2017-12-15 19:12:55 Delete original dump from traversal : true (oak.indexer.deleteOriginal)
2017-12-15 19:12:55 Max heap memory (GB) to be used for merge sort : 3 (oak.indexer.maxSortMemoryInGB)
2017-12-15 19:12:57 Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-15 19:32:05 Batch sorting done in 19.13 min with 29 files of size 43.6 GB to merge
2017-12-15 19:32:05 Removing the original file temp/flat-file-store/store.json
2017-12-15 19:52:50 Merging of sorted files completed in 20.71 min
2017-12-15 19:52:50 Sorting completed in 39.87 min
2017-12-15 19:52:50 Estimated node count to be traversed for reindexing under / is [65472172]
2017-12-15 20:16:35 Indexing report - /oak:index/damAssetLucene2*(4407265)
2017-12-15 20:16:43 Indexing completed for indexes [/oak:index/damAssetLucene2] in 3.469 h (12488171 ms)
{noformat}

B - Compression enabled in sorting
# Total time - 3.811 h
## Time in dumping - 2.929 h
## Time in sorting - 29.56 min
### Batch sorting - 17.67 min
### Merging - 11.87 min
## Indexing - 24 min
# Space consumed
#* dumped JSON - 43.6 GB
#* chunked files - 5.5 GB
#* index size - 2.5 GB

{noformat}
2017-12-19 10:56:00 Proceeding to index [/oak:index/damAssetLucene2] upto checkpoint head {}
2017-12-19 13:51:50 oreBuilder - Dumped 65469575 nodestates in json format in 2.929 h (43.6 GB)
2017-12-19 13:51:50 oreBuilder - Compression enabled while sorting : true (oak.indexer.useZip)
2017-12-19 13:51:50 oreBuilder - Delete original dump from traversal : true (oak.indexer.deleteOriginal)
2017-12-19 13:51:50 oreBuilder - Max heap memory (GB) to be used for merge sort : 3 (oak.indexer.maxSortMemoryInGB)
2017-12-19 13:51:52 Sorter - Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-19 14:09:32 Sorter - Batch sorting done in 17.67 min with 29 files of size 5.5 GB to merge
2017-12-19 14:09:32 Sorter - Removing the original file temp/flat-file-store/store.json
2017-12-19 14:21:25 Sorter - Merging of sorted files completed in 11.87 min
2017-12-19 14:21:25 Sorter - Sorting completed in 29.56 min
2017-12-19 14:21:26 Estimated node count to be traversed for reindexing under / is [65469575]
2017-12-19 14:44:30 Indexing report - /oak:index/damAssetLucene2*(4407265)
2017-12-19 14:44:30 Reindexing completed
2017-12-19 14:44:30 Switched the async lane for indexes at [/oak:index/damAssetLucene2] back to there original lanes
2017-12-19 14:44:39 Indexing completed for indexes [/oak:index/damAssetLucene2] in 3.811 h (13718589 ms)
{noformat}

> Use Document order traversal for reindexing performed on DocumentNodeStore setups
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
> [~tmueller] suggested [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442] that document order traversal can be faster than the current path based traversal. Initial tests indicate that such a traversal can be an order of magnitude faster.
> So this task is meant to implement such an approach and see if it can be a viable indexing mode for DocumentNodeStore based setups.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
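Runs A and B in the performance comment follow the shape of an external merge sort. The traversal dump is split into batches that fit the `oak.indexer.maxSortMemoryInGB` budget (29 files in both runs). Each batch is sorted and written out, optionally compressed (`oak.indexer.useZip`), and the sorted files are then merged. The Python sketch below illustrates only that general technique; it is not Oak's Java implementation. The following are all assumptions made for illustration:

- the `path|json` line format,
- the `max_batch` line count standing in for a memory limit,
- the path-element sort key,
- all function names.

```python
import gzip
import heapq
import os
import tempfile
from typing import IO, Iterable, Iterator, List


def path_key(line: str) -> List[str]:
    """Sort key (assumed): the path before the first '|', compared element by
    element, so a parent (/a) comes before its children (/a/b) and the whole
    /a subtree comes before a sibling such as /a-b. Plain string order of the
    whole line would not guarantee this."""
    return line.split("|", 1)[0].split("/")


def _open(file_path: str, mode: str, compress: bool) -> IO[str]:
    """Open a spill file in text mode, gzip-compressed if requested."""
    if compress:
        return gzip.open(file_path, mode, encoding="utf-8")
    return open(file_path, mode, encoding="utf-8")


def _spill(batch: List[str], directory: str, compress: bool) -> str:
    """Sort one in-memory batch and write it to its own temporary file."""
    batch.sort(key=path_key)
    fd, file_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    with _open(file_path, "wt", compress) as out:
        for line in batch:
            out.write(line + "\n")
    return file_path


def _read(file_path: str, compress: bool) -> Iterator[str]:
    """Stream the lines of one sorted spill file back."""
    with _open(file_path, "rt", compress) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(lines: Iterable[str], max_batch: int,
                  compress: bool = False) -> Iterator[str]:
    """Sort fixed-size batches in memory, spill each to disk, then k-way
    merge all sorted spill files in a single pass."""
    with tempfile.TemporaryDirectory() as tmp:
        files: List[str] = []
        batch: List[str] = []
        for line in lines:
            batch.append(line)
            if len(batch) >= max_batch:
                files.append(_spill(batch, tmp, compress))
                batch = []
        if batch:
            files.append(_spill(batch, tmp, compress))
        yield from heapq.merge(*(_read(f, compress) for f in files), key=path_key)


# Hypothetical dump in traversal (document) order, sorted back into path order.
dump = ["/content/b|{}", "/|{}", "/content/a-b|{}", "/content|{}",
        "/content/a/x|{}", "/content/a|{}"]
for entry in external_sort(dump, max_batch=2, compress=True):
    print(entry)
```

The compression switch trades CPU for disk. In the reported runs it shrank the chunk files from 43.6 GB to 5.5 GB and cut sorting from 39.87 min to 29.56 min. The overall run B was still longer (3.811 h vs 3.469 h) because its dump phase took 2.929 h instead of 2.405 h.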
[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294747#comment-16294747 ] Chetan Mehrotra commented on OAK-6353: -- There are some aspects which still need to be taken care of. See OAK-7074
[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251128#comment-16251128 ] Julian Reschke commented on OAK-6353: - Fixed nullability annotation in [r1815190|http://svn.apache.org/r1815190]
[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164170#comment-16164170 ] Chetan Mehrotra commented on OAK-6353: -- Based on some tests done by [~chibulcu] for 100M nodes:
* Simple DBCursor traversal - ~50 min, 82k docs/sec
* NodeStore traversal
** From a remote setup - 1.2 days
** From a local setup (on the same machine as Mongo) - 8 h, 8k docs/sec

So to perform full reindexing on such setups we would need to make better use of Document order traversal.
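To put the durations reported in this thread side by side, the sketch below converts them to minutes. It then computes speedup ratios and the document counts implied by the quoted rates. It is plain arithmetic on the thread's own numbers, assuming "~50 mins", "8 hrs" and "1.2d" are taken at face value. The helper names are made up for illustration.

```python
HOUR = 60.0          # minutes per hour
DAY = 24 * HOUR      # minutes per day


def speedup(slow: float, fast: float) -> float:
    """Ratio of two durations given in the same unit (how many times faster)."""
    return slow / fast


def implied_docs(rate_per_sec: float, minutes: float) -> float:
    """Documents read if `rate_per_sec` were sustained for `minutes`."""
    return rate_per_sec * minutes * 60


# 100M node test (durations in minutes)
cursor = 50.0          # simple DBCursor traversal, "~50 mins"
local = 8 * HOUR       # NodeStore traversal, same machine as Mongo
remote = 1.2 * DAY     # NodeStore traversal from a remote setup

print(f"local NodeStore / DBCursor:  {speedup(local, cursor):.1f}x")    # 9.6x
print(f"remote NodeStore / DBCursor: {speedup(remote, cursor):.1f}x")   # 34.6x
print(f"82k docs/s for 50 min: {implied_docs(82_000, cursor) / 1e6:.0f}M docs")  # 246M
print(f"8k docs/s for 8 h:     {implied_docs(8_000, local) / 1e6:.0f}M docs")    # 230M

# Reindexing runs on the 255M document repository (durations in hours)
print(f"NodeStore / document traversal (A): {speedup(13.66, 3.469):.1f}x")  # 3.9x
print(f"NodeStore / document traversal (B): {speedup(13.66, 3.811):.1f}x")  # 3.6x
```

Both quoted rates imply roughly 230 to 250M documents for 100M nodes. This suggests the rates count Mongo documents rather than nodes, consistent with the 255M documents vs 66M nodes under /content reported elsewhere in this thread.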
[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156484#comment-16156484 ] Chetan Mehrotra commented on OAK-6353: -- MongoDB provides a {{$natural}} option [1] which forces the cursor to use the natural order. This can then be coupled with a regex which filters out hidden nodes altogether.

[1] https://docs.mongodb.com/manual/reference/operator/meta/natural/#metaOp._S_natural
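The `$natural` comment combines two ideas: reading documents in storage order, and discarding hidden nodes with a regex. The Python sketch below simulates only the filtering half, in memory, on a list standing in for a natural-order cursor.

It makes these assumptions:

- Document ids have the Oak-style form `<depth>:<path>`.
- Any path segment starting with `:` is hidden.
- The pattern, the function name and the sample ids are hypothetical.

Other id forms, such as the hashed ids Oak uses for very long paths, are not handled. Against MongoDB itself, the same pattern would presumably be sent as a negated `$regex` condition on `_id`, alongside a `$natural` sort or hint.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern: a hidden node name starts with ':' (e.g. ':async'),
# so any id whose path contains "/:" lies in or under a hidden node.
HIDDEN_SEGMENT = re.compile(r"/:")


def visible_ids(natural_order_ids: Iterable[str]) -> Iterator[str]:
    """Yield ids in the order they arrive, skipping hidden nodes and their
    subtrees. Ids are assumed to look like '<depth>:<path>', e.g. '2:/content/a'."""
    for doc_id in natural_order_ids:
        if HIDDEN_SEGMENT.search(doc_id) is None:
            yield doc_id


# Stand-in for a cursor returning documents in storage ($natural) order.
cursor = ["2:/content/a", "1:/:async", "0:/", "2:/oak:index/lucene",
          "3:/oak:index/lucene/:data"]
print(list(visible_ids(cursor)))
# ['2:/content/a', '0:/', '2:/oak:index/lucene']
```

Because `/:` only occurs where a path segment begins with `:`, names that merely contain a colon, such as `oak:index`, are kept. The `:` after the depth prefix never matches either.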
[jira] [Commented] (OAK-6353) Use Document order traversal for reindexing performed on DocumentNodeStore setups
[ https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154791#comment-16154791 ] Chetan Mehrotra commented on OAK-6353: -- Applied the patch with r1807438