[
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297924#comment-16297924
]
Chetan Mehrotra commented on OAK-6353:
--------------------------------------
Some performance numbers from reindexing a repository with 255M Mongo documents,
66M nodes under /content, and 4.2M assets
# Normal NodeStore traversal - 13.66 h
*Document Traversal*
A - Default setup
# Total time - 3.469 h
## Time in dumping - 2.405 h
## Time in sorting - 39.87 min
### Batch sorting - 19.13 min
### Merging - 20.71 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 43.6 GB
#* index size - 2.5 GB
{noformat}
2017-12-15 16:48:34 Proceeding to index [/oak:index/damAssetLucene2] upto
checkpoint head {}
2017-12-15 19:12:55 Dumped 65472172 nodestates in json format in 2.405 h
2017-12-15 19:12:55 Compression enabled while sorting : false
(oak.indexer.useZip)
2017-12-15 19:12:55 Delete original dump from traversal : true
(oak.indexer.deleteOriginal)
2017-12-15 19:12:55 Max heap memory (GB) to be used for merge sort : 3
(oak.indexer.maxSortMemoryInGB)
2017-12-15 19:12:57 Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-15 19:32:05 Batch sorting done in 19.13 min with 29 files of size 43.6
GB to merge
2017-12-15 19:32:05 Removing the original file temp/flat-file-store/store.json
2017-12-15 19:52:50 Merging of sorted files completed in 20.71 min
2017-12-15 19:52:50 Sorting completed in 39.87 min
2017-12-15 19:52:50 Estimated node count to be traversed for reindexing under /
is [65472172]
2017-12-15 20:16:35 Indexing report
- /oak:index/damAssetLucene2*(4407265)
2017-12-15 20:16:43 Indexing completed for indexes [/oak:index/damAssetLucene2]
in 3.469 h (12488171 ms)
{noformat}
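The sorting phase in the log above (batch sorting into 29 files, then merging them) has the general shape of an external merge sort. The following is a minimal sketch of that technique under stated assumptions, not Oak's actual implementation: the function names, the line-count batch limit (Oak limits by heap memory via oak.indexer.maxSortMemoryInGB) and the plain string ordering are all hypothetical, and the `use_zip` flag only loosely mirrors oak.indexer.useZip.

```python
import gzip
import heapq
import os
import tempfile


def _open(path, mode, use_zip):
    # Text-mode handle; gzip-compressed when use_zip is set.
    if use_zip:
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def external_sort(lines, out_path, max_lines_per_batch, use_zip=False):
    """Sort newline-free strings into out_path with bounded memory.

    Returns the number of intermediate chunk files that were merged.
    """
    work_dir = tempfile.mkdtemp(prefix="sort-")
    chunk_paths = []

    def flush(batch):
        # Phase 1 (batch sorting): sort one in-memory batch, spill it to disk.
        batch.sort()
        path = os.path.join(work_dir, "chunk-%d" % len(chunk_paths))
        with _open(path, "w", use_zip) as f:
            f.writelines(line + "\n" for line in batch)
        chunk_paths.append(path)

    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= max_lines_per_batch:
            flush(batch)
            batch = []
    if batch:
        flush(batch)

    # Phase 2 (merging): k-way merge of the sorted chunks.
    handles = [_open(p, "r", use_zip) for p in chunk_paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # Compare without the trailing newline so the merge order
            # matches the order used when sorting each batch.
            out.writelines(heapq.merge(*handles, key=lambda l: l.rstrip("\n")))
    finally:
        for h in handles:
            h.close()
        for p in chunk_paths:
            os.remove(p)
        os.rmdir(work_dir)
    return len(chunk_paths)
```

With `use_zip=True` the chunk files shrink at the cost of extra CPU for compression and decompression; whether that is a net win depends on how I/O bound the merge is on a given setup.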
B - Compression enabled in sorting
# Total time - 3.811 h
## Time in dumping - 2.929 h
## Time in sorting - 29.56 min
### Batch sorting - 17.67 min
### Merging - 11.87 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 5.5 GB
#* index size - 2.5 GB
{noformat}
2017-12-19 10:56:00 Proceeding to index [/oak:index/damAssetLucene2] upto
checkpoint head {}
2017-12-19 13:51:50 oreBuilder - Dumped 65469575 nodestates in json format in
2.929 h (43.6 GB)
2017-12-19 13:51:50 oreBuilder - Compression enabled while sorting : true
(oak.indexer.useZip)
2017-12-19 13:51:50 oreBuilder - Delete original dump from traversal : true
(oak.indexer.deleteOriginal)
2017-12-19 13:51:50 oreBuilder - Max heap memory (GB) to be used for merge sort
: 3 (oak.indexer.maxSortMemoryInGB)
2017-12-19 13:51:52 Sorter - Sorting with memory 3.2 GB (estimated 12.6 GB)
2017-12-19 14:09:32 Sorter - Batch sorting done in 17.67 min with 29 files of
size 5.5 GB to merge
2017-12-19 14:09:32 Sorter - Removing the original file
temp/flat-file-store/store.json
2017-12-19 14:21:25 Sorter - Merging of sorted files completed in 11.87 min
2017-12-19 14:21:25 Sorter - Sorting completed in 29.56 min
2017-12-19 14:21:26 Estimated node count to be traversed for reindexing under /
is [65469575]
2017-12-19 14:44:30 Indexing report
- /oak:index/damAssetLucene2*(4407265)
2017-12-19 14:44:30 Reindexing completed
2017-12-19 14:44:30 Switched the async lane for indexes at
[/oak:index/damAssetLucene2] back to there original lanes
2017-12-19 14:44:39 Indexing completed for indexes [/oak:index/damAssetLucene2]
in 3.811 h (13718589 ms)
{noformat}
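To compare the two runs against the normal traversal, the reported figures can be turned into a few ratios. This is only arithmetic on the numbers quoted in this comment; the dictionary layout and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Figures copied from the comment: normal traversal and the two
# document-traversal runs (A: default, B: compression while sorting).
NORMAL_TRAVERSAL_H = 13.66

RUNS = {
    "A": {"total_h": 3.469, "dump_h": 2.405, "sort_min": 39.87, "chunk_gb": 43.6},
    "B": {"total_h": 3.811, "dump_h": 2.929, "sort_min": 29.56, "chunk_gb": 5.5},
}


def summarize(run):
    """Derive comparison ratios for one run."""
    return {
        # How many times faster than the normal NodeStore traversal.
        "speedup_vs_traversal": NORMAL_TRAVERSAL_H / run["total_h"],
        # Fraction of the total time spent dumping node states to JSON.
        "dump_share": run["dump_h"] / run["total_h"],
    }


summary = {name: summarize(run) for name, run in RUNS.items()}

# Effect of compression on the sorting phase (B relative to A).
chunk_size_ratio = RUNS["A"]["chunk_gb"] / RUNS["B"]["chunk_gb"]
sort_minutes_saved = RUNS["A"]["sort_min"] - RUNS["B"]["sort_min"]

for name, s in summary.items():
    print("%s: %.2fx faster, dump is %.0f%% of total"
          % (name, s["speedup_vs_traversal"], 100 * s["dump_share"]))
print("chunk files %.1fx smaller, sorting %.2f min shorter"
      % (chunk_size_ratio, sort_minutes_saved))
```

Both runs come out at a bit under 4x faster than the normal traversal, and in both the dump phase accounts for most of the total time. Both logs show the dump finishing before sorting starts, so the longer dump in run B is presumably unrelated to the compression setting.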
> Use Document order traversal for reindexing performed on DocumentNodeStore
> setups
> ---------------------------------------------------------------------------------
>
> Key: OAK-6353
> URL: https://issues.apache.org/jira/browse/OAK-6353
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.7.13, 1.8
>
> Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
> that document order traversal can be faster than the current mode of
> path-based traversal. Initial tests indicate that such a traversal can be an
> order of magnitude faster.
> So this task is meant to implement such an approach and see if it can be a
> viable indexing mode for DocumentNodeStore-based setups.