[ 
https://issues.apache.org/jira/browse/OAK-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297924#comment-16297924
 ] 

Chetan Mehrotra commented on OAK-6353:
--------------------------------------

Some performance numbers from reindexing a repository with 255M Mongo documents, 
66M nodes under /content, and 4.2M assets

# Normal NodeStore traversal - 13.66 h

*Document Traversal*

A - Default setup 

# Total time - 3.469 h
## Time in dumping - 2.405 h
## Time in sorting - 39.87 min
###  Batch sorting - 19.13 min
###  Merging - 20.71 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 43.6 GB
#* index size - 2.5 GB
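
As a rough cross-check of the setup A figures, the phase times can be summed and compared with the 13.66 h NodeStore traversal. This is only illustrative arithmetic: the variable names are made up for the example, the merge time is the 20.71 min reported in the log output for this run, and the indexing time is the rounded 24 min.

```python
# Phase timings for setup A, copied from the figures in this comment.
normal_traversal_h = 13.66   # normal NodeStore traversal
total_h = 3.469              # document traversal, default setup
dump_h = 2.405
batch_sort_min = 19.13
merge_min = 20.71            # from the log line "Merging of sorted files completed in 20.71 min"
sort_min = 39.87
index_min = 24.0             # rounded figure

# How much faster the document traversal run was overall.
speedup = normal_traversal_h / total_h

# The three phases should add up to roughly the reported total.
phases_h = dump_h + sort_min / 60 + index_min / 60

# Fraction of the run spent dumping node states to JSON.
dump_share = dump_h / total_h

print(f"speedup vs NodeStore traversal: {speedup:.2f}x")
print(f"sum of phases: {phases_h:.2f} h (reported total {total_h} h)")
print(f"share of time spent dumping: {dump_share:.0%}")
```

On these numbers the run is a little under 4x faster than the normal traversal, and the dump phase accounts for roughly two thirds of the total.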

{noformat}
2017-12-15 16:48:34 Proceeding to index [/oak:index/damAssetLucene2] upto 
checkpoint head {} 
2017-12-15 19:12:55 Dumped 65472172 nodestates in json format in 2.405 h 
2017-12-15 19:12:55 Compression enabled while sorting : false 
(oak.indexer.useZip) 
2017-12-15 19:12:55 Delete original dump from traversal : true 
(oak.indexer.deleteOriginal) 
2017-12-15 19:12:55 Max heap memory (GB) to be used for merge sort : 3 
(oak.indexer.maxSortMemoryInGB) 
2017-12-15 19:12:57 Sorting with memory 3.2 GB (estimated 12.6 GB) 
2017-12-15 19:32:05 Batch sorting done in 19.13 min with 29 files of size 43.6 
GB to merge 
2017-12-15 19:32:05 Removing the original file temp/flat-file-store/store.json 
2017-12-15 19:52:50 Merging of sorted files completed in 20.71 min 
2017-12-15 19:52:50 Sorting completed in 39.87 min 
2017-12-15 19:52:50 Estimated node count to be traversed for reindexing under / 
is [65472172] 
2017-12-15 20:16:35 Indexing report
    - /oak:index/damAssetLucene2*(4407265)
2017-12-15 20:16:43 Indexing completed for indexes [/oak:index/damAssetLucene2] 
in 3.469 h (12488171 ms) 
{noformat}
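
The phase durations can also be derived from the log timestamps, which is a quick way to sanity-check the summary. The sketch below hard-codes four timestamps copied from the log above; the event names are labels chosen for this example and are not part of the Oak output.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the setup A log; the keys are illustrative labels.
events = {
    "start": "2017-12-15 16:48:34",         # "Proceeding to index ..."
    "dump_done": "2017-12-15 19:12:55",     # "Dumped 65472172 nodestates ..."
    "sort_done": "2017-12-15 19:52:50",     # "Sorting completed ..."
    "index_report": "2017-12-15 20:16:35",  # "Indexing report"
}
ts = {name: datetime.strptime(value, FMT) for name, value in events.items()}


def minutes_between(first, second):
    """Elapsed minutes between two named events."""
    return (ts[second] - ts[first]).total_seconds() / 60


dump_min = minutes_between("start", "dump_done")
sort_min = minutes_between("dump_done", "sort_done")
index_min = minutes_between("sort_done", "index_report")

print(f"dump:  {dump_min / 60:.3f} h")
print(f"sort:  {sort_min:.2f} min")
print(f"index: {index_min:.2f} min")
```

The sort figure derived this way is slightly larger than the 39.87 min the sorter reports, since the timestamps have one-second resolution and the sorter starts a moment after the dump finishes.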

B - Compression enabled in sorting

# Total time - 3.811 h
## Time in dumping - 2.929 h
## Time in sorting - 29.56 min
###  Batch sorting - 17.67 min
###  Merging - 11.87 min
## Indexing - 24 min
# Space consumed
#* dumped json - 43.6 GB
#* chunked files - 5.5 GB
#* index size - 2.5 GB
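
To make the trade-off between the two setups easier to see, the figures for A and B can be compared directly. This is illustrative arithmetic only; the dictionary keys are names invented for the example, and the merge time for A is the 20.71 min from its log.

```python
# Figures for setup A (no compression) and B (oak.indexer.useZip enabled),
# copied from this comment. Keys are illustrative names.
a = {"total_h": 3.469, "dump_h": 2.405, "sort_min": 39.87,
     "merge_min": 20.71, "chunks_gb": 43.6}
b = {"total_h": 3.811, "dump_h": 2.929, "sort_min": 29.56,
     "merge_min": 11.87, "chunks_gb": 5.5}

# Disk space needed for the sorted chunk files shrinks by this factor.
space_ratio = a["chunks_gb"] / b["chunks_gb"]

# Minutes saved in the sorting phase (batch sort plus merge).
sort_saved_min = a["sort_min"] - b["sort_min"]

# Extra minutes spent dumping and overall in run B.
dump_delta_min = (b["dump_h"] - a["dump_h"]) * 60
total_delta_min = (b["total_h"] - a["total_h"]) * 60

print(f"chunk files smaller by {space_ratio:.1f}x")
print(f"sorting faster by {sort_saved_min:.2f} min")
print(f"dump slower by {dump_delta_min:.1f} min, total slower by {total_delta_min:.1f} min")
```

With compression the chunk files are about 8x smaller and sorting is about 10 min faster, yet run B is slower overall because its dump phase took about 31 min longer. The comment does not state the cause of the slower dump; since both runs dumped 43.6 GB of uncompressed JSON, it may simply be run-to-run variation.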

{noformat}
2017-12-19 10:56:00  Proceeding to index [/oak:index/damAssetLucene2] upto 
checkpoint head {} 
2017-12-19 13:51:50 oreBuilder - Dumped 65469575 nodestates in json format in 
2.929 h (43.6 GB) 
2017-12-19 13:51:50 oreBuilder - Compression enabled while sorting : true 
(oak.indexer.useZip) 
2017-12-19 13:51:50 oreBuilder - Delete original dump from traversal : true 
(oak.indexer.deleteOriginal) 
2017-12-19 13:51:50 oreBuilder - Max heap memory (GB) to be used for merge sort 
: 3 (oak.indexer.maxSortMemoryInGB) 
2017-12-19 13:51:52 Sorter - Sorting with memory 3.2 GB (estimated 12.6 GB) 
2017-12-19 14:09:32 Sorter - Batch sorting done in 17.67 min with 29 files of 
size 5.5 GB to merge 
2017-12-19 14:09:32 Sorter - Removing the original file 
temp/flat-file-store/store.json 
2017-12-19 14:21:25 Sorter - Merging of sorted files completed in 11.87 min 
2017-12-19 14:21:25 Sorter - Sorting completed in 29.56 min 
2017-12-19 14:21:26 Estimated node count to be traversed for reindexing under / 
is [65469575] 
2017-12-19 14:44:30 Indexing report
    - /oak:index/damAssetLucene2*(4407265)
 2017-12-19 14:44:30 Reindexing completed 
2017-12-19 14:44:30 Switched the async lane for indexes at 
[/oak:index/damAssetLucene2] back to there original lanes 
2017-12-19 14:44:39 Indexing completed for indexes [/oak:index/damAssetLucene2] 
in 3.811 h (13718589 ms)
{noformat}
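
The sorting phase in both logs follows the usual external merge sort pattern: sort memory-sized batches, write each to a chunk file (optionally compressed), then merge the chunks. The following is a minimal sketch of that pattern, not Oak's implementation. The function name, its parameters, and the plain string ordering of lines are assumptions made for the example; Oak's actual sorter and its path comparator may differ.

```python
import gzip
import heapq
import os
import tempfile


def _open(path, mode, use_zip):
    """Open a chunk file in text mode, gzip-compressed if requested."""
    if use_zip:
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def external_sort(lines, out_path, batch_size, use_zip=False, work_dir=None):
    """Sort newline-terminated lines using bounded memory.

    Hypothetical helper: sorts batches of at most batch_size lines,
    writes each batch to a chunk file, then merges the chunks into
    out_path. Returns the number of chunk files that were merged.
    """
    work_dir = work_dir or tempfile.mkdtemp()
    chunk_paths = []
    batch = []

    def flush():
        # Sort the current batch in memory and write it as one chunk.
        if not batch:
            return
        batch.sort()
        suffix = ".gz" if use_zip else ""
        path = os.path.join(work_dir, f"chunk-{len(chunk_paths)}{suffix}")
        with _open(path, "w", use_zip) as f:
            f.writelines(batch)
        chunk_paths.append(path)
        batch.clear()

    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            flush()
    flush()

    # k-way merge of the sorted chunks; heapq.merge reads them lazily.
    files = [_open(p, "r", use_zip) for p in chunk_paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
    return len(chunk_paths)
```

In this sketch, compression trades CPU time for less disk I/O during the merge, which is in line with the smaller chunk files and shorter merge time seen in run B.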

> Use Document order traversal for reindexing performed on DocumentNodeStore 
> setups
> ---------------------------------------------------------------------------------
>
>                 Key: OAK-6353
>                 URL: https://issues.apache.org/jira/browse/OAK-6353
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.7.13, 1.8
>
>         Attachments: OAK-6353-v1.patch, OAK-6353-v2.patch
>
>
> [~tmueller] suggested 
> [here|https://issues.apache.org/jira/browse/OAK-6246?focusedCommentId=16034442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16034442]
>  that document order traversal can be faster than the current mode of 
> path-based traversal. Initial tests indicate that such a traversal can be an 
> order of magnitude faster. 
> So this task is meant to implement such an approach and see if it can be a 
> viable indexing mode for DocumentNodeStore-based setups.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
