Jörg Hoh created OAK-7819:
-----------------------------

             Summary: Improve logging for indexing progress
                 Key: OAK-7819
                 URL: https://issues.apache.org/jira/browse/OAK-7819
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
    Affects Versions: 1.8.2
            Reporter: Jörg Hoh

I am currently trying to understand how to improve the indexing performance of my RDB-based Oak setup. At the moment, indexing progress is logged like this:

{noformat}
10.10.2018 13:00:04.077 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing will be performed for following indexes: [/oak:index/nodetype]
10.10.2018 13:00:15.911 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #10000 <path> [666,60 nodes/s, 2399760,00 nodes/hr]
10.10.2018 13:00:21.792 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #20000 <path> [999,95 nodes/s, 3599820,00 nodes/hr]
10.10.2018 13:00:27.211 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #30000 <path> [1153,81 nodes/s, 4153707,69 nodes/hr]
10.10.2018 13:00:31.581 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #40000 <path> [1333,30 nodes/s, 4799880,00 nodes/hr]
...
10.10.2018 13:13:44.585 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #580000 <path> [704,74 nodes/s, 2537055,16 nodes/hr]
10.10.2018 13:14:04.738 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #590000 <path> [699,88 nodes/s, 2519568,68 nodes/hr]
...
{noformat}

These messages report only the overall traversal rate. It isn't clear to me how much of the time is spent on

* fetching the nodes to be indexed from the repo (in our case residing in the RDB)
* the actual indexing computation
* storing the extracted index data

More detailed logging of these individual aspects could shed more light on the bottlenecks of this process.
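
To illustrate the kind of breakdown I have in mind, here is a minimal sketch of a per-phase timer. It is not taken from Oak: the class {{PhaseTimer}}, the {{Phase}} values and the format of the summary line are all hypothetical, and where the three phases would actually be measured inside the indexing code is left open.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Oak): accumulates wall-clock time per indexing phase. */
public class PhaseTimer {

    /** Assumed phases, matching the three questions raised in this issue. */
    public enum Phase { FETCH, INDEX, STORE }

    private final Map<Phase, Long> nanos = new EnumMap<>(Phase.class);

    /** Adds an already measured duration (in nanoseconds) to a phase. */
    public void add(Phase phase, long elapsedNanos) {
        nanos.merge(phase, elapsedNanos, Long::sum);
    }

    /** Runs the work, charges its elapsed time to the phase and returns its result. */
    public <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            add(phase, System.nanoTime() - start);
        }
    }

    /** Total time recorded for a phase, in milliseconds. */
    public long millis(Phase phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000L;
    }

    /** Formats one progress line with per-phase totals and percentage shares. */
    public String summary(long traversed) {
        long total = 0;
        for (Phase p : Phase.values()) {
            total += nanos.getOrDefault(p, 0L);
        }
        StringBuilder sb = new StringBuilder("Traversed #" + traversed + " [");
        for (Phase p : Phase.values()) {
            long n = nanos.getOrDefault(p, 0L);
            double share = total == 0 ? 0.0 : 100.0 * n / total;
            if (p.ordinal() > 0) {
                sb.append(", ");
            }
            // Locale.ROOT keeps the decimal separator independent of the server locale.
            sb.append(String.format(Locale.ROOT, "%s %d ms (%.1f%%)",
                    p.name().toLowerCase(Locale.ROOT), n / 1_000_000L, share));
        }
        return sb.append(']').toString();
    }
}
```

With such a helper, each node read could be wrapped as {{timer.time(Phase.FETCH, ...)}}, and every 10000 nodes the existing progress message could be extended with {{timer.summary(count)}}. That would show directly whether the RDB reads, the index computation or the write of the index data dominates.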