[jira] [Created] (OAK-7819) Improve logging for indexing progress

JIRA Thu, 11 Oct 2018 01:28:48 -0700

Jörg Hoh created OAK-7819:
-----------------------------

             Summary: Improve logging for indexing progress
                 Key: OAK-7819
                 URL: https://issues.apache.org/jira/browse/OAK-7819
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
    Affects Versions: 1.8.2
            Reporter: Jörg Hoh



At the moment I am trying to understand how I can improve the indexing 
performance of my RDB-based Oak setup.

Currently the indexing progress is logged like this:
{noformat}
10.10.2018 13:00:04.077 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing will be performed for following indexes: [/oak:index/nodetype]
10.10.2018 13:00:15.911 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #10000 <path> [666,60 nodes/s, 2399760,00 nodes/hr]
10.10.2018 13:00:21.792 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #20000 <path> [999,95 nodes/s, 3599820,00 nodes/hr]
10.10.2018 13:00:27.211 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #30000 <path> [1153,81 nodes/s, 4153707,69 nodes/hr]
10.10.2018 13:00:31.581 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #40000 <path> [1333,30 nodes/s, 4799880,00 nodes/hr]
...
10.10.2018 13:13:44.585 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #580000 <path> [704,74 nodes/s, 2537055,16 nodes/hr]
10.10.2018 13:14:04.738 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #590000 <path> [699,88 nodes/s, 2519568,68 nodes/hr]
...
{noformat}

However, it is not clear to me how much of that time is spent on:
* fetching the nodes to be indexed from the repository (in our case residing
in the RDB)
* the actual indexing computation
* storing the extracted index data

More detailed logging of these individual phases could shed more light on the
bottlenecks of this process.
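
To illustrate the kind of breakdown meant here, the following is a minimal sketch of a per-phase timer. Everything in it is hypothetical: the class name PhaseTimer, the three phase names and the summary format are assumptions made for illustration, and none of it is existing Oak code or a claim about how IndexUpdate is structured internally.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Oak): accumulates wall-clock time per indexing phase. */
class PhaseTimer {
    /** The three phases named in this issue; the names are assumptions. */
    enum Phase { FETCH, INDEX, STORE }

    private final Map<Phase, Long> nanos = new EnumMap<>(Phase.class);

    /** Runs the work, charges its elapsed time to the given phase and returns the result. */
    <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            add(phase, System.nanoTime() - start);
        }
    }

    /** Adds an already measured duration to a phase. */
    void add(Phase phase, long elapsedNanos) {
        nanos.merge(phase, elapsedNanos, Long::sum);
    }

    /** Accumulated time for one phase, 0 if nothing was recorded. */
    long nanos(Phase phase) {
        return nanos.getOrDefault(phase, 0L);
    }

    /** Sum of the accumulated time over all phases. */
    long totalNanos() {
        long total = 0;
        for (long n : nanos.values()) {
            total += n;
        }
        return total;
    }

    /** Share of the total time per phase, formatted as "PHASE=xx.x%" separated by spaces. */
    String summary() {
        long total = totalNanos();
        StringBuilder sb = new StringBuilder();
        for (Phase p : Phase.values()) {
            double pct = total == 0 ? 0.0 : 100.0 * nanos(p) / total;
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%s=%.1f%%", p, pct));
        }
        return sb.toString();
    }
}
```

If the three phases can be separated cleanly in the traversal loop (which is itself an assumption, since node loading may be lazy and interleaved with the indexing work), the summary could be appended to the existing "Reindexing Traversed #N" log line at the same interval.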




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
