[ 
https://issues.apache.org/jira/browse/OAK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931613#comment-13931613
 ] 

Alex Parvulescu edited comment on OAK-1465 at 3/12/14 10:26 AM:
----------------------------------------------------------------

This is what the property index updates look like [0]: each save operation 
triggers 2 index updates (_before_ and _after_ are the index keys):
 - one for the node type (oak:Unstructured)
 - one for the property (here the property has the same value as the node name)

My profiling session shows a lot of cache misses in 
DocumentNodeStore#getNode, and given the high frequency of small commits I 
don't see any code tweaks that I could make to speed up this test.

It would be interesting to output the cache stats after the tests. I wanted 
to at least see them, but I found it ridiculously hard to get a reference to 
that stats object.
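
As an illustration of the kind of output meant here, below is a minimal sketch of a cache that counts hits and misses and prints a one-line summary. CountingCache and all of its methods are hypothetical names for this example only; this is not the cache or the stats object used by DocumentNodeStore.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a small LRU cache that records hits and misses.
// It is NOT Oak's cache implementation.
final class CountingCache<K, V> {
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    CountingCache(final int maxEntries) {
        // accessOrder=true makes the LinkedHashMap evict the least recently used entry
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, or loads and caches it on a miss.
    // A loader that returns null is counted as a miss on every call.
    V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        entries.put(key, value);
        return value;
    }

    long hits() { return hits; }

    long misses() { return misses; }

    double missRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) misses / total;
    }

    // One-line summary suitable for printing at the end of a test run.
    String summary() {
        return String.format(Locale.ROOT, "hits=%d misses=%d missRate=%.2f",
                hits, misses, missRate());
    }
}
```

Printing summary() once after the benchmark would be enough to see whether the miss rate grows together with the index.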

I'm un-assigning myself from this issue, but I'm still open to any ideas for 
improvement, so feel free to point to anything I might have missed in the 
indexing code.

A small thing I've noticed is that NodeBuilder#getChildNodeNames, in the 
case of the DocumentNodeState, uses the default AbstractNodeState impl, which 
simply calls #getChildNodeEntries and then extracts the names. I did not 
see heavy usage of this method (I ran into it in the 
IndexUpdate#collectIndexEditors method), so I don't think it is very important 
to provide a more efficient implementation.
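
To make the difference concrete, here is a minimal sketch of the two approaches. All types below (ChildEntry, SketchNodeState, DefaultNamesState, DirectNamesState) are simplified, hypothetical stand-ins and not the actual Oak classes; the entriesLoaded counter only exists to show how many full child entries get materialized.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a child node entry (name plus node state).
final class ChildEntry {
    final String name;
    final Object state; // placeholder for the child node state

    ChildEntry(String name, Object state) {
        this.name = name;
        this.state = state;
    }
}

// Hypothetical stand-in for an abstract node state with a default names impl.
abstract class SketchNodeState {
    int entriesLoaded = 0; // how many full child entries were built

    abstract Iterable<ChildEntry> getChildNodeEntries();

    // Default behaviour as described above: build all entries, keep only the names.
    Iterable<String> getChildNodeNames() {
        List<String> names = new ArrayList<>();
        for (ChildEntry entry : getChildNodeEntries()) {
            names.add(entry.name);
        }
        return names;
    }
}

// Relies on the default getChildNodeNames, so every name costs a full entry.
class DefaultNamesState extends SketchNodeState {
    final Map<String, Object> children = new LinkedHashMap<>();

    @Override
    Iterable<ChildEntry> getChildNodeEntries() {
        List<ChildEntry> entries = new ArrayList<>();
        for (Map.Entry<String, Object> child : children.entrySet()) {
            entriesLoaded++;
            entries.add(new ChildEntry(child.getKey(), child.getValue()));
        }
        return entries;
    }
}

// Overrides getChildNodeNames to read the names without building entries.
class DirectNamesState extends DefaultNamesState {
    @Override
    Iterable<String> getChildNodeNames() {
        return new ArrayList<>(children.keySet());
    }
}
```

Both variants return the same names; only the second avoids creating an entry per child, which would matter only if the method were called frequently.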


[0]
{code}
update on /test19b6d919/testNode/level1_49/217f0ea5-190c-4c56-8b7d-c4b180c670a1
    before []
    after  [217f0ea5-190c-4c56-8b7d-c4b180c670a1]
update on /test19b6d919/testNode/level1_49/217f0ea5-190c-4c56-8b7d-c4b180c670a1
    before []
    after  [oak%3AUnstructured]
{code}
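
The node type key in [0] shows up as oak%3AUnstructured rather than oak:Unstructured, which looks like plain percent-encoding of the value. The sketch below reproduces that form with java.net.URLEncoder; that the index encodes its keys exactly this way is an assumption based only on the output above, and IndexKeySketch#encodeKey is a hypothetical helper, not an Oak method.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: percent-encodes a value the way the keys in [0] appear.
// Assumption: the index keys are URL-encoded property values.
final class IndexKeySketch {
    static String encodeKey(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this helper, "oak:Unstructured" becomes "oak%3AUnstructured", while a UUID such as the node name above is left unchanged because it only contains letters, digits and hyphens.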



> performance degradation with growing index size on Oak-Mongo
> ------------------------------------------------------------
>
>                 Key: OAK-1465
>                 URL: https://issues.apache.org/jira/browse/OAK-1465
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 0.17.1
>            Reporter: Stefan Egli
>            Assignee: Alex Parvulescu
>            Priority: Blocker
>             Fix For: 0.19
>
>         Attachments: CreateManyIndexedNodesTest.java
>
>
> Tested with an oak-snapshot of Monday Feb 24, 10AM EST.
> Noticed that when the number of nodes indexed (e.g. with respect to a 
> particular property) grows, adding nodes becomes slower and slower.
> Will attach an oak-run benchmark to underline this. Basically the scenario 
> where this occurred was:
>  * have a number of "level 1" nodes (e.g. 100)
>  * under those "level 1" nodes, add a growing list of children, each with a 
> property that is indexed (i.e. that index is actually growing and is probably 
> causing the slowdown).



