Joerg Hoh created OAK-9678:
------------------------------

             Summary: CacheLIRS should not use synchronized
                 Key: OAK-9678
                 URL: https://issues.apache.org/jira/browse/OAK-9678
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.40.0
            Reporter: Joerg Hoh


I am analyzing a situation in which hundreds of threads show a stack trace 
like this:

{noformat}
74.118.98.131 [1643218398059] GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json HTTP/1.1" #58 prio=5 os_prio=0 cpu=26006.94ms elapsed=1206.98s tid=0x0000560a69765000 nid=0x138d waiting for monitor entry  [0x00007f8f1b15b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.access(CacheLIRS.java:910)
        - waiting to lock <0x00000006a1c5c1a0> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:893)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:958)
        at org.apache.jackrabbit.oak.cache.CacheLIRS.get(CacheLIRS.java:299)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.getNode(DocumentNodeStore.java:1271)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1449)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1445)
{noformat}

Checking the code at [1], the most basic Java synchronization mechanism (the 
{{synchronized}} keyword) is used. According to this DZone article [2], this 
can be problematic: whenever a thread leaves such a synchronized block, all 
threads waiting for the lock are woken up, but only one of them can enter the 
section; the others are sent back to sleep. The article recommends using a 
ReentrantReadWriteLock instead, which is much smarter and wakes up just one 
thread.
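
To illustrate the kind of change suggested here, below is a minimal sketch of 
a segment guarded by a ReentrantReadWriteLock instead of {{synchronized}}. 
The class SketchSegment and its methods are hypothetical and are not the 
CacheLIRS.Segment code. Note also that the stack trace shows the lock being 
taken in access() on the read path, which presumably updates recency state, 
so a plain read lock as used in get() below may not be a drop-in replacement 
there.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cache segment; NOT the actual CacheLIRS.Segment.
class SketchSegment<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so concurrent lookups do not block each other.
    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; readers and other writers wait.
    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether this helps in practice would depend on how much of the read path has 
to mutate shared state; that would need to be measured.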

In my situation I saw very high CPU usage, which I am not able to explain: 
the thread dumps showed that hardly any other thread was doing work there, 
while the vast majority were blocked as shown above.
While I think that such an improvement might not have fully avoided the 
problem I face, I think the optimization is still useful. This is a heavily 
used code path, and if there is a way to reduce the overhead of locking 
itself, it would be highly useful.



[1] 
https://github.com/apache/jackrabbit-oak/blob/08eab301c869c227d8721da0e9b9bd3d2029d458/oak-core-spi/src/main/java/org/apache/jackrabbit/oak/cache/CacheLIRS.java#L909

[2] https://dzone.com/articles/synchronized-considered



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
