Joerg Hoh created OAK-9678:
------------------------------
Summary: CacheLIRS should not use synchronized
Key: OAK-9678
URL: https://issues.apache.org/jira/browse/OAK-9678
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Affects Versions: 1.40.0
Reporter: Joerg Hoh
I am analyzing a situation in which hundreds of threads have a stack trace
like this:
{noformat}
"74.118.98.131 [1643218398059] GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json HTTP/1.1" #58 prio=5 os_prio=0 cpu=26006.94ms elapsed=1206.98s tid=0x0000560a69765000 nid=0x138d waiting for monitor entry [0x00007f8f1b15b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.access(CacheLIRS.java:910)
        - waiting to lock <0x00000006a1c5c1a0> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:893)
        at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:958)
        at org.apache.jackrabbit.oak.cache.CacheLIRS.get(CacheLIRS.java:299)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.getNode(DocumentNodeStore.java:1271)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1449)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1445)
{noformat}
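A trace like the one above ("waiting for monitor entry", {{BLOCKED (on object monitor)}}) is the signature of threads queuing on a single monitor. As a minimal sketch, assuming only the standard {{java.lang.management}} API, the hypothetical class {{BlockedThreadReport}} below (not part of Oak) counts BLOCKED threads per awaited lock from inside the JVM. It could help confirm whether most threads are queued on one {{CacheLIRS$Segment}} monitor without reading dumps by hand. The {{demo}} method is illustrative only: it parks a few threads on a private monitor to show what the report sees.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of Oak.
final class BlockedThreadReport {

    /** Counts threads currently in the BLOCKED state, grouped by the monitor they wait for. */
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: we only need each thread's state and awaited lock, not the locks it holds
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                // the lock name has the form "<class name>@<identity hash in hex>"
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Illustrative demo: parks n threads on one monitor and returns how many the report sees. */
    static int demo(int n) {
        Object monitor = new Object();
        String lockName = monitor.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(monitor));
        Thread[] waiters = new Thread[n];
        int seen;
        synchronized (monitor) {
            for (int i = 0; i < n; i++) {
                waiters[i] = new Thread(() -> {
                    synchronized (monitor) { /* enter and leave immediately */ }
                });
                waiters[i].start();
            }
            // wait until every waiter is blocked on the monitor this thread holds
            for (Thread t : waiters) {
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.yield();
                }
            }
            seen = blockedByLock().getOrDefault(lockName, 0);
        }
        for (Thread t : waiters) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(4) + " thread(s) were blocked on the demo monitor");
        blockedByLock().forEach((lock, count) -> System.out.println(count + " blocked on " + lock));
    }
}
```

In a production system the same {{blockedByLock()}} call could be sampled periodically; a single entry with a large count points at one hot monitor.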
Checking the code at [1], I see that it uses the most basic Java
synchronization mechanism, the {{synchronized}} keyword. According to this
DZone article [2], this can be problematic: every time a thread leaves such a
synchronized block, all threads waiting for the lock are woken up, but only one
of them can enter the section; the others are sent back to sleep. The article
recommends using a {{ReentrantReadWriteLock}} instead, which it describes as
smarter because it wakes up only one thread.
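A minimal sketch of what the suggested replacement could look like, using the hypothetical class {{RwLockSegment}}. It is not Oak's actual {{CacheLIRS$Segment}}; the {{ArrayDeque}} is only a stand-in for the LIRS recency stack. One caveat applies: in LIRS a cache hit moves the entry within the recency stack, so a hit path like {{access}} mutates shared state and would still need the exclusive write lock. A read lock only helps for lookups that truly modify nothing, such as {{peek}} here.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical cache segment guarded by a ReentrantReadWriteLock instead of synchronized.
 * The ArrayDeque is only a stand-in for the LIRS recency stack; this is not Oak's CacheLIRS.
 */
final class RwLockSegment<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> map = new HashMap<>();
    // most recently used key first
    private final ArrayDeque<K> recency = new ArrayDeque<>();

    /** Read-only lookup: any number of threads may hold the read lock at once. */
    V peek(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** A hit that updates recency mutates shared state, so it needs the exclusive write lock. */
    V get(K key) {
        lock.writeLock().lock();
        try {
            V value = map.get(key);
            if (value != null) {
                recency.remove(key); // O(n), acceptable for a sketch
                recency.addFirst(key);
            }
            return value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    void put(K key, V value) {
        Objects.requireNonNull(key);
        Objects.requireNonNull(value);
        lock.writeLock().lock();
        try {
            // values are non-null, so a non-null previous value means the key already existed
            if (map.put(key, value) != null) {
                recency.remove(key);
            }
            recency.addFirst(key);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Most recently used key, or null if the segment is empty. */
    K mostRecent() {
        lock.readLock().lock();
        try {
            return recency.peekFirst();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because {{get}} needs the write lock just like {{put}}, a read-write lock gives no more concurrency than an exclusive lock on that path; it only pays off if a meaningful share of calls can use read-only paths like {{peek}}.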
During this situation I saw very high CPU usage, which I cannot explain: the
thread dumps showed that hardly any thread was doing actual work, while the
vast majority were blocked as shown above.
Although such an improvement might not have fully avoided the problem I am
facing, I still think the optimization is useful. This is a heavily used code
path, and any way to reduce the overhead of the locking itself would be highly
valuable.
[1] https://github.com/apache/jackrabbit-oak/blob/08eab301c869c227d8721da0e9b9bd3d2029d458/oak-core-spi/src/main/java/org/apache/jackrabbit/oak/cache/CacheLIRS.java#L909
[2] https://dzone.com/articles/synchronized-considered
--
This message was sent by Atlassian Jira
(v8.20.1#820001)