On 20/01/16 10:38, Laurent Rucquoy wrote:
Hi,

1 - About the direct mode:
Yes, the TDB is running in direct mode, but I have no explanation of why it has been explicitly set in our application source code.

1.1 - What will change in our application if I remove the
TDB.getContext().set(SystemTDB.symFileMode, FileMode.direct); line?
Firstly - I use TDB in Linux, not Windows, so I'm looking to hear of people's experiences. This is based on what I have heard ...
On Windows, the difference in performance between direct and mapped modes seems to be much smaller (near zero?) than on Linux.
And, in mapped mode on Windows, you can't delete a database while the JVM using the DB is alive. This is a very long-standing Java issue (see the Java bug tracker - it is in there several times in different reports).
The TDB test cases suffer from this - they use a lot of temporary space because a new DB directory is made for each test rather than deleting and reusing one directory.
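To make 1.1 concrete, here is a sketch using the same call as your snippet (FileMode.mapped is the value I would expect to exist in that TDB version - do check before relying on it):

    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.base.file.FileMode;
    import com.hp.hpl.jena.tdb.sys.SystemTDB;

    // What you have now: block I/O forced to direct (in-heap) mode on every platform.
    TDB.getContext().set(SystemTDB.symFileMode, FileMode.direct);

    // If you remove that line, a 64-bit JVM defaults to memory-mapped files,
    // with the OS page cache doing the caching. You could also be explicit
    // (assumed enum value, see note above):
    TDB.getContext().set(SystemTDB.symFileMode, FileMode.mapped);

Whichever you choose, set it before the dataset is opened - the mode is read when the database files are opened, so changing it afterwards has no effect on an already-open dataset.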
1.2 - Is there a default mode which will suit typical cases?
The only real way to know the right setting for you is to try it. Your data, the usage patterns and the size may all be factors.
That said, I don't remember any reports suggesting that, other than the "delete database" issue, it makes much difference until data sizes go up.
In the version you are running (which is quite old), it is hard to tune the cache sizes. In mapped mode there are no index caches to manage - the caching flexes automatically (the OS does it, not some built-in smarts in TDB).
1.3 - Is it possible that this 'forced' direct mode could be the cause of our high-CPU-usage issue?
One possibility is that the GC is under pressure; if you are close to max heap, it may be working hard to keep memory available. There is no specific indication of this one way or the other in your report; it is just a possibility. Solution - increase the heap by 25%-50% and see what happens.
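If you want evidence rather than a guess, run with GC logging switched on and a larger heap and compare. A sketch of the launch options (standard HotSpot flags for Java 7; the heap sizes and jar name are placeholders, not a recommendation for your workload):

    java -Xms4g -Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar your-server.jar

If gc.log shows frequent full collections, or the heap staying close to the -Xmx value, GC pressure becomes a plausible explanation; if not, you can rule it out and look elsewhere.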
2 - About the environment:
OS: Windows Server 2008 R2 64bit (Virtual Machine)
Java: 1.7.0_67-b01 (64bit)
VMs can be affected by whatever else the real hardware is hosting. Suppose the hardware is busy - your VM only gets its allocated percentage of the CPU time, whereas when the hardware is not busy your VM may be getting a lot more than the "contract" amount. Result - requests take a bit longer, which has the knock-on effect of more requests being active at any one time, so more CPU is needed by your VM. But again, only a possibility.
3 - About the data and the query
The changes on the data occur through Jenabean save calls (the underlying object model has not changed).
The query at the point logged in the dump messages is:

PREFIX base: <http://www.telemis.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>

SELECT ?x {
  ?image base:sopInstanceUID "x.x.xx.x.xxxx.x.x.x.xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"^^xmls:string .
  ?image a base:Image .
  ?seq ?p ?image .
  ?x base:images ?seq .
  ?x a base:ImageAnnotation ;
     base:deleted false .
}
(I don't know Jenabean.) There is nothing strange-looking about that query. If you added a lot more data in one go, rather than through steady incremental growth, it might have changed the performance profile - for example, the working set may no longer fit the block caches.
Increase RAM and increase the block caches. System properties:

  BlockReadCacheSize   : default 10000,  so try 250000
  BlockWriteCacheSize  : default 2000,   so try 5000
  NodeId2NodeCacheSize : default 500000, so try 1000000 (1 million)

These are all in-heap, so increase the heap size as well. (Changes are logged at level INFO so you can check they have an effect - I am not on my dev machine at the moment so I can't easily check the details here, I'm afraid.)
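Something like this, as an untested sketch (the names above are described as system properties, so pass them with -D and then check the INFO-level startup log to confirm the new values are actually reported in your TDB version; the heap size and jar name are placeholders):

    java -Xmx6g -DBlockReadCacheSize=250000 -DBlockWriteCacheSize=5000 -DNodeId2NodeCacheSize=1000000 -jar your-server.jar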
4 - About the BLOCKED state:
Indeed it means that the thread was blocked (not using CPU) at the time of the dump. But looking at the thread list and the corresponding CPU usage, these threads were each using about 5% of the CPU, so there is only a 1 in 20 chance that a thread dump will catch one of them running. Anyway, my colleague managed to get a thread dump while some of the incriminated threads were running. This was possible because he repeated the process a few times until he got a thread that, at the time, was using about 12% of the CPU (so a higher chance of catching it while running). Here are two stacktraces (taken from the thread dump) of two threads that were using a lot of CPU and that were caught running:
There is still an occurrence of stacked journals:

>     at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
>     - locked <0x0000000729d47240> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
>     at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
>     at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
>     at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
This is unexpected, but not evidence of anything definitely wrong.
Andy
4.1 - First stacktrace:

"RMI TCP Connection(3879)-172.28.8.58" daemon prio=6 tid=0x0000000072fa2800 nid=0x1884 runnable [0x00000000bf02d000]
  java.lang.Thread.State: RUNNABLE
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.find(RecordBuffer.java:181)
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.find(RecordBuffer.java:133)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.findSlot(BPTreeNode.java:1145)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.findHere(BPTreeNode.java:416)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.recordsPageId(BPTreeNode.java:270)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:378)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:366)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findWorker(TupleIndexRecord.java:164)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findOrScan(TupleIndexRecord.java:84)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performFind(TupleIndexRecord.java:78)
    at com.hp.hpl.jena.tdb.index.TupleIndexBase.find(TupleIndexBase.java:91)
    at com.hp.hpl.jena.tdb.index.TupleTable.find(TupleTable.java:197)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.find(NodeTupleTableConcrete.java:169)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:91)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:37)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:49)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

4.2 - Second stacktrace:

"RMI TCP Connection(3961)-172.28.4.43" daemon prio=6 tid=0x00000000729db800 nid=0x1ea4 runnable [0x000000007ce6d000]
  java.lang.Thread.State: RUNNABLE
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
    - locked <0x0000000729d47240> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
    at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.release(PageBlockMgr.java:92)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.close(RecordRangeIterator.java:151)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:134)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at org.apache.jena.atlas.iterator.Iter$3.hasNext(Iter.java:179)
    at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:906)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thank you for your help.

Regards,
Laurent.


On 19 January 2016 at 20:14, Andy Seaborne <[email protected]> wrote:

Hi there,

It's not clear to me what's going on here - looking at what's changed in the data and at the query that is being issued would be a good starting place.

TDB seems to be running in direct mode, which is usually only used for 32-bit JVMs (unless you changed the global environment specially). Is that right?

What is the environment? OS? Java version? Is the environment a VM?

The stack of BlockMgrJournal.release is odd - I don't think BlockMgrJournals get stacked like that, but this version of TDB was a while ago, or this might be an artifact of the sampling process.

One possible oddity: it says

  java.lang.Thread.State: BLOCKED (on object monitor)

so could this thread be waiting and not consuming CPU?

Andy

On 19/01/16 17:06, Laurent Rucquoy wrote:

Yes, it is probable that the queried data has been modified, because it's a production server and the TDB is always in use.
The way the data are updated in the TDB is through the Jenabean library (1.0.6), which is used to persist the corresponding Java object model.

On 19 January 2016 at 17:58, A. Soroka <[email protected]> wrote:

I don't have the knowledge to unwrap that trace (the real experts here can do that) but I'd like to ask: if you haven't changed any part of the executing code, did you change the data over which you're running the queries at the time the problem appeared, and if so, in what way?

---
A. Soroka
The University of Virginia Library

On Jan 19, 2016, at 11:54 AM, Laurent Rucquoy <[email protected]> wrote:

Hello,

We have a production server encountering significant slowness resulting from high CPU usage since about two weeks now.
When we check the high-CPU-usage thread dumps, we note that calls to Jena are always implicated. The calling code in our application executes a SPARQL query and iterates on the query solutions.
The Jena version used is 2.10.1 (with jena-tdb 0.10.1).
There was recently no change made in our application source code that could explain this issue.

Have you any idea about possible causes?

Thank you in advance for your support.

Sincerely,
Laurent.

Here is a thread dump as an example:

"RMI TCP Connection(17383)-10.249.203.163" daemon prio=6 tid=0x0000000015607800 nid=0x1dd0 waiting for monitor entry [0x000000001518d000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
    - waiting to lock <0x000000072ab58058> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
    at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.release(PageBlockMgr.java:92)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.close(RecordRangeIterator.java:151)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:134)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at org.apache.jena.atlas.iterator.Iter$3.hasNext(Iter.java:179)
    at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:906)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

  Locked ownable synchronizers:
    - <0x0000000737fcdfe8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
