On 20/01/16 10:38, Laurent Rucquoy wrote:
Hi,

1 - About the direct mode:
Yes, the TDB is running in direct mode, but I have no explanation of why it has been explicitly set in our application source code.

1.1 - What will change in our application if I remove the
TDB.getContext().set(SystemTDB.symFileMode, FileMode.direct); line?
Firstly - I use TDB in Linux, not Windows, so I'm looking to hear of people's experiences. This is based on what I have heard ...
On Windows, the difference in performance between direct and mapped modes seems to be much smaller (near zero?) than on Linux.
And, in mapped mode on Windows, you can't delete a database while the JVM using the DB is alive. This is a very long-standing Java issue (see the Java bug tracker - it is in there several times in different reports).
The TDB test cases suffer from this - they use a lot of temporary space because a new DB directory is made for each test rather than deleting and reusing one directory.
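To make 1.1 concrete, here is a sketch using the same call as your snippet (FileMode.mapped is the value I would expect to exist in that TDB version - do check before relying on it):

    import com.hp.hpl.jena.tdb.TDB;
    import com.hp.hpl.jena.tdb.base.file.FileMode;
    import com.hp.hpl.jena.tdb.sys.SystemTDB;

    // What you have now: block I/O forced to direct (in-heap) mode on every platform.
    TDB.getContext().set(SystemTDB.symFileMode, FileMode.direct);

    // If you remove that line, a 64-bit JVM defaults to memory-mapped files,
    // with the OS page cache doing the caching. You could also be explicit
    // (assumed enum value, see note above):
    TDB.getContext().set(SystemTDB.symFileMode, FileMode.mapped);

Whichever you choose, set it before the dataset is opened - the mode is read when the database files are opened, so changing it afterwards has no effect on an already-open dataset.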
1.2 - Is there a default mode which will suit typical cases?
The only real way to know the right setting for you is to try it. Your data, the usage patterns and the size may all be factors.
That said, I don't remember any reports suggesting that, other than the "delete database" issue, it makes much difference until data sizes go up.
In the version you are running (which is quite old), it is hard to tune the cache sizes. In mapped mode there are no index caches to manage - the caching flexes automatically (the OS does it, not some built-in smarts in TDB).
1.3 - Is it possible that this 'forced' direct mode could be the cause of our high-CPU-usage issue?
One possibility is that the GC is under pressure; if you are close to max heap, it may be working hard to keep memory available. There is no specific indication of this one way or the other in your report; it is just a possibility. Solution - increase the heap by 25%-50% and see what happens.
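If you want evidence rather than a guess, run with GC logging switched on and a larger heap and compare. A sketch of the launch options (standard HotSpot flags for Java 7; the heap sizes and jar name are placeholders, not a recommendation for your workload):

    java -Xms4g -Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar your-server.jar

If gc.log shows frequent full collections, or the heap staying close to the -Xmx value, GC pressure becomes a plausible explanation; if not, you can rule it out and look elsewhere.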
2 - About the environment:
OS: Windows Server 2008 R2 64bit (Virtual Machine)
Java: 1.7.0_67-b01 (64bit)
VMs can be affected by whatever else the real hardware is hosting. Suppose the hardware is busy - your VM only gets its allocated percentage of the CPU time, whereas when the hardware is not busy your VM may be getting a lot more than the "contract" amount. Result - requests take a bit longer, which has the knock-on effect of more requests being active at any one time, so more CPU is needed by your VM. But again, only a possibility.
3 - About the data and the query
The changes on the data occur through Jenabean save calls (the underlying object model has not changed).
The query at the point logged in the dump messages is:

PREFIX base: <http://www.telemis.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xmls: <http://www.w3.org/2001/XMLSchema#>

SELECT ?x {
  ?image base:sopInstanceUID "x.x.xx.x.xxxx.x.x.x.xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"^^xmls:string .
  ?image a base:Image .
  ?seq ?p ?image .
  ?x base:images ?seq .
  ?x a base:ImageAnnotation ;
     base:deleted false .
}
(I don't know Jenabean.) There is nothing strange-looking about that query. If you added a lot more data in one go, rather than through steady incremental growth, it might have changed the performance profile - for example, the working set may no longer fit the block caches.
Increase RAM and increase the block caches. System properties:

  BlockReadCacheSize   : default 10000,  so try 250000
  BlockWriteCacheSize  : default 2000,   so try 5000
  NodeId2NodeCacheSize : default 500000, so try 1000000 (1 million)

These are all in-heap, so increase the heap size as well. (Changes are logged at level INFO so you can check they have an effect - I am not on my dev machine at the moment so I can't easily check the details here, I'm afraid.)
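Something like this, as an untested sketch (the names above are described as system properties, so pass them with -D and then check the INFO-level startup log to confirm the new values are actually reported in your TDB version; the heap size and jar name are placeholders):

    java -Xmx6g -DBlockReadCacheSize=250000 -DBlockWriteCacheSize=5000 -DNodeId2NodeCacheSize=1000000 -jar your-server.jar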
4 - About the BLOCKED state:
Indeed it means that the thread was blocked (not using CPU) at the time of the dump. But looking at the thread list and the corresponding CPU usage, these threads were each using about 5% of the CPU, so there is only a 1 in 20 chance that a thread dump will catch one of them running. Anyway, my colleague managed to get a thread dump while some of the incriminated threads were running. This was possible because he repeated the process a few times until he got a thread that, at the time, was using about 12% of the CPU (so a higher chance of catching it while running). Here are two stacktraces (taken from the thread dump) of two threads that were using a lot of CPU and that were caught running:
There is still an occurrence of stacked journals:

>     at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
>     - locked <0x0000000729d47240> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
>     at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
>     at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
>     at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
This is unexpected, but not evidence of anything definitely wrong.
Andy
4.1 - First stacktrace:

"RMI TCP Connection(3879)-172.28.8.58" daemon prio=6 tid=0x0000000072fa2800 nid=0x1884 runnable [0x00000000bf02d000]
  java.lang.Thread.State: RUNNABLE
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.find(RecordBuffer.java:181)
    at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.find(RecordBuffer.java:133)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.findSlot(BPTreeNode.java:1145)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.findHere(BPTreeNode.java:416)
    at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.recordsPageId(BPTreeNode.java:270)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:378)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:366)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findWorker(TupleIndexRecord.java:164)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findOrScan(TupleIndexRecord.java:84)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performFind(TupleIndexRecord.java:78)
    at com.hp.hpl.jena.tdb.index.TupleIndexBase.find(TupleIndexBase.java:91)
    at com.hp.hpl.jena.tdb.index.TupleTable.find(TupleTable.java:197)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.find(NodeTupleTableConcrete.java:169)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:91)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:37)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:49)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

4.2 - Second stacktrace:

"RMI TCP Connection(3961)-172.28.4.43" daemon prio=6 tid=0x00000000729db800 nid=0x1ea4 runnable [0x000000007ce6d000]
  java.lang.Thread.State: RUNNABLE
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
    - locked <0x0000000729d47240> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
    at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.release(PageBlockMgr.java:92)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.close(RecordRangeIterator.java:151)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:134)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at org.apache.jena.atlas.iterator.Iter$3.hasNext(Iter.java:179)
    at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:906)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thank you for your help.

Regards,
Laurent.


On 19 January 2016 at 20:14, Andy Seaborne <[email protected]> wrote:

Hi there,

It's not clear to me what's going on here - looking at what's changed in the data and at the query that is being issued would be a good starting place.

TDB seems to be running in direct mode, which is usually only used for 32-bit JVMs (unless you changed the global environment specially). Is that right?

What is the environment? OS? Java version? Is the environment a VM?

The stack of BlockMgrJournal.release is odd - I don't think BlockMgrJournals get stacked like that, but this version of TDB was a while ago, or this might be an artifact of the sampling process.

One possible oddity: it says

  java.lang.Thread.State: BLOCKED (on object monitor)

so could this thread be waiting and not consuming CPU?

Andy

On 19/01/16 17:06, Laurent Rucquoy wrote:

Yes, it is probable that the queried data has been modified, because it's a production server and the TDB is always in use.
The way the data are updated in the TDB is through the Jenabean library (1.0.6), which is used to persist the corresponding Java object model.

On 19 January 2016 at 17:58, A. Soroka <[email protected]> wrote:

I don't have the knowledge to unwrap that trace (the real experts here can do that) but I'd like to ask: if you haven't changed any part of the executing code, did you change the data over which you're running the queries at the time the problem appeared, and if so, in what way?

---
A. Soroka
The University of Virginia Library

On Jan 19, 2016, at 11:54 AM, Laurent Rucquoy <[email protected]> wrote:

Hello,

We have a production server encountering significant slowness resulting from high CPU usage since about two weeks now.
When we check the high-CPU-usage thread dumps, we note that calls to Jena are always implicated. The calling code in our application executes a SPARQL query and iterates on the query solutions.
The Jena version used is 2.10.1 (with jena-tdb 0.10.1).
There was recently no change made in our application source code that could explain this issue.

Have you any idea about possible causes?

Thank you in advance for your support.

Sincerely,
Laurent.

Here is a thread dump as an example:

"RMI TCP Connection(17383)-10.249.203.163" daemon prio=6 tid=0x0000000015607800 nid=0x1dd0 waiting for monitor entry [0x000000001518d000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.release(BlockMgrSync.java:76)
    - waiting to lock <0x000000072ab58058> (a com.hp.hpl.jena.tdb.base.block.BlockMgrCache)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.transaction.BlockMgrJournal.release(BlockMgrJournal.java:207)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.release(BlockMgrWrapper.java:77)
    at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.release(PageBlockMgr.java:92)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.close(RecordRangeIterator.java:151)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:134)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:119)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at org.apache.jena.atlas.iterator.Iter$3.hasNext(Iter.java:179)
    at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:906)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:58)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:46)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:193)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:293)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeriesIds(MeasureRetrieveRdf.java:138)
    at com.telemis.core.measure.server.rdf.MeasureRetrieveRdf.getMeasuresFromSeries(MeasureRetrieveRdf.java:183)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromSeries(MeasureService.java:458)
    at telemis.measure.tms.service.MeasureService.getMeasuresFromExam(MeasureService.java:436)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:46)
    at telemis.measure.tms.messagehandlers.GetMeasuresFromExamMessageHandler.perform(GetMeasuresFromExamMessageHandler.java:26)
    at telemis.service.MessageHandlerManager.execute(MessageHandlerManager.java:50)
    at telemis.service.MomoRMIImpl.executeInternal(MomoRMIImpl.java:522)
    at telemis.service.MomoRMIImpl.execute(MomoRMIImpl.java:367)
    at telemis.service.MomoRMIImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

  Locked ownable synchronizers:
    - <0x0000000737fcdfe8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
