[jira] [Commented] (JENA-1769) Dataset#listNames slow for large TDB2 datasets

2019-10-21 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955917#comment-16955917
 ] 

Damien Obrist commented on JENA-1769:
-

[~andy] I see, thanks for the explanation and for fixing the issue!

> Dataset#listNames slow for large TDB2 datasets
> --
>
> Key: JENA-1769
> URL: https://issues.apache.org/jira/browse/JENA-1769
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.13.0
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Major
>  Labels: performance
> Fix For: Jena 3.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased 
> significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing 
> *1'000'000 triples*. I have observed a running time of *~270ms* with Jena 
> 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the 
> running time has increased from seconds to minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (JENA-1769) Dataset#listNames slow for large TDB2 datasets

2019-10-17 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953861#comment-16953861
 ] 

Damien Obrist commented on JENA-1769:
-

[~andy] thanks for the information and for looking into this!

I tested using a sample dataset that I had created for a previous issue, 
JENA-1619. The dataset is contained in the attachment in that issue 
({{jena-transaction-exception-master.zip}}), inside the {{sample-data}} folder. 
It consists of 1'000'000 dummy quads of the form
{noformat}
  
 
{noformat}
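As a hedged sketch, a dataset of roughly this shape could be generated as follows (all URIs are made-up placeholders, not the actual sample data, and the dataset path is hypothetical):

{code:java}
Dataset dataset = TDB2Factory.connectDataset("/path/to/tdb2");  // placeholder path
Txn.executeWrite(dataset, () -> {
    for (int i = 0; i < 1_000_000; i++) {
        // example.org URIs are invented for illustration only
        Node s = NodeFactory.createURI("http://example.org/s" + i);
        Node p = NodeFactory.createURI("http://example.org/p");
        Node o = NodeFactory.createURI("http://example.org/o" + i);
        Node g = NodeFactory.createURI("http://example.org/g" + i);
        dataset.asDatasetGraph().add(new Quad(g, s, p, o));
    }
});
{code}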

> Dataset#listNames slow for large TDB2 datasets
> --
>
> Key: JENA-1769
> URL: https://issues.apache.org/jira/browse/JENA-1769
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.13.0
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Major
>  Labels: performance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased 
> significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing 
> *1'000'000 triples*. I have observed a running time of *~270ms* with Jena 
> 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the 
> running time has increased from seconds to minutes.





[jira] [Commented] (JENA-1769) Dataset#listNames slow for large TDB2 datasets

2019-10-16 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952753#comment-16952753
 ] 

Damien Obrist commented on JENA-1769:
-

h2. Analysis

Using {{git bisect}}, I identified JENA-1695 as the change that introduced 
this increase.

I have also stumbled upon this 
[comment|https://github.com/apache/jena/pull/598#discussion_r319529442] in the 
pull request for JENA-1748, from which I gather that {{listNames}} now takes so 
long because it iterates over all quads in the dataset.

[~andy] I wasn't sure whether to file this issue as a bug or an improvement, 
given that iterating over all quads (and the running time that implies) seems 
to be the expected behavior. I'm largely unfamiliar with the internals of TDB, 
but I'm wondering whether the refactored code could achieve performance 
similar to that of Jena 3.12.0.
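For reference, the timing can be measured with a small harness like the following (the dataset path is a placeholder):

{code:java}
Dataset dataset = TDB2Factory.connectDataset("/path/to/tdb2");  // placeholder path
Txn.executeRead(dataset, () -> {
    long start = System.nanoTime();
    // listNames walks the named graphs; with the 3.13.0 behavior this
    // effectively scans all quads in the dataset.
    Iterator<String> names = dataset.listNames();
    long count = 0;
    while (names.hasNext()) { names.next(); count++; }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(count + " graph names in " + elapsedMs + " ms");
});
{code}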

> Dataset#listNames slow for large TDB2 datasets
> --
>
> Key: JENA-1769
> URL: https://issues.apache.org/jira/browse/JENA-1769
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.13.0
>Reporter: Damien Obrist
>Priority: Major
>  Labels: performance
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased 
> significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing 
> *1'000'000 triples*. I have observed a running time of *~270ms* with Jena 
> 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the 
> running time has increased from seconds to minutes.





[jira] [Created] (JENA-1769) Dataset#listNames slow for large TDB2 datasets

2019-10-16 Thread Damien Obrist (Jira)
Damien Obrist created JENA-1769:
---

 Summary: Dataset#listNames slow for large TDB2 datasets
 Key: JENA-1769
 URL: https://issues.apache.org/jira/browse/JENA-1769
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB2
Affects Versions: Jena 3.13.0
Reporter: Damien Obrist


With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased 
significantly for TDB2 datasets.

I have compared the running times for a sample TDB2 dataset containing 
*1'000'000 triples*. I have observed a running time of *~270ms* with Jena 
3.12.0 and *~13.5s* with Jena 3.13.0.

We're using a dataset with many millions of triples and for our use case, the 
running time has increased from seconds to minutes.





[jira] [Commented] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-16 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930286#comment-16930286
 ] 

Damien Obrist commented on JENA-1758:
-

[~andy.seaborne] you're welcome, thank you for reviewing and merging the fix 
so quickly.
{quote}I hope you haven't had to make a custom loader simply for that.
{quote}
No, thankfully not. All I did was extend {{LoaderMain}} so that, in addition 
to loading from files, it can load from a stream. I just realized that my code 
snippet is a bit misleading; it actually looks more like this:
{code:java}
CustomLoader loader = new CustomLoader(LoaderPlans.loaderPlanParallel, 
dataset.asDatasetGraph());
loader.load(inputStream);
{code}
And the loading code as described is actually inside the custom loader.
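For context, here is a hedged sketch of what such a custom loader might look like; the class shape, the constructor arguments, and the use of {{DataLoader#stream}} are assumptions, not the actual implementation:

{code:java}
// Hypothetical sketch: extend LoaderMain to accept an InputStream.
class CustomLoader extends LoaderMain {
    CustomLoader(LoaderPlan plan, DatasetGraph dsg) {
        super(plan, dsg, null);  // constructor arguments are assumed
    }

    void load(InputStream inputStream) {
        startBulk();
        // Parse the stream into the loader's StreamRDF destination;
        // Lang.NQUADS is an assumption about the wire format.
        RDFDataMgr.parse(stream(), inputStream, Lang.NQUADS);
        finishBulk();
    }
}
{code}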
{quote}FYI: The variant of the loader that uses as much parallelism as possible
{quote}
Yes, I have noticed. However, in my case that's perfect as I want the loading 
to be as fast as possible and use all available resources. I haven't had any 
problems with TCP connections so far, but thanks for the heads up!

I see the fix version is Jena 3.13.0. Do you have a rough estimate of when it 
will be released?

> Exception when preparing InfModel after TDB loading operation
> -
>
> Key: JENA-1758
> URL: https://issues.apache.org/jira/browse/JENA-1758
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.12.0
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Critical
> Fix For: Jena 3.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h2. Exception
> I'm loading a few million triples into a TDB2 dataset. I'm using a custom 
> loader (extending {{LoaderMain}}), since the triples being loaded are 
> generated by a separate application and streamed in over HTTP.
> After the loading is done, I try to reset an {{InfModel}} to recompute the 
> inference taking into account the new triples, but I encounter the following 
> exception:
> {noformat}
> org.apache.jena.atlas.RuntimeIOException: Out of bounds: (limit 
> 32834204)32834205
>   at org.apache.jena.atlas.io.IO.exception(IO.java:254)
>   at 
> org.apache.jena.dboe.trans.data.TransBinaryDataFile.checkRead(TransBinaryDataFile.java:190)
>   at 
> org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:184)
>   at 
> org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:71)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:543)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:318)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:311)
>   at org.apache.jena.riot.thrift.wire.RDF_IRI.read(RDF_IRI.java:258)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:319)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
>   at org.apache.thrift.TUnion.read(TUnion.java:138)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:80)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:197)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:108)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:66)
>   at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:112)
>   at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:108)
>   at 
> org.apache.jena.tdb2.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:53)
>   at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>   at 
> org.apache.jena.atlas.iterator.IteratorWrapper.next(IteratorWrapper.java:36)
>   at 
> org.apache.jena.tdb2.store.IteratorTxnTracker.next(IteratorTxnTracker.java:43)
>   at 
> java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.lamb

[jira] [Comment Edited] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928617#comment-16928617
 ] 

Damien Obrist edited comment on JENA-1758 at 9/12/19 3:32 PM:
--

h2. Investigation

Playing around, I observed the following:
* when catching the exception and trying to prepare the model a second time, 
the call succeeded and no exception was thrown
* when adding a sleep for a few seconds after the loading and before preparing 
the model, no exception was thrown at all

So I suspected that, even though the loading is complete, something must still 
be going on in the background.

Next I tried to close and reopen the dataset after the loading, to ensure that 
any pending writes complete and are flushed to disk:

{code:java}
TDBInternal.expel(dataset.asDatasetGraph());
dataset = TDB2Factory.connectDataset(path);
infModel = ModelFactory.createInfModel(reasoner, unionModel);
Txn.executeRead(dataset, infModel::prepare);
{code}

This led to the following exception:

{noformat}
org.apache.jena.atlas.lib.InternalErrorException: BlockMgrFileAccess : already 
closed
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.checkNotClosed(BlockMgrFileAccess.java:78)
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocLimit(BlockMgrFileAccess.java:150)
at 
org.apache.jena.dboe.base.block.BlockMgrWrapper.allocLimit(BlockMgrWrapper.java:87)
at 
org.apache.jena.dboe.base.page.PageBlockMgr.allocLimit(PageBlockMgr.java:45)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:544)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:73)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.commitPrepare(TransactionalComponentLifecycle.java:98)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$executePrepare$12(TransactionCoordinator.java:687)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executePrepare(TransactionCoordinator.java:685)
at 
org.apache.jena.dboe.transaction.txn.Transaction.prepare(Transaction.java:151)
at 
org.apache.jena.dboe.transaction.txn.Transaction.commit(Transaction.java:160)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:139)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-67" java.lang.NullPointerException
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.getDataState(TransactionalComponentLifecycle.java:251)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.abort(TransactionalComponentLifecycle.java:126)
at org.apache.jena.dboe.transaction.txn.SysTrans.abort(SysTrans.java:45)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$null$10(Transaction.java:189)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$abort$$11(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executeAbort(TransactionCoordinator.java:726)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort$(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort(Transaction.java:181)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:142)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This shows that {{DataToTuples}} is the culprit: it is still running after the 
loading operation has returned.



[jira] [Commented] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928625#comment-16928625
 ] 

Damien Obrist commented on JENA-1758:
-

h2. Solution

The thread started in {{DataToTuples#startBulk}} must be joined before the 
loading operation finishes. Otherwise it may still be running after the 
loading has returned, which causes the observed problems.

Pull request: https://github.com/apache/jena/pull/606

I was able to validate in my environment that with this fix the exception no 
longer occurs.
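Abstractly, the bug and the fix reduce to the standard thread-join pattern; the snippet below is an illustration of that pattern, not Jena's actual code:

{code:java}
// A background worker doing deferred work (simulated here with a sleep).
Thread worker = new Thread(() -> {
    try {
        Thread.sleep(100);  // stands in for the final index/commit work
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});
worker.start();
// ... main loading phases complete here ...
worker.join();  // the fix: wait for the worker before declaring the load finished
{code}

Without the {{join()}}, the caller can observe a half-finished state, which matches the "already closed" and NPE symptoms above.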

> Exception when preparing InfModel after TDB loading operation
> -
>
> Key: JENA-1758
> URL: https://issues.apache.org/jira/browse/JENA-1758
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.12.0
>Reporter: Damien Obrist
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Exception
> I'm loading a few million triples into a TDB2 dataset. I'm using a custom 
> loader (extending {{LoaderMain}}), since the triples being loaded are 
> generated by a separate application and streamed in over HTTP.
> After the loading is done, I try to reset an {{InfModel}} to recompute the 
> inference taking into account the new triples, but I encounter the following 
> exception:
> {noformat}
> org.apache.jena.atlas.RuntimeIOException: Out of bounds: (limit 
> 32834204)32834205
>   at org.apache.jena.atlas.io.IO.exception(IO.java:254)
>   at 
> org.apache.jena.dboe.trans.data.TransBinaryDataFile.checkRead(TransBinaryDataFile.java:190)
>   at 
> org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:184)
>   at 
> org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:71)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:543)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:318)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:311)
>   at org.apache.jena.riot.thrift.wire.RDF_IRI.read(RDF_IRI.java:258)
>   at 
> org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:319)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
>   at org.apache.thrift.TUnion.read(TUnion.java:138)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:80)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:197)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:108)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:66)
>   at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:112)
>   at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:108)
>   at 
> org.apache.jena.tdb2.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:53)
>   at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>   at 
> org.apache.jena.atlas.iterator.IteratorWrapper.next(IteratorWrapper.java:36)
>   at 
> org.apache.jena.tdb2.store.IteratorTxnTracker.next(IteratorTxnTracker.java:43)
>   at 
> java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
>   at 
> java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
>   at 
> java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
>   at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
>   at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:265)
>   at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:903)
>   at org.apache.jena.atlas.iterator.Iter$1.hasNext(Iter.java:192)
>   at 
> org.apache.jena.

[jira] [Commented] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928617#comment-16928617
 ] 

Damien Obrist commented on JENA-1758:
-

h2. Investigation

Playing around, I observed the following:
* when catching the exception and trying to prepare the model a second time, 
the call succeeded and no exception was thrown
* when sleeping for a few seconds after the loading and before preparing the 
model, no exception was thrown at all

So I suspected that, even though the loading is complete, something must still 
be going on in the background.

Next I tried to close and reopen the dataset after the loading, to ensure that 
any pending writes complete and are flushed to disk:

{code:java}
TDBInternal.expel(dataset.asDatasetGraph());
dataset = TDB2Factory.connectDataset(path);
infModel = ModelFactory.createInfModel(reasoner, unionModel);
Txn.executeRead(dataset, infModel::prepare);
{code}

This led to the following exception:

{noformat}
org.apache.jena.atlas.lib.InternalErrorException: BlockMgrFileAccess : already 
closed
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.checkNotClosed(BlockMgrFileAccess.java:78)
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocLimit(BlockMgrFileAccess.java:150)
at 
org.apache.jena.dboe.base.block.BlockMgrWrapper.allocLimit(BlockMgrWrapper.java:87)
at 
org.apache.jena.dboe.base.page.PageBlockMgr.allocLimit(PageBlockMgr.java:45)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:544)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:73)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.commitPrepare(TransactionalComponentLifecycle.java:98)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$executePrepare$12(TransactionCoordinator.java:687)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executePrepare(TransactionCoordinator.java:685)
at 
org.apache.jena.dboe.transaction.txn.Transaction.prepare(Transaction.java:151)
at 
org.apache.jena.dboe.transaction.txn.Transaction.commit(Transaction.java:160)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:139)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-67" java.lang.NullPointerException
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.getDataState(TransactionalComponentLifecycle.java:251)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.abort(TransactionalComponentLifecycle.java:126)
at org.apache.jena.dboe.transaction.txn.SysTrans.abort(SysTrans.java:45)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$null$10(Transaction.java:189)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$abort$$11(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executeAbort(TransactionCoordinator.java:726)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort$(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort(Transaction.java:181)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:142)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This shows that {{DataToTuples}} is the culprit: it is still running after the 
loading operation has returned.

> Exception when preparing InfModel after TDB loading operation
> -
>
> Key: JENA-1758
> URL: https://issues.apache.org/jira/browse/JENA-1758
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.12.0
>Reporter: Damien Obrist
>Priority: Major
>
> h2. Exception
> I'm loading a few million triples into a TDB2 dataset. I'm using a custom 
> loader (extending {{LoaderMain}}), since the triples being loaded are 
> generated by a separate application and streamed in over HTTP.
> After the loading is done, I try to reset an {{InfModel}} to recompute the 
> inference taking into account the new triples, but I encounter the following 
> exception:
> {noformat}
> org.apache.jena.atlas.RuntimeIOException: Out of bounds: (limit 
> 32834204)32834205
>   at org.apache.jena.atlas.io.IO.exception(IO.java:254)
>   at 
> org.apache.jena.dboe.

[jira] [Created] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)
Damien Obrist created JENA-1758:
---

 Summary: Exception when preparing InfModel after TDB loading 
operation
 Key: JENA-1758
 URL: https://issues.apache.org/jira/browse/JENA-1758
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB2
Affects Versions: Jena 3.12.0
Reporter: Damien Obrist


h2. Exception

I'm loading a few million triples into a TDB2 dataset. I'm using a custom 
loader (extending {{LoaderMain}}), since the triples being loaded are generated 
by a separate application and streamed in over HTTP.

After the loading is done, I try to reset an {{InfModel}} to recompute the 
inference taking into account the new triples, but I encounter the following 
exception:

{noformat}
org.apache.jena.atlas.RuntimeIOException: Out of bounds: (limit 
32834204)32834205
at org.apache.jena.atlas.io.IO.exception(IO.java:254)
at 
org.apache.jena.dboe.trans.data.TransBinaryDataFile.checkRead(TransBinaryDataFile.java:190)
at 
org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:184)
at 
org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:71)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at 
org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637)
at 
org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:543)
at 
org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:318)
at 
org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:311)
at org.apache.jena.riot.thrift.wire.RDF_IRI.read(RDF_IRI.java:258)
at 
org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:319)
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
at org.apache.thrift.TUnion.read(TUnion.java:138)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:80)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:197)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:108)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:66)
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:112)
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:108)
at 
org.apache.jena.tdb2.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:53)
at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
at 
org.apache.jena.atlas.iterator.IteratorWrapper.next(IteratorWrapper.java:36)
at 
org.apache.jena.tdb2.store.IteratorTxnTracker.next(IteratorTxnTracker.java:43)
at 
java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:265)
at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:903)
at org.apache.jena.atlas.iterator.Iter$1.hasNext(Iter.java:192)
at 
org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at 
org.apache.jena.reasoner.rulesys.impl.RETEEngine.fastInit(RETEEngine.java:155)
at 
org.apache.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:471)
at 
org.apache.jena.rdf.model.impl.InfModelImpl.prepare(InfModelImpl.java:87)
...
{noformat}

h2. Code

I tried to come up with a minimal example but wasn't able to reproduce the 
issue outside of my more involved application environment, where it occurs 
consistently. It seems to depend on specific timing, the number of triples 
being loaded, the inference rules used, etc.

The code looks roughly like this:

{code:java}
// initialize
Dataset dataset = TDB2Factory.connectDataset(path

[jira] [Commented] (JENA-1619) TransactionException for large query results

2018-10-29 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667430#comment-16667430
 ] 

Damien Obrist commented on JENA-1619:
-

[~andy.seaborne] thanks for the fix! I can confirm that with the latest 
3.10.0-SNAPSHOT the issue is resolved.

> TransactionException for large query results
> 
>
> Key: JENA-1619
> URL: https://issues.apache.org/jira/browse/JENA-1619
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.9.0
> Environment: MacOS 10.14
> Java 1.8.0_161 (Oracle)
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.10.0
>
> Attachments: jena-transaction-exception-master.zip
>
>
> When querying a TDB2 dataset, I'm observing a suspicious 
> {{TransactionException}}, even though the query results have been copied so 
> they can be used outside of a transaction. The exception only seems to occur 
> when there are many results.
> h3. Reproduction steps
>  * Sample dataset containing 1'000'000 triples
>  * Query all triples:
> {code:java}
> SELECT * WHERE { ?s ?p ?o . }{code}
>  * Copy query results for use outside of transaction:
> {code:java}
> ResultSetFactory.copyResults(queryExecution.execSelect());{code}
>  * Get any variable of the first result
> All variables are {{null}} and the log shows the following exception:
> {noformat}
> org.apache.jena.dboe.transaction.txn.TransactionException: Not in a 
> transaction
>   at 
> org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.checkTxn(TransactionalComponentLifecycle.java:417)
>   at 
> org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:182)
>   at 
> org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:71)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637)
>   at 
> org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:543)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:222)
>   at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
>   at org.apache.thrift.TUnion.read(TUnion.java:138)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:80)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:197)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:108)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
>   at 
> org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:66)
>   at org.apache.jena.tdb2.solver.BindingTDB.get1(BindingTDB.java:126)
>   at 
> org.apache.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:104)
>   at org.apache.jena.sparql.core.ResultBinding._get(ResultBinding.java:57)
>   at 
> org.apache.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:33)
>   at TransactionException.main(TransactionException.java:27)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have created a minimal example available on 
> [GitHub|https://github.com/dobrist/jena-transaction-exception] and attached 
> to this issue.
> h3. Findings
>  * The exception doesn't occur for smaller query results
>  * This seems to be a regression as it works as expected with Jena 3.8.0
>  * Using {{git bisect}} I have identified the commit where this starts to 
> happen: 
> [9a60253|https://github.com/apache/jena/commit/9a60253c00aec6763adbb0253588fe4dc0b484d4]
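
A hedged sketch of the contract {{ResultSetFactory.copyResults}} is expected to provide, modeled here with plain JDK types only (the {{TxnScope}} class and its methods are hypothetical, not Jena API): copying forces every lazy lookup while the transaction is still open, so the copy remains usable after the transaction ends, whereas the raw lazy iterator does not.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the copy-before-the-transaction-ends contract.
// A lazy result iterator is only safe while its "transaction" is open;
// copying materializes every value eagerly, so the copy outlives it.
class TxnScope implements AutoCloseable {
    private boolean open = true;

    @Override
    public void close() { open = false; }

    // Lazily maps ids to node strings, simulating on-demand disk lookups.
    Iterator<String> lazyResults(List<Integer> ids) {
        Iterator<Integer> it = ids.iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() {
                if (!open)
                    throw new IllegalStateException("Not in a transaction");
                return "node" + it.next();   // simulated node-table read
            }
        };
    }

    // Analogous in spirit to ResultSetFactory.copyResults: drain it now.
    static List<String> copyResults(Iterator<String> results) {
        List<String> copy = new ArrayList<>();
        results.forEachRemaining(copy::add);
        return copy;
    }
}
```

The bug report above is consistent with this contract being broken: the copy was taken, but some lookups were still deferred until after the transaction had closed.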



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1619) TransactionException for large query results

2018-10-24 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661831#comment-16661831
 ] 

Damien Obrist commented on JENA-1619:
-

You're welcome.

Indeed this is an interesting one :)

This makes sense and would explain the size factor. When I tried to debug why 
the changes in 
[9a60253|https://github.com/apache/jena/commit/9a60253c00aec6763adbb0253588fe4dc0b484d4]
 would lead to this new behavior, I noticed the node table cache and suspected 
it might have something to do with it, but I wasn't able to pinpoint the issue.
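
A minimal, self-contained model of that size effect (the class below is made up for illustration, not Jena's actual {{NodeTableCache}}): node ids that were cached during the transaction can still be resolved afterwards, while cache misses must read the node file and fail once the transaction has ended — so only result sets larger than the cache trigger the exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Jena's NodeTableCache) of the size effect:
// node ids served from the cache never touch the transactional file,
// while cache misses do -- and fail once the transaction has ended.
class NodeTableModel {
    private final Map<Integer, String> cache = new HashMap<>();
    private final int capacity;
    boolean inTransaction = true;

    NodeTableModel(int capacity) { this.capacity = capacity; }

    String getNodeForNodeId(int id) {
        String hit = cache.get(id);
        if (hit != null)
            return hit;                        // cache hit: no file access
        if (!inTransaction)                    // miss outside a transaction
            throw new IllegalStateException("Not in a transaction");
        String node = "node" + id;             // simulated node-file read
        if (cache.size() < capacity)
            cache.put(id, node);               // small results fit entirely
        return node;
    }
}
```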



[jira] [Updated] (JENA-1619) TransactionException for large query results

2018-10-23 Thread Damien Obrist (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Obrist updated JENA-1619:

Attachment: jena-transaction-exception-master.zip



[jira] [Updated] (JENA-1619) TransactionException for large query results

2018-10-23 Thread Damien Obrist (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Obrist updated JENA-1619:

Attachment: (was: jena-transaction-exception-master.zip)



[jira] [Created] (JENA-1619) TransactionException for large query results

2018-10-23 Thread Damien Obrist (JIRA)
Damien Obrist created JENA-1619:
---

 Summary: TransactionException for large query results
 Key: JENA-1619
 URL: https://issues.apache.org/jira/browse/JENA-1619
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB2
Affects Versions: Jena 3.9.0
 Environment: MacOS 10.14
Java 1.8.0_161 (Oracle)
Reporter: Damien Obrist
 Attachments: jena-transaction-exception-master.zip



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-15 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650144#comment-16650144
 ] 

Damien Obrist commented on JENA-1615:
-

[~andy.seaborne] got it, thanks for the pointers!

> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.10.0
>
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_after_gc_with_fix.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>
> h3. Context
> I'm using a TDB2 dataset in a long-running Scala application, in which the 
> dataset gets compacted regularly. After compactions, the application removes 
> the {{Data-}} folder of the previous generation. However, the 
> corresponding disk space isn't properly returned to the OS, but is still 
> reported as being used by {{df}}. Indeed, {{lsof}} shows that the application 
> keeps open file descriptors that point to the old generation's files. Only 
> stopping / restarting the JVM frees the disk space for good.
> h3. Reproduction steps
>  * Connect to an existing TDB2 dataset
> {code}
> val dataset = TDB2Factory.connectDataset("sample"){code}
>  * Check open files
>   [^open_files_before_compaction.png]
>  * Compact the dataset
>   {code}DatabaseMgr.compact(dataset.asDatasetGraph){code}
>  * Check open files (before garbage collection)
>  [^open_files_after_compaction_before_gc.png]
>  * Check open files (after garbage collection)
>  [^open_files_after_compaction_after_gc.png]
> The last screenshot shows that, even after garbage collection, there are still 
> open file descriptors pointing to the old generation {{Data-0001}}.
> h3. Impact
> Depending on how disk usage is being reported, this can be quite problematic. 
> In our case, we're running on an OpenShift infrastructure with limited 
> storage. After only a handful of compactions, the storage is considered full 
> and cannot be used anymore.





[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-12 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647555#comment-16647555
 ] 

Damien Obrist commented on JENA-1615:
-

{quote}I'll merge the PR
{quote}
[~andy.seaborne] thanks a lot, that's great to hear! When is version 3.10.0 
scheduled to be released?



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-10 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644895#comment-16644895
 ] 

Damien Obrist commented on JENA-1615:
-

Hi [~andy.seaborne], thanks for looking into this!

Indeed these files should be closed as well. Being unfamiliar with the code, I 
struggled when I tried to track them down. Thanks for the fix, I have included 
the change in the pull request.

I can confirm that with these fixes, there are no more open file descriptors 
for the old generation after a compaction (after garbage collection).



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643280#comment-16643280
 ] 

Damien Obrist commented on JENA-1615:
-

h3. Investigation

[^open_files_after_compaction_before_gc.png] shows duplicate file descriptors 
for some of the files. Analyzing the Java heap and playing with garbage 
collection, I saw that the first garbage collection after a compaction releases 
some of these file descriptors. This is expected as 
[BlockAccessMapped#_close|https://github.com/apache/jena/blob/3d456654feb2cf7617a85a5245c80b827900076f/jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/file/BlockAccessMapped.java#L249]
 dereferences its {{MappedByteBuffers}}, whose file descriptors are 
subsequently closed by 
[FileChannelImpl#Unmapper|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l784].

[^open_files_after_compaction_after_gc.png] shows that after garbage collection 
there is still an open file descriptor for each TDB file. It seems these are 
the {{FileChannelImpl}}'s [file descriptor 
instances|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l49],
 which appear to be 
[duplicated|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l900]
 when creating the memory mapping. The {{FileChannelImpl}} objects get 
[garbage-collected|https://github.com/apache/jena/blob/3f371dfa952f4af8c2f2511cf4f36e82a56f5789/jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/file/BlockAccessBase.java#L169],
 but their file descriptors are never closed.
h3. Proposed solution

Based on my investigation, I think the {{BlockAccessBase}}'s {{FileChannel}} 
instances need to be closed before they are dereferenced. This would seem to be 
consistent with what is done in TDB1's 
[BlockAccessBase#_close|https://github.com/apache/jena/blob/0d3928eaf449e7b375038a892a6c9c3b0dd05908/jena-tdb/src/main/java/org/apache/jena/tdb/base/file/BlockAccessBase.java#L152].

I have created a pull request with the proposed fix: 
[https://github.com/apache/jena/pull/481]

The change seems to fix the issue: using the mentioned reproduction steps with 
a patched Jena SNAPSHOT shows that all (but two) file descriptors are closed 
after the garbage collection: 
[^open_files_after_compaction_after_gc_with_fix.png]

 

Please let me know what you think and thanks in advance for your feedback!
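
A pure-JDK sketch of the pattern behind the proposed fix (the class name below is made up; this is not the Jena code): closing the {{FileChannel}} explicitly releases its file descriptor immediately instead of waiting for finalization, and a mapping created from the channel remains valid after the channel is closed, so closing eagerly does not break readers of the {{MappedByteBuffer}}.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demonstration class: close the FileChannel explicitly
// rather than relying on garbage collection to release its descriptor.
public class FdLeakSketch {

    // Returns {channelOpenAfterClose, mappingStillReadable}.
    public static boolean[] demo() throws IOException {
        Path file = Files.createTempFile("block", ".dat");
        Files.write(file, new byte[] {1, 2, 3, 4});

        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, 4);

        // Without this close(), the descriptor stays open until the
        // channel object is finalized -- the leak seen after compaction.
        channel.close();

        boolean openAfterClose = channel.isOpen();      // now false
        boolean mappingReadable = buffer.get(0) == 1;   // mapping survives

        Files.deleteIfExists(file);
        return new boolean[] { openAfterClose, mappingReadable };
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = demo();
        System.out.println("channel open after close: " + r[0]);
        System.out.println("mapping still readable:   " + r[1]);
    }
}
```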

> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Priority: Major
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_after_gc_with_fix.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>
> h3. Context
> I'm using a TDB2 dataset in a long-running Scala application, in which the 
> dataset gets compacted regularly. After compactions, the application removes 
> the {{Data-}} folder of the previous generation. However, the 
> corresponding disk space isn't properly returned back to the OS, but is still 
> reported as being used by {{df}}. Indeed, {{lsof}} shows that the application 
> keeps open file descriptors that point to the old generation's files. Only 
> stopping / restarting the JVM frees the disk space for good.
> h3. Reproduction steps
>  * Connect to an existing TDB2 dataset
> {code}
> val dataset = TDB2Factory.connectDataset("sample"){code}
>  * Check open files
>   [^open_files_before_compaction.png]
>  * Compact the dataset
>   {code}DatabaseMgr.compact(dataset.asDatasetGraph){code}
>  * Check open files (before garbage collection)
>  [^open_files_after_compaction_before_gc.png]
>  * Check open files (after garbage collection)
>  [^open_files_after_compaction_after_gc.png]
> The last sceenshot shows that, even after garbage collection, there are still 
> open file descriptors pointing to the old generation {{Data-0001}}.
> h3. Impact
> Depending on how disk usage is being reported, this can be quite problematic. 
> In our case, we're running on an OpenShift infrastructure with limited 
> storage. After only a handful of compactions, the storage is considered full 
> and cannot be used anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Damien Obrist (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Obrist updated JENA-1615:

Attachment: open_files_after_compaction_after_gc_with_fix.png






[jira] [Updated] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Damien Obrist (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Obrist updated JENA-1615:

Description: 
h3. Context

I'm using a TDB2 dataset in a long-running Scala application, in which the 
dataset gets compacted regularly. After compactions, the application removes 
the {{Data-}} folder of the previous generation. However, the corresponding 
disk space isn't properly returned to the OS, but is still reported as 
being used by {{df}}. Indeed, {{lsof}} shows that the application keeps open 
file descriptors that point to the old generation's files. Only stopping / 
restarting the JVM frees the disk space for good.
h3. Reproduction steps
 * Connect to an existing TDB2 dataset
{code}
val dataset = TDB2Factory.connectDataset("sample"){code}

 * Check open files
  [^open_files_before_compaction.png]
 * Compact the dataset
  {code}DatabaseMgr.compact(dataset.asDatasetGraph){code}
 * Check open files (before garbage collection)
 [^open_files_after_compaction_before_gc.png]
 * Check open files (after garbage collection)
 [^open_files_after_compaction_after_gc.png]

The last screenshot shows that, even after garbage collection, there are still 
open file descriptors pointing to the old generation {{Data-0001}}.
h3. Impact

Depending on how disk usage is being reported, this can be quite problematic. 
In our case, we're running on an OpenShift infrastructure with limited storage. 
After only a handful of compactions, the storage is considered full and cannot 
be used anymore.

  was:
h3. Context

I'm using a TDB2 dataset in a long-running Scala application, in which the 
dataset gets compacted regularly. After compactions, the application removes 
the {{Data-}} folder of the previous generation. However, the corresponding 
disk space isn't properly returned to the OS, but is still reported as 
being used by {{df}}. Indeed, {{lsof}} shows that the application keeps open 
file descriptors that point to the old generation's files. Only stopping / 
restarting the JVM frees the disk space for good.

h3. Reproduction steps

* Connect to an existing TDB2 dataset
  {code:scala}val dataset = TDB2Factory.connectDataset("sample"){code}
* Check open files
  !open_files_before_compaction.png|thumbnail!
* Compact the dataset
  {code:scala}DatabaseMgr.compact(dataset.asDatasetGraph){code}
* Check open files (before garbage collection)
  !open_files_after_compaction_before_gc.png|thumbnail!
* Check open files (after garbage collection)
  !open_files_after_compaction_after_gc.png|thumbnail!

The last screenshot shows that, even after garbage collection, there are still 
open file descriptors pointing to the old generation {{Data-0001}}.

h3. Impact

Depending on how disk usage is being reported, this can be quite problematic. 
In our case, we're running on an OpenShift infrastructure with limited storage. 
After only a handful of compactions, the storage is considered full and cannot 
be used anymore.



[jira] [Created] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Damien Obrist (JIRA)
Damien Obrist created JENA-1615:
---

 Summary: Compaction leaks file descriptors
 Key: JENA-1615
 URL: https://issues.apache.org/jira/browse/JENA-1615
 Project: Apache Jena
  Issue Type: Bug
  Components: Core, TDB2
Affects Versions: Jena 3.8.0
 Environment: I reproduced the issue on the following environments:
 * OS / Java:
 ** MacOS 10.13.5
Java 1.8.0_161 (Oracle)
 ** Debian 9.5
Java 1.8.0_181 (OpenJDK)
 * Jena version 3.8.0
 * TDB2 mode: mapped
Reporter: Damien Obrist
 Attachments: open_files_after_compaction_after_gc.png, 
open_files_after_compaction_before_gc.png, open_files_before_compaction.png

h3. Context

I'm using a TDB2 dataset in a long-running Scala application, in which the 
dataset gets compacted regularly. After compactions, the application removes 
the {{Data-}} folder of the previous generation. However, the corresponding 
disk space isn't properly returned to the OS, but is still reported as 
being used by {{df}}. Indeed, {{lsof}} shows that the application keeps open 
file descriptors that point to the old generation's files. Only stopping / 
restarting the JVM frees the disk space for good.

h3. Reproduction steps

* Connect to an existing TDB2 dataset
  {code:scala}val dataset = TDB2Factory.connectDataset("sample"){code}
* Check open files
  !open_files_before_compaction.png|thumbnail!
* Compact the dataset
  {code:scala}DatabaseMgr.compact(dataset.asDatasetGraph){code}
* Check open files (before garbage collection)
  !open_files_after_compaction_before_gc.png|thumbnail!
* Check open files (after garbage collection)
  !open_files_after_compaction_after_gc.png|thumbnail!

The last screenshot shows that, even after garbage collection, there are still 
open file descriptors pointing to the old generation {{Data-0001}}.

h3. Impact

Depending on how disk usage is being reported, this can be quite problematic. 
In our case, we're running on an OpenShift infrastructure with limited storage. 
After only a handful of compactions, the storage is considered full and cannot 
be used anymore.
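The {{df}} behaviour described under Impact is standard Unix unlink semantics: a file's blocks are reclaimed only when its last open descriptor closes, so deleting the old {{Data-}} directory frees nothing while the JVM still holds descriptors into it. A small bash sketch of that semantics, using a throwaway temp file (Linux):

```shell
# A deleted file's blocks are freed only when the last open descriptor on it
# is closed; until then, df still counts the space, exactly as observed with
# the previous generation's files after compaction.
f=/tmp/unlink-demo.$$
head -c 1048576 /dev/zero > "$f"   # create a 1 MiB file
exec 8<"$f"                        # hold a read descriptor on it
rm "$f"                            # unlink: the name is gone, the inode is not
ls -l "/proc/$$/fd/8"              # the descriptor still points at the file
exec 8<&-                          # close: only now is the space returned to the OS
```

This is why only stopping the JVM (which closes all its descriptors) reliably frees the disk space, and why the fix has to make compaction close the old generation's files.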



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)