On 04/02/16 08:15, Jean-Marc Vanel wrote:
Sorry for being vague.
The RAM usage grows until the application crashes with an Out Of Memory exception.
TDB uses a bounded amount of caching, though the journal workspace can grow.
If there are lots of large literals, you'll need more heap.
The transaction system in TDB1 keeps up to 10 transactions buffered; you
can switch that off with:
TransactionManager.QueueBatchSize = 0 ;
then commits are flushed back to the main database as soon as possible.
That flush requires no readers to be active.
If you have a reader that doesn't commit/end properly, the system can
never write back to the main database.
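A minimal sketch of both points, assuming the Jena 2.x package names (com.hp.hpl.jena...) since the thread mentions Jena 2.13.0; the directory path is a placeholder:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.transaction.TransactionManager;

public class TdbFlushExample {
    public static void main(String[] args) {
        // Disable transaction queueing: commits are flushed back to the
        // main database as soon as possible, instead of buffering up to 10.
        TransactionManager.QueueBatchSize = 0;

        Dataset dataset = TDBFactory.createDataset("/tmp/tdb-example");
        dataset.begin(ReadWrite.READ);
        try {
            // ... run queries here ...
        } finally {
            // Always end the transaction, even on exceptions: a reader
            // left open blocks the flush back to the main database.
            dataset.end();
        }
    }
}
```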
If you have a system where there are always active readers, the journal
will grow, but you don't have that setup if the statement below is true:
AFAIK transactions occur on the same thread started by the Play! framework
and so do not overlap.
About the "pattern of transactions" , I don't know what to answer. I there
was a questionnaire I'd be glad to answer. Also I can instrument the code
if there is some procedure.
It is running with Java version "1.8.0_65", on Ubuntu 15.10.
The test I'm going to do is to call close() and refresh the Dataset when
reaching 80% of the maximum memory.
It is a disk-backed dataset, not a TDB in-memory one?
Andy
2016-02-03 23:05 GMT+01:00 Andy Seaborne <[email protected]>:
Hi there -
"memory leak" has possible several meaning, not sure which you you mean:
* RAM usage is growing?
* Disk usage is growing?
* a specific file (the journal) is growing?
What is the pattern of transactions? (how many, do they overlap?)
Andy
On 03/02/16 17:47, Jean-Marc Vanel wrote:
I forgot to mention that I'm still using Jena 2.13.0, because Banana-RDF
has not updated yet.
2016-02-03 18:43 GMT+01:00 Jean-Marc Vanel <[email protected]>:
I think that the second pattern "create a dataset object on the thread",
or rather in my case
"create a dataset object for one HTTP request"
is worth trying.
And I want to know why the doc seems to prefer the first pattern.
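A minimal sketch of the per-request variant, assuming Jena 2.x package names; the storage directory is a placeholder. TDBFactory returns dataset objects backed by the same storage when given the same location:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class PerRequestDataset {
    // Called once per HTTP request: a fresh Dataset object each time,
    // all sharing the same underlying TDB storage.
    static void handleRequest() {
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
        dataset.begin(ReadWrite.WRITE);
        try {
            // ... update the dataset for this request ...
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```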
2016-02-03 18:30 GMT+01:00 A. Soroka <[email protected]>:
On Feb 3, 2016, at 5:13 AM, Jean-Marc Vanel <[email protected]>
wrote:
In the documentation,
https://jena.apache.org/documentation/tdb/tdb_transactions.html#multi-threaded-use
it is not clear which use pattern is preferred and the reason why.
The first pattern shows a single dataset object being shared between
threads, each of which operates a transaction against that object; the
second pattern is introduced with "or create a dataset object on the
thread (the case above is preferred):".
As to why, I am not familiar enough with TDB to be sure, but there is a
comment on the second pattern, "Each thread has a separate dataset object;
these safely share the same storage but have independent transactions.",
that would seem to indicate that the second pattern is vulnerable to
conflicts between transactions opened against the two different dataset
objects.
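A minimal sketch of the first (preferred) pattern, assuming Jena 2.x package names; the directory path is a placeholder. One Dataset object is shared, and each thread runs its own transaction against it:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class SharedDatasetExample {
    // A single Dataset object shared by all threads.
    static final Dataset dataset = TDBFactory.createDataset("/path/to/tdb");

    // Each thread begins and ends its own independent transaction.
    static final Runnable reader = () -> {
        dataset.begin(ReadWrite.READ);
        try {
            // ... query the dataset ...
        } finally {
            dataset.end();
        }
    };

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```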
---
A. Soroka
The University of Virginia Library
On Feb 3, 2016, at 5:13 AM, Jean-Marc Vanel <[email protected]>
wrote:
I have a repeating memory leak in TDB in my web application (
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/README.md
).
It is caching RDF documents from the internet, typically DBpedia
resources.
It is not the use case described in "Fuseki/TDB memory leak for
concurrent updates/queries" https://issues.apache.org/jira/browse/JENA-689 ,
as the journal is empty after the crash.
A single Dataset object is used for the duration of the application,
and I
suspect this is the root cause.
In the documentation,
https://jena.apache.org/documentation/tdb/tdb_transactions.html#multi-threaded-use
it is not clear which use pattern is preferred and the reason why.
Can someone confirm that keeping a single Dataset object for the duration
of the application is bad?
--
Jean-Marc Vanel
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
http://deductions-software.com/
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui