Hi Andy!
01.10.2012 23:33, Andy Seaborne wrote:
It's not a GC issue, at least not in the normal low-level sense.
Write transactions are batched together for write-back to the main
database after they are committed. They are in the on-disk journal,
but the in-memory structures are also retained to provide a view of
the database with the transactions applied. These take memory. (It's
the indexes; the node data is written back in the prepare phase
because the node table is an append-only file.)
The batch size is set to 10: after 10 writes, the system flushes the
journal and drops the in-memory structures. So once you get past that
point, it should go "forever".
And every incoming request is parsed in-memory to check the validity
of the RDF. That is also a source of RAM usage.
Ah, thanks a lot! Now I understand what I was seeing. When I PUT several
(but <10) datasets, Fuseki will temporarily eat a lot of memory. And now
my problem is that for my datasets, this is more than the available heap.
I understand that batching is done for performance reasons (I just
read JENA-256), but in my scenario writes (using PUT) are usually
rather big and infrequent, so write performance is not important, or
at least not much helped by batching. The exception is when I
occasionally want to update every dataset in one go: then there are
several large PUTs, and Fuseki runs out of heap unless I restart it
between the PUTs.
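Just to illustrate where the heap goes on my side: as I understand it,
each big PUT is first materialised as an in-memory model before the
transaction commits. A rough sketch of that cost in plain Jena (this is
not the actual Fuseki code path, and the file name is just an example):

    import java.io.FileInputStream;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class ParseCost {
        public static void main(String[] args) throws Exception {
            // Sketch only, not the real Fuseki upload path: like a PUT,
            // the whole body is parsed into an in-memory Model first, so
            // heap use grows with the size of the upload itself, on top
            // of the batched post-commit structures.
            Model m = ModelFactory.createDefaultModel();
            m.read(new FileInputStream("big-dataset.ttl"), null, "TURTLE");
            System.out.println("Triples held in RAM: " + m.size());
        }
    }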
What the system should do is:
1/ use a persistent-but-cached layer for completed transactions
2/ be tunable (*)
3/ notice a store is transactional and use that instead of parsing
into an in-memory graph
but does not currently offer those features. Contributions welcome.
Andy
(*) I have tended to avoid adding lots of configuration options, as I
find that in other systems lots of knobs to tweak are unhelpful
overall. Either people use the default, or it takes deep magic to
control.
I understand; nothing is perfect, and there are always possible
improvements to be made. I also understand the aversion to knobs.
In my case, I would like to see in Fuseki and/or TDB a way to:
1) reduce the batch size to something less than 10 (say, 2 or 5),
2) turn off batching completely,
3) make batching behavior dependent on the size (in triples or
megabytes) of the accumulated queue, so a queue of large writes would be
flushed sooner than a queue of small writes, or
4) make batching behavior dependent on time, so that if no further
writes are performed within a certain time (say, 10 seconds or a
minute), the flush is done regardless of the size of the accumulated
write queue.
I guess 1 or 2 would be in the tunable category, while 3 and 4 would
maybe qualify as deep magic :)
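To make 3 and 4 a bit more concrete, here is the kind of policy I have
in mind. This is purely hypothetical: nothing like it exists in TDB
today, and all the names and thresholds below are invented:

    import java.util.Timer;
    import java.util.TimerTask;

    // Hypothetical sketch of options 3 and 4 above; all names and
    // thresholds are invented, nothing like this exists in TDB.
    public class FlushPolicy {
        private static final long MAX_QUEUED_TRIPLES = 100000; // option 3
        private static final long IDLE_FLUSH_MS = 10000;       // option 4

        private final Timer idleTimer = new Timer(true);
        private TimerTask pending = null;
        private long queuedTriples = 0;

        // Called after each committed write transaction.
        public synchronized void onCommit(long triplesWritten) {
            queuedTriples += triplesWritten;
            if (queuedTriples >= MAX_QUEUED_TRIPLES) {
                flush();                // large queue: flush immediately
                return;
            }
            if (pending != null)
                pending.cancel();
            pending = new TimerTask() { // otherwise flush after a quiet period
                public void run() { idleFlush(); }
            };
            idleTimer.schedule(pending, IDLE_FLUSH_MS);
        }

        private synchronized void idleFlush() {
            if (queuedTriples > 0)
                flush();
        }

        private void flush() {
            // ... write the journal back to the main database and drop
            // the in-memory views, as the batch-of-10 flush does now ...
            queuedTriples = 0;
        }
    }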
But now that I understand what's happening, I can at least work around
the problem.
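My current plan, since the flush kicks in after 10 writes: after a
round of big PUTs, push enough trivial write transactions through to
cross that threshold, so the journal gets flushed and the in-memory
structures are dropped. Something like this with ARQ's remote update
support (the service URL is just my local setup, and I'm assuming a
no-op write still counts toward the batch):

    import com.hp.hpl.jena.update.UpdateExecutionFactory;
    import com.hp.hpl.jena.update.UpdateFactory;
    import com.hp.hpl.jena.update.UpdateRequest;

    public class ForceFlush {
        public static void main(String[] args) {
            // The flush happens after 10 committed writes, so send 10
            // harmless writes to push the batch over the threshold.
            // Assumption: a write that matches nothing still counts.
            String service = "http://localhost:3030/ds/update";
            UpdateRequest nudge = UpdateFactory.create(
                    "DELETE WHERE { <urn:x-flush:dummy> ?p ?o }");
            for (int i = 0; i < 10; i++)
                UpdateExecutionFactory.createRemote(nudge, service).execute();
        }
    }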
-Osma
--
Osma Suominen | [email protected] | +358 40 5255 882
Aalto University, Department of Media Technology, Semantic Computing
Research Group
Room 2541, Otaniementie 17, Espoo, Finland; P.O. Box 15500, FI-00076
Aalto, Finland