On 01/10/12 10:21, Osma Suominen wrote:
28.09.2012 19:27, Andy Seaborne wrote:
Can you use a bit more heap? The default is just a general default, one
that also has to suit small 32-bit machines.
I have it running at 2G and have executed 75 PUTs and it's still going.
Hi Andy!
Thanks for the quick reply and for testing this yourself. You're right,
I made a hasty conclusion, and giving more heap to the JVM does seem to
help. I tried using 2GB heap and could run 200 PUTs without problems on
a recent snapshot. So there is indeed no memory leak.
This seems to be a GC issue: if you run many PUTs with a small heap size
the GC doesn't get around to freeing enough memory before it's too late,
despite the sleeping between PUTs. When I watched the process memory
consumption with top during the latest test run, there was a steady rise to
around 2GB, and then 600-700MB was suddenly released when the GC kicked in.
This cycle repeated every dozen requests or so.
I will see whether tuning the GC parameters would help. It's a bit
frustrating - I'm trying to set up a public SPARQL endpoint on a
dedicated server machine and PUTs are the easiest way to update the data
from outside the server, SOA-style. The server is a 64-bit RHEL6 machine
running Fuseki with a 3GB heap, and I can easily push it over the edge by
accident with a few relatively small (<1M triples) PUTs. Total physical memory is
4GB, so there's not that much room for increasing the heap size - okay,
I should just get more memory...
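Just for context, the PUTs are plain SPARQL Graph Store Protocol requests;
the client side is roughly the sketch below (the endpoint URL, graph name
and file name are placeholders, not my real setup):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class PutGraph {
        public static void main(String[] args) throws IOException {
            // Placeholder endpoint and graph name - adjust for the real Fuseki setup.
            String endpoint = "http://localhost:3030/ds/data";
            String graph = URLEncoder.encode("http://example.org/graph1", "UTF-8");
            URL url = new URL(endpoint + "?graph=" + graph);

            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("PUT");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/turtle");

            // Stream the Turtle file as the request body.
            try (OutputStream out = con.getOutputStream();
                 InputStream in = new FileInputStream("data.ttl")) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            System.out.println("HTTP " + con.getResponseCode());
        }
    }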
-Osma

It's not a GC issue, at least not in the normal low-level sense.
Write transactions are batched together for write-back to the main
database after they are committed. They sit in the on-disk journal, but the
in-memory structures are also retained so that a view of the database with
those transactions applied stays accessible. These take memory. (It's the
indexes - the node data is written back during the prepare phase because it
is stored in an append-only file.)
The batching size is set to 10 - after 10 writes, the system flushes the
journal and drops the in-memory structures. So if you get past that
point, it should go "forever".
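In schematic Java, the behaviour is roughly this (a sketch only, not the
actual TDB classes; all the names are made up):

    import java.util.ArrayList;
    import java.util.List;

    // Schematic sketch only - not the real TDB code. It just illustrates the
    // write-back batching described above: committed write transactions keep
    // their in-memory index structures until a batch of them is flushed from
    // the journal into the main database.
    class WriteBackBatcher {
        private static final int BATCH_SIZE = 10;   // current batching size
        private final List<CommittedTxn> pending = new ArrayList<CommittedTxn>();

        synchronized void onCommit(CommittedTxn txn) {
            pending.add(txn);   // retained: provides a view of the DB with txn applied
            if (pending.size() >= BATCH_SIZE) {
                writeBackToMainDatabase();  // flush the batched changes
                pending.clear();            // only now drop the in-memory structures
            }
        }

        private void writeBackToMainDatabase() {
            // placeholder: replay the journal into the main database files
        }

        static class CommittedTxn { /* in-memory index structures live here */ }
    }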
And every incoming request is parsed into memory to check the validity of
the RDF. That is another source of RAM usage.
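For illustration, the request body currently ends up in a plain in-memory
model along these lines (general Jena model API with current package names,
not the exact Fuseki code path):

    import java.io.InputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    // Illustration only: the incoming request body is parsed into a plain
    // in-memory model, which is where the extra RAM goes.
    public class ParseRequestBody {
        public static Model parse(InputStream body) {
            Model m = ModelFactory.createDefaultModel();   // in-memory graph
            m.read(body, null, "TURTLE");                  // parse (and so validate) the RDF
            return m;                                      // held until the PUT is applied
        }
    }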
What the system should do is:
1/ use a persistent-but-cached layer for completed transactions
2/ be tunable (*)
3/ notice a store is transactional and use that instead of parsing to an
in-memory graph (a rough sketch of this is below)
but it does not currently offer those features. Contributions welcome.
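A rough sketch of what 3/ would look like with the TDB transaction API
(location, file name and current package names are illustrative only):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb.TDBFactory;

    // Sketch of 3/: parse straight into the transactional store inside a
    // write transaction, instead of building a separate in-memory graph first.
    public class LoadIntoTdb {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("/path/to/DB");  // placeholder location
            dataset.begin(ReadWrite.WRITE);
            try {
                dataset.getDefaultModel().read("file:data.ttl", "TURTLE");  // placeholder file
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }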
Andy
(*) I have tended to avoid adding lots of configuration options, as I find
that in other systems lots of knobs to tweak is unhelpful overall. Either
people use the defaults or it takes deep magic to control them.