On 01/10/12 10:21, Osma Suominen wrote:
28.09.2012 19:27, Andy Seaborne wrote:

Can you use a bit more heap?  The default is just a general default,
including small 32 bit machines.

I have it running at 2G and have executed 75 PUTs and it's still going.

Hi Andy!

Thanks for the quick reply and for testing this yourself. You're right,
I made a hasty conclusion, and giving more heap to the JVM does seem to
help. I tried a 2GB heap and could run 200 PUTs without problems on a
recent snapshot. So there is indeed no memory leak.

This seems to be a GC issue: if you run many PUTs with a small heap
size, the GC doesn't get around to freeing enough memory before it's
too late, despite the sleeping between PUTs. When I watched the process
memory consumption with top during the latest test run, there was a
steady rise to around 2GB, and then 600-700MB was suddenly released
when the GC kicked in. This cycle then repeated every dozen requests or so.

I will see whether tuning the GC parameters helps. It's a bit
frustrating - I'm trying to set up a public SPARQL endpoint on a
dedicated server machine, and PUTs are the easiest way to update the
data from outside the server, SOA-style. The server is a 64-bit RHEL6
machine running Fuseki with a 3GB heap, and I can easily push it over
the edge by accident with a few relatively small (<1M triples) PUTs.
Total physical memory is 4GB, so there's not much room for increasing
the heap size - okay, I should just get more memory...
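
For reference, this kind of PUT needs nothing more than something like the
sketch below (plain Java just to illustrate; the dataset path /ds, the graph
URI and the file name are placeholders, assuming Fuseki's standard graph
store data service):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Sketch of one HTTP PUT to a Fuseki graph store endpoint.
    // Dataset path, graph URI and file name are placeholders;
    // the graph URI in the query string may need percent-encoding.
    public class GraphPut {
        public static void main(String[] args) throws Exception {
            byte[] body = Files.readAllBytes(Paths.get("data.ttl"));
            URL url = new URL(
                "http://localhost:3030/ds/data?graph=http://example.org/graph1");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");                     // replace the graph
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/turtle");
            OutputStream out = conn.getOutputStream();
            out.write(body);
            out.close();
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }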

-Osma


It's not a GC issue, at least not in the normal low-level sense.

Write transactions are batched together for write-back to the main database after they are committed. They are in the on-disk journal, but the in-memory structures are also retained to give access to a view of the database with those transactions applied. These take memory. (It's the indexes - the node data is written back during the prepare step because the node table is an append-only file.)

The batching size is set to 10 - after 10 writes, the system flushes the journal and drops the in-memory structures. So if you get past that point, it should go "forever".
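
In outline, the behaviour is something like this (an illustrative sketch only, not the actual TDB code; the names are made up):

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of the write-back batching described above.
    // Not the actual TDB code; class and method names are invented.
    class BatchedWriteBack {
        static final int BATCH_SIZE = 10;           // "batching size is set to 10"
        private final List<Object> committed = new ArrayList<Object>();

        synchronized void commit(Object txnChanges) {
            appendToJournal(txnChanges);            // durable record on disk
            committed.add(txnChanges);              // kept in memory so readers see
                                                    // the committed state
            if (committed.size() >= BATCH_SIZE) {
                writeBackToMainDatabase(committed); // flush the journal
                committed.clear();                  // drop the in-memory structures
            }
        }

        private void appendToJournal(Object changes) { /* ... */ }
        private void writeBackToMainDatabase(List<Object> batch) { /* ... */ }
    }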

And every incoming request is parsed in-memory to check the validity of the RDF. That is also a source of RAM usage.
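
That is, each request body is first read into a plain in-memory model before anything touches the store - roughly like this sketch (not the actual Fuseki code; Jena 2.7-era package names, Turtle assumed):

    import java.io.InputStream;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Sketch: the incoming PUT body is parsed into an in-memory model to
    // check the RDF, so the whole graph sits in RAM for the request.
    class ValidateByParsing {
        static Model parseBody(InputStream body) {
            Model incoming = ModelFactory.createDefaultModel();
            incoming.read(body, null, "TURTLE");   // throws on syntax errors
            return incoming;
        }
    }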

What the system should do is:
1/ use a persistent-but-cached layer for completed transactions
2/ be tunable (*)
3/ notice a store is transactional and use that instead of parsing to an in-memory graph (sketched below)

but it does not currently offer those features. Contributions welcome.
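
For 3/, the idea would be roughly to parse straight into a write transaction on the store instead of building a separate in-memory graph first - a sketch only (again Jena 2.7-era names, and not what Fuseki currently does):

    import java.io.InputStream;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;
    import com.hp.hpl.jena.rdf.model.Model;

    // Sketch of 3/: parse the PUT body directly into the transactional store.
    // A syntax error aborts the transaction, so validation still happens,
    // but without a second full copy of the graph held in memory.
    class PutIntoStore {
        static void putGraph(Dataset dataset, String graphUri, InputStream body) {
            dataset.begin(ReadWrite.WRITE);
            try {
                Model target = dataset.getNamedModel(graphUri);
                target.removeAll();                  // PUT semantics: replace the graph
                target.read(body, null, "TURTLE");   // stream straight into the store
                dataset.commit();
            } finally {
                dataset.end();                       // aborts if commit() was not reached
            }
        }
    }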

        Andy

(*) I have tended to avoid lots of configuration options, as I find that in other systems having lots of knobs to tweak is unhelpful overall. Either people use the defaults, or it takes deep magic to control.
