On 01/10/12 10:21, Osma Suominen wrote:
28.09.2012 19:27, Andy Seaborne wrote:
Can you use a bit more heap? The default is just a general default, one
that also has to suit small 32-bit machines.
I have it running at 2G and have executed 75 PUTs and it's still going.
Hi Andy!
Thanks for the quick reply and for testing this yourself. You're right,
I made a hasty conclusion, and giving more heap to the JVM does seem to
help. I tried using 2GB heap and could run 200 PUTs without problems on
a recent snapshot. So there is indeed no memory leak.
This seems to be a GC issue: if you run many PUTs with a small heap size
the GC doesn't get around to freeing enough memory before it's too late,
despite the sleeping between PUTs. When I watched the process memory
consumption with top during the latest test run, there was a steady rise to
around 2GB, and then 600-700MB was suddenly released when the GC kicked in.
This cycle repeated every dozen requests or so.
I will see whether tuning the GC parameters would help. It's a bit
frustrating - I'm trying to set up a public SPARQL endpoint on a
dedicated server machine and PUTs are the easiest way to update the data
from outside the server, SOA-style. The server is a 64-bit RHEL6 machine
running Fuseki with a 3GB heap, and I can easily push it over the edge by
accident with a few relatively small (<1M triples) PUTs. Total physical memory is
4GB, so there's not that much room for increasing the heap size - okay,
I should just get more memory...
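Just for context, the PUTs are plain SPARQL Graph Store Protocol requests;
the client side is roughly the sketch below (the endpoint URL, graph name
and file name are placeholders, not my real setup):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class PutGraph {
        public static void main(String[] args) throws IOException {
            // Placeholder endpoint and graph name - adjust for the real Fuseki setup.
            String endpoint = "http://localhost:3030/ds/data";
            String graph = URLEncoder.encode("http://example.org/graph1", "UTF-8");
            URL url = new URL(endpoint + "?graph=" + graph);

            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("PUT");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/turtle");

            // Stream the Turtle file as the request body.
            try (OutputStream out = con.getOutputStream();
                 InputStream in = new FileInputStream("data.ttl")) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            System.out.println("HTTP " + con.getResponseCode());
        }
    }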
-Osma

It's not a GC issue, at least not in the normal low-level sense.
Write transactions are batched together for write-back to the main
database after they are committed. They sit in the on-disk journal, but the
in-memory structures are also retained so that a view of the database with
those transactions applied stays accessible. These take memory. (It's the
indexes - the node data is written back during the prepare phase because it
is stored in an append-only file.)
The batching size is set to 10 - after 10 writes, the system flushes the
journal and drops the in-memory structures. So if you get past that
point, it should go "forever".
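In schematic Java, the behaviour is roughly this (a sketch only, not the
actual TDB classes; all the names are made up):

    import java.util.ArrayList;
    import java.util.List;

    // Schematic sketch only - not the real TDB code. It just illustrates the
    // write-back batching described above: committed write transactions keep
    // their in-memory index structures until a batch of them is flushed from
    // the journal into the main database.
    class WriteBackBatcher {
        private static final int BATCH_SIZE = 10;   // current batching size
        private final List<CommittedTxn> pending = new ArrayList<CommittedTxn>();

        synchronized void onCommit(CommittedTxn txn) {
            pending.add(txn);   // retained: provides a view of the DB with txn applied
            if (pending.size() >= BATCH_SIZE) {
                writeBackToMainDatabase();  // flush the batched changes
                pending.clear();            // only now drop the in-memory structures
            }
        }

        private void writeBackToMainDatabase() {
            // placeholder: replay the journal into the main database files
        }

        static class CommittedTxn { /* in-memory index structures live here */ }
    }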
And every incoming request is parsed into memory to check the validity of
the RDF. That is another source of RAM usage.
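For illustration, the request body currently ends up in a plain in-memory
model along these lines (general Jena model API with current package names,
not the exact Fuseki code path):

    import java.io.InputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    // Illustration only: the incoming request body is parsed into a plain
    // in-memory model, which is where the extra RAM goes.
    public class ParseRequestBody {
        public static Model parse(InputStream body) {
            Model m = ModelFactory.createDefaultModel();   // in-memory graph
            m.read(body, null, "TURTLE");                  // parse (and so validate) the RDF
            return m;                                      // held until the PUT is applied
        }
    }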
What the system should do is:
1/ use a persistent-but-cached layer for completed transactions
2/ be tunable (*)
3/ notice a store is transactional and use that instead of parsing to an
in-memory graph (a rough sketch of this is below)
but it does not currently offer those features. Contributions welcome.
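A rough sketch of what 3/ would look like with the TDB transaction API
(location, file name and current package names are illustrative only):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb.TDBFactory;

    // Sketch of 3/: parse straight into the transactional store inside a
    // write transaction, instead of building a separate in-memory graph first.
    public class LoadIntoTdb {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("/path/to/DB");  // placeholder location
            dataset.begin(ReadWrite.WRITE);
            try {
                dataset.getDefaultModel().read("file:data.ttl", "TURTLE");  // placeholder file
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }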
Andy
(*) I have tended to avoid adding lots of configuration options, as I find
that in other systems lots of knobs to tweak is unhelpful overall. Either
people use the defaults or it takes deep magic to control them.