"No content" i.e HTTP status code 204 is what you get when you do an
HTTP operation and there is no body to the reply. It's fine - it's same as 200 + information that there is no bytes in the body.

You get it from, say, POSTing a SPARQL Update that completes successfully.
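For example, a minimal sketch of POSTing an update and checking the
status code from Java (the endpoint URL and the data are made up;
assumes a local Fuseki dataset at /ds):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class UpdateStatusExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical Fuseki update endpoint; adjust to your dataset.
            URL url = new URL("http://localhost:3030/ds/update");
            String update =
                "INSERT DATA { <http://example/s> <http://example/p> <http://example/o> }";

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/sparql-update");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(update.getBytes(StandardCharsets.UTF_8));
            }
            // A successful update gives 204: success, empty response body.
            System.out.println("HTTP status: " + conn.getResponseCode());
        }
    }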


        Andy

On 31/05/14 20:21, Mark Feblowitz wrote:
Ok. Come Monday I'll take a look at what's happening there. It seems
that each update I'm sending is exploding to quite a few posts.
Something odd about the posts - something about "no content" is
reported for each (I'll have to look later and send the actual log
message).

Good to hear that reads are mostly free.

More later.

Sent from my iPhone

On May 31, 2014, at 12:22 PM, Andy Seaborne <[email protected]>
wrote:

Hi Mark,

The long running query is quite significant.

On 30/05/14 18:26, Mark Feblowitz wrote:

That’s a good idea.

One improvement I’ve already made was to relocate the DB to
local disk - having it on a shared filesystem is an even worse
idea.

The updates tend to be on the order of 5-20 triples at a time.

If you could batch up changes, that will help for all sorts of
reasons - cf. autocommit and JDBC, where many small changes run
really slowly.
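For illustration, a sketch of batching several small updates into one
request (package names as in Jena 2.x; the endpoint URL and the shape
of the incoming data are assumptions):

    import java.util.List;

    import com.hp.hpl.jena.update.UpdateExecutionFactory;
    import com.hp.hpl.jena.update.UpdateFactory;
    import com.hp.hpl.jena.update.UpdateProcessor;
    import com.hp.hpl.jena.update.UpdateRequest;

    public class BatchedRemoteUpdates {
        // Each entry is the triples block of one small update, e.g.
        // "<http://example/s1> <http://example/p> 1 ."
        static void sendBatch(List<String> tripleBlocks) {
            StringBuilder body = new StringBuilder("INSERT DATA {\n");
            for (String block : tripleBlocks) {
                body.append(block).append("\n");
            }
            body.append("}");
            UpdateRequest request = UpdateFactory.create(body.toString());
            UpdateProcessor proc = UpdateExecutionFactory.createRemote(
                request, "http://localhost:3030/ds/update");
            proc.execute();   // one POST, one write transaction on the server
        }
    }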

Small, frequent updates are part of the issue here - write transactions
have a significant fixed cost that you incur even for a (theoretical)
transaction of no changes: it has to write a few bytes and do a
disk sync.  Reads continue during this time, but longer write times
mean there is less chance of the system being able to write the
journal back to the main database.  JENA-567 may help - it isn't
faster (it's slower), but it saves memory.
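If you are driving TDB directly rather than through Fuseki's HTTP
interface, the same idea applies: group many small changes into one
write transaction so the commit/disk-sync cost is paid once.  A sketch
(the database location and example updates are made up):

    import java.util.Arrays;
    import java.util.List;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.ReadWrite;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.update.UpdateAction;

    public class BatchedLocalWrites {
        public static void main(String[] args) {
            // Illustrative location of the TDB database directory.
            Dataset ds = TDBFactory.createDataset("/tmp/tdb-example");

            // Hypothetical backlog of small updates (5-20 triples each).
            List<String> pending = Arrays.asList(
                "INSERT DATA { <http://example/s1> <http://example/p> 1 }",
                "INSERT DATA { <http://example/s2> <http://example/p> 2 }");

            ds.begin(ReadWrite.WRITE);
            try {
                for (String u : pending) {
                    UpdateAction.parseExecute(u, ds);  // many small changes ...
                }
                ds.commit();                           // ... one commit, one disk sync
            } finally {
                ds.end();
            }
        }
    }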

Read transactions have near zero cost in TDB - Fuseki/TDB is
read-centric.

What's more, the TDB block size is 8 Kbytes, so one change in a block is
8K of transaction state.  Multiply that by the number of indexes.  So
5 triples of change get very little shared-block effect and the
memory footprint is disproportionately large.
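As a rough, illustrative calculation (three triple indexes is the TDB
default; how many distinct blocks get touched is an assumption, not a
measurement):

    5 triples, each landing in a different leaf block, across the
    3 triple indexes (SPO/POS/OSP):

        5 x 3 = 15 blocks x 8 KB  =  ~120 KB of transaction state

    for on the order of a few hundred bytes of actual triple data.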

<thinking out loud id=1>

A block size of 1 or 2K for the leaf blocks in TDB, leaving the
branch blocks at 8K (they live in different block managers = files),
would be worth experimenting with.

</thinking out loud>

<thinking out loud id=2>

We could provide some Netty/MINA/... based server that did the
moral equivalent of the SPARQL Protocol (cf. jena-jdbc,
jena-client).  HTTP is said to be an appreciable cost.  That is no
judgement of Jetty/Tomcat - it is the nature of HTTP.  The cautious
wording ("said to be") is because I haven't observed it myself:
Jetty locally, together with careful streaming of results, seems to
be quite effective.  Fast encoding of results would be good for both.

</thinking out loud>

I believe I identified the worst culprit, and that was using
OWLFBRuleReasoner rather than RDFSExptRuleReasoner or
TransitiveReasoner.  My guess is that the longish query chain over
a large triplestore, using the OWL reasoner, was leading to very
long query times and lots of memory consumption.  Do you think
that’s a reasonable guess?

That does look right.  Long running queries, or the effect of an
intense stream of small back-to-back queries combined with the
update pattern, leave no time for the system to flush the journal
back to the main database.  This leads to memory usage and
eventually OOME.
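For reference, a minimal sketch of picking a lighter reasoner when
building the inference model (the ReasonerRegistry calls are standard
Jena; whether getRDFSReasoner() corresponds exactly to
RDFSExptRuleReasoner depends on the version and registry
configuration):

    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.Reasoner;
    import com.hp.hpl.jena.reasoner.ReasonerRegistry;

    public class ReasonerChoice {
        public static void main(String[] args) {
            // Placeholder base model; in practice this is the TDB-backed model.
            Model base = ModelFactory.createDefaultModel();

            Reasoner owl   = ReasonerRegistry.getOWLReasoner();        // full OWL rules - heaviest
            Reasoner rdfs  = ReasonerRegistry.getRDFSReasoner();       // RDFS rules - much lighter
            Reasoner trans = ReasonerRegistry.getTransitiveReasoner(); // hierarchies only - lightest

            InfModel inf = ModelFactory.createInfModel(rdfs, base);
            System.out.println("Inference model size: " + inf.size());
        }
    }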

How I reached that conclusion was to kill the non-responsive
(even for a small query) Fuseki and restart with
RDFSExptRuleReasoner (same DB, with many triples). After that,
both the small query and the multi-join query responded quite
quickly.

If necessary, I’ll try to throttle the posts, since I’m in
complete control of the submissions.

That should at least prove whether this discussion has correctly
diagnosed the interactions leading to OOME.  What we have is
un-graceful ("disgraceful") behaviour as the load reaches system
saturation.  It ought to be more graceful but, fundamentally, it's
always going to be possible to flood a system, any system, with
more work than it is capable of.
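If it helps, a minimal client-side throttling sketch (plain Java; the
interval is a made-up number to tune, and postUpdate stands in for
whatever sends one update):

    import java.util.concurrent.TimeUnit;

    public class ThrottledSubmitter {
        private final long minIntervalMillis = 200;   // assumption: ~5 updates/sec
        private long lastSend = 0;

        synchronized void submit(Runnable postUpdate) throws InterruptedException {
            long wait = lastSend + minIntervalMillis - System.currentTimeMillis();
            if (wait > 0) {
                TimeUnit.MILLISECONDS.sleep(wait);
            }
            postUpdate.run();   // e.g. the HTTP POST of one (batched) update
            lastSend = System.currentTimeMillis();
        }
    }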

Andy

Thanks,

Mark

On May 30, 2014, at 12:52 PM, Andy Seaborne <[email protected]>
wrote:

Mark,

How big are the updates?

An SSD for the database and the journal will help.

Every transaction is a commit, and a commit is a disk operation
to ensure the commit record is permanent.  That is not cheap
with a rotational disk (seek time), and much better with an
SSD.

If you are driving Fuseki as hard as possible, something will
break - the proposal in JENA-703 amounts to slowing the clients
down as well as being more defensive.

Andy

On 30/05/14 15:39, Rob Vesse wrote:

Mark

This sounds like the same problem described in
https://issues.apache.org/jira/browse/JENA-689

TL;DR

For a system with no quiescent periods that is continually receiving
updates, the in-memory journal continues to expand until an OOM
occurs. There will be little or no data loss because the journal is a
write-ahead log and is written to disk first (you will lose at most
the data from the transaction that encountered the OOM).  Therefore,
once the system is restarted, the journal will be replayed and
flushed.

See https://issues.apache.org/jira/browse/JENA-567 for an
experimental feature that may mitigate this, and see
https://issues.apache.org/jira/browse/JENA-703 for the issue
tracking the work to remove this limitation.

Rob

