Ok - 

It’s another day… after a weekend’s rest. And now I have a few observations and 
many questions:

So, a bit more about what I’m doing (much of it likely naive): 

I create a local OntModel, add several statements to it, serialize the model 
to a string, wrap it in INSERT DATA { }, and then post it to Fuseki using an 
UpdateRequest:

    // (Jena 2.x packages)
    import com.hp.hpl.jena.update.UpdateExecutionFactory;
    import com.hp.hpl.jena.update.UpdateProcessor;
    import com.hp.hpl.jena.update.UpdateRequest;

    try {
        // payloadContent is the serialized model wrapped in INSERT DATA { ... };
        // service is the Fuseki update endpoint, e.g. http://localhost:3030/km4sp/update
        UpdateRequest ur = new UpdateRequest();
        ur.add(payloadContent);
        UpdateProcessor up = UpdateExecutionFactory.createRemote(ur, service);
        up.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
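
For reference, payloadContent is built roughly like this (a sketch; I 
serialize the model as N-Triples, which is legal inside INSERT DATA):

    StringWriter sw = new StringWriter();
    model.write(sw, "N-TRIPLE");        // 'model' is the local OntModel
    String payloadContent = "INSERT DATA { " + sw.toString() + " }";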

I have standalone code that sends SPARQL queries to the same Fuseki server, 
using the usual methods.
 
So, before I go deeper into the questions, it occurs to me that I don’t 
necessarily need to post remotely, and removing that overhead would be an 
obvious improvement. I’m wondering whether my update code could interact 
directly with TDB (along the lines of the sketch below) while also having the 
content be queryable via Fuseki. I recall trying that a while ago and finding 
that direct writes to TDB were not visible to queries until Fuseki was 
restarted. Was that a (since repaired) bug or a feature?
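
Here is roughly what I have in mind (a sketch only; the directory path is a 
placeholder, and I realize it assumes my JVM can open the same TDB location 
that Fuseki is serving):

    Dataset ds = TDBFactory.createDataset("/path/to/tdb");  // same location Fuseki uses
    ds.begin(ReadWrite.WRITE);
    try {
        UpdateAction.parseExecute(payloadContent, ds);  // run the INSERT DATA locally
        ds.commit();
    } finally {
        ds.end();
    }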

If I can’t do this, do you know of any projects that have built a server that 
can run in-process with my application and share the TDB connection?

Assuming that I do stick with remote updates, I have some questions:

First, note that I create a new UpdateProcessor for each posted update. I 
don’t see any up.close(), so I’m guessing I’m not creating any leakage here. 
Would it be better to create one update processor and reuse it across updates?

Next, I’m seeing a lot of notices like the following in the Fuseki console:

13:40:07 INFO  [19125] 204 No Content (21 ms) 
13:40:07 INFO  [19126] POST http://localhost:3030/km4sp/update
13:40:07 INFO  [19126] 204 No Content (13 ms) 
13:40:07 INFO  [19127] POST http://localhost:3030/km4sp/update
13:40:07 INFO  [19127] 204 No Content (16 ms) 
13:40:07 INFO  [19128] POST http://localhost:3030/km4sp/update
13:40:07 INFO  [19128] 204 No Content (14 ms) 
13:40:07 INFO  [19129] POST http://localhost:3030/km4sp/update
13:40:07 INFO  [19129] 204 No Content (8 ms) 
13:40:07 INFO  [19130] POST http://localhost:3030/km4sp/update

I can't tell whether this reflects 
  1) a high overall number of update calls from my code, or 
  2) Fuseki processing each triple of an update as a separate HTTP POST.

If it’s the latter, then batching more triples into each update won’t reduce 
the number of POSTs, so I don’t think it would improve overall performance. If 
it’s the former, batching should help, along the lines of the sketch below.
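
A batched post would look roughly like this (a sketch; pendingModels is a 
hypothetical queue of the small models I currently post one at a time):

    StringBuilder batch = new StringBuilder("INSERT DATA {\n");
    for (Model m : pendingModels) {
        StringWriter sw = new StringWriter();
        m.write(sw, "N-TRIPLE");       // N-Triples is legal inside INSERT DATA
        batch.append(sw);
    }
    batch.append("}");
    UpdateRequest ur = new UpdateRequest();
    ur.add(batch.toString());
    UpdateExecutionFactory.createRemote(ur, service).execute();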

I guess I can tell which it is by inserting delays between my posts and 
capturing timestamps in my own log, then comparing those against what I’m 
seeing in the Fuseki console. I’ll try that next, roughly as sketched below.
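
Something like this (the delay and output format are arbitrary):

    long before = System.currentTimeMillis();
    up.execute();
    long after = System.currentTimeMillis();
    System.out.println("posted update at " + new java.util.Date(before)
            + " (" + (after - before) + " ms)");
    Thread.sleep(500);  // deliberate gap so individual posts stand out in Fuseki's log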

I guess that’s it for the questions for now.

Any recommendations?

Thanks again,

Mark


 
On May 31, 2014, at 12:22 PM, Andy Seaborne <[email protected]> wrote:

> Hi Mark,
> 
> The long running query is quite significant.
> 
> On 30/05/14 18:26, Mark Feblowitz wrote:
>> That’s a good idea.
>> 
>> One improvement I’ve already made was to relocate the DB to local
>> disk - having it on a shared filesystem is an even worse idea.
>> 
>> The updates tend to be on the order of 5-20 triples at a time.
> 
> If you could batch up changes, that will help for all sorts of reasons - cf. 
> autocommit in JDBC, where many small changes run really slowly.
> 
> This is part of the issue - write transactions have a significant fixed
> cost that you incur even for a (theoretical) transaction of no changes. It
> has to write a few bytes and do a disk-sync.  Reads continue during this
> time, but longer write times mean there is less chance of the system being
> able to write the journal back to the main database.  JENA-567 may help: it
> isn't faster (it's slower), but it saves memory.
> 
> Read transactions have near zero cost in TDB - Fuseki/TDB is read-centric.
> 
> What's more, the TDB block size is 8 Kbytes, so one change in a block is 8K of 
> transaction state - multiplied across the indexes.  So 5 triples of change get 
> very little shared-block effect, and the memory footprint is 
> disproportionately large.
> 
> <thinking out loud id=1>
> 
> A block size of 1 or 2K for the leaf blocks in TDB, leaving the branch blocks 
> at 8K (they live in different block managers = files), would be worth 
> experimenting with.
> 
> </thinking out loud>
> 
> <thinking out loud id=2>
> 
> We could provide some Netty/MINA/... based server that did the moral
> equivalent of the SPARQL Protocol (cf. jena-jdbc, jena-client). HTTP is said 
> to be an appreciable cost.  That is no judgement of Jetty/Tomcat - it is the 
> nature of HTTP - and the cautious wording is because I haven't observed it 
> myself: Jetty locally, together with careful streaming of results, seems to 
> be quite effective.  Fast encoding of results would be good for both.
> 
> </thinking out loud>
> 
>> I believe I identified the worst culprit, and that was using
>> OWLFBRuleReasoner rather than RDFSExptRuleReasoner or
>> TransitiveReasoner. My guess is that the longish query chain over a
>> large triplestore, using the Owl reasoner was leading to very long
>> query times and lots of memory consumption. Do you think that’s a
>> reasonable guess?
> 
> That does look right.  Long-running queries, or the effect of an intense 
> stream of small back-to-back queries combined with the update pattern, leave 
> no time for the system to flush the journal back to the main database.  This 
> leads to memory growth and eventually an OOME.
> 
>> How I reached that conclusion was to kill the non-responsive (even
>> for a small query) Fuseki and restart with RDFSExptRuleReasoner (same
>> DB, with many triples). After that, both the small query and the
>> multi-join query responded quite quickly.
>> 
>> If necessary, I’ll try to throttle the posts, since I’m in complete
>> control of the submissions.
> 
> That should at least prove whether this discussion has correctly diagnosed 
> the interactions leading to the OOME.  What we have is ungrace-ful 
> ("disgraceful") behaviour as the load reaches system saturation.  It ought to 
> be more graceful but, fundamentally, it's always going to be possible to flood 
> a system, any system, with more work than it is capable of.
> 
>       Andy
>> 
>> Thanks,
>> 
>> Mark
>> 
>> On May 30, 2014, at 12:52 PM, Andy Seaborne <[email protected]> wrote:
>> 
>>> Mark,
>>> 
>>> How big are the updates?
>>> 
>>> An SSD for the database and the journal will help.
>>> 
>>> Every transaction is a commit, and a commit is a disk operation to
>>> ensure the commit record is permanent.  That is not cheap with a
>>> rotational disk (seek time), and much better with an SSD.
>>> 
>>> If you are driving Fuseki as hard as possible, something will break
>>> - the proposal in JENA-703 amounts to slowing the clients down as
>>> well as being more defensive.
>>> 
>>> Andy
>>> 
>>> On 30/05/14 15:39, Rob Vesse wrote:
>>>> Mark
>>>> 
>>>> This sounds like the same problem described in
>>>> https://issues.apache.org/jira/browse/JENA-689
>>>> 
>>>> TL;DR
>>>> 
>>>> For a system with no quiescent periods that is continually receiving
>>>> updates, the in-memory journal continues to expand until an OOM
>>>> occurs. There will be little/no data loss because the journal is a
>>>> write-ahead log and is first written to disk (you will lose at most
>>>> the data from the transaction that encountered the OOM).  Therefore,
>>>> once the system is restarted, the journal will be replayed and
>>>> flushed.
>>>> 
>>>> See https://issues.apache.org/jira/browse/JENA-567 for an
>>>> experimental feature that may mitigate this and see
>>>> https://issues.apache.org/jira/browse/JENA-703 for the issue
>>>> tracking the work to remove this limitation
>>>> 
>>>> Rob
> 
