That’s a good idea. 

One improvement I’ve already made was to relocate the DB to local disk - having 
it on a shared filesystem is an even worse idea.

The updates tend to be on the order of 5-20 triples at a time.

I believe I identified the worst culprit: using OWLFBRuleReasoner rather than 
RDFSExptRuleReasoner or TransitiveReasoner. My guess is that a longish query 
chain over a large triplestore, evaluated with the OWL reasoner, was leading 
to very long query times and heavy memory consumption. Do you think that’s a 
reasonable guess?

I reached that conclusion by killing the non-responsive Fuseki (it hung even 
on a small query) and restarting it with RDFSExptRuleReasoner (same DB, with 
many triples). After that, both the small query and the multi-join query 
responded quite quickly.

If necessary, I’ll try to throttle the posts, since I’m in complete control of 
the submissions.
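
A minimal sketch of what such client-side throttling could look like, using 
only the Java standard library. The class name UpdateThrottle and the rate 
used below are hypothetical, chosen for illustration; nothing here is part of 
Jena or Fuseki.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: spaces out calls so that at most maxPerSecond proceed each second. */
public final class UpdateThrottle {
    private final long minIntervalNanos;
    private long nextAllowedNanos;

    public UpdateThrottle(double maxPerSecond) {
        if (maxPerSecond <= 0) {
            throw new IllegalArgumentException("maxPerSecond must be positive");
        }
        this.minIntervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / maxPerSecond);
        this.nextAllowedNanos = System.nanoTime();
    }

    /** Reserves the next slot and returns how long the caller should wait, in nanoseconds (never negative). */
    public synchronized long reserve(long nowNanos) {
        long start = Math.max(nowNanos, nextAllowedNanos);
        nextAllowedNanos = start + minIntervalNanos;
        return start - nowNanos;
    }

    /** Blocks the calling thread until the next post is allowed. */
    public void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

The posting client would create one instance, for example 
new UpdateThrottle(20.0), and call acquire() immediately before each UPDATE 
post. The right rate would have to be found by experiment.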

Thanks,

Mark

On May 30, 2014, at 12:52 PM, Andy Seaborne <[email protected]> wrote:

> Mark,
> 
> How big are the updates?
> 
> An SSD for the database and the journal will help.
> 
> Every transaction is a commit, and a commit is a disk operation to ensure the 
> commit record is permanent.  That is not cheap with a rotational disk (seek 
> time), and much better with an SSD.
> 
> If you are driving Fuseki as hard as possible, something will break - the 
> proposal in JENA-703 amounts to slowing the clients down as well as being 
> more defensive.
> 
>       Andy
> 
> On 30/05/14 15:39, Rob Vesse wrote:
>> Mark
>> 
>> This sounds like the same problem described in
>> https://issues.apache.org/jira/browse/JENA-689
>> 
>> TL;DR
>> 
>> For a system with no quiescent periods continually receiving updates the
>> in-memory journal continues to expand until such time as an OOM occurs.
>> There will be little/no data loss because the journal is a write ahead log
>> and is first written to disk (you will lose at most the data from the
>> transaction that encountered the OOM).  Therefore once the system is
>> restarted the journal will be replayed and flushed.
>> 
>> See https://issues.apache.org/jira/browse/JENA-567 for an experimental
>> feature that may mitigate this and see
>> https://issues.apache.org/jira/browse/JENA-703 for the issue tracking the
>> work to remove this limitation
>> 
>> Rob
>> 
>> 
>> On 30/05/2014 14:46, "Mark Feblowitz" <[email protected]> wrote:
>> 
>>> 
>>> The update rates look to be around 40 updates per second (at least during
>>> catch-up, after restarting Fuseki). For regular processing, I’m seeing
>>> bursts of many such flurries of updates, while at the same time I’m seeing
>>> perhaps a dozen queries.
>>> 
>>> Begin forwarded message:
>>> 
>>>> From: Mark Feblowitz <[email protected]>
>>>> Subject: Jena/Fuseki/TDB Java heap space and OutOfMemory errors
>>>> Date: May 30, 2014 at 9:36:31 AM EDT
>>>> To: "[email protected]" <[email protected]>
>>>> 
>>>> I have a setup where there can be many, rapid-fire updates sent to Jena
>>>> TDB via UPDATE posts to Fuseki.
>>>> 
>>>> After some amount of time I see a series of messages after update posts
>>>> 
>>>>    WARN  [xxxxxx] RC = 500 : Java heap space
>>>> 
>>>> And I’m seeing "java.lang.OutOfMemoryError: Java heap space" errors.
>>>> 
>>>> I’ve bumped heap space to 3072M, on a machine that has 32 GB of memory.
>>>> 
>>>> Can this be dealt with simply by doubling heap space? Or is there
>>>> something about how I’m posting that could be leading to the problem?
>>>> 
>>>> What I’m doing is creating an OntModel in the client, which then posts
>>>> that model as an update to Fuseki. Fuseki is configured with TDB and
>>>> OWLFBRuleReasoner.
>>>> 
>>>> As I mentioned in my prior posts on retries, there are flurries of
>>>> posts to Fuseki, and roughly concurrent queries - often on things that
>>>> are in the process of being posted. All of the posts come from a single
>>>> client, while the queries come from multiple separately run clients
>>>> (in separate JVMs).
>>>> 
>>>> 
>>>> Here’s a snippet of the console log after a series of other
>>>> interactions and errors. I’ll have to save out the entire log (but it
>>>> will be huge) to identify the circumstances of the first errors.
>>>> 
>>>> 
>>>> 00:55:33 INFO  [79927] 500 Java heap space (5.393 s)
>>>> 00:55:39 INFO  [79929] POST http://localhost:3030/km4sp/update
>>>> 00:55:43 WARN  [79929] RC = 500 : Java heap space
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>        at org.apache.jena.atlas.io.CharStreamBuffered.<init>(CharStreamBuffered.java:53)
>>>>        at org.apache.jena.atlas.io.PeekReader.make(PeekReader.java:81)
>>>>        at org.apache.jena.atlas.io.PeekReader.make(PeekReader.java:75)
>>>>        at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
>>>>        at com.hp.hpl.jena.sparql.lang.UpdateParser.parse(UpdateParser.java:51)
>>>>        at com.hp.hpl.jena.update.UpdateFactory.make(UpdateFactory.java:278)
>>>>        at com.hp.hpl.jena.update.UpdateFactory.read(UpdateFactory.java:267)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_Update.execute(SPARQL_Update.java:228)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:194)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:106)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeLifecycle(SPARQL_ServletBase.java:171)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeAction(SPARQL_ServletBase.java:152)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.execCommonWorker(SPARQL_ServletBase.java:140)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.doCommon(SPARQL_ServletBase.java:69)
>>>>        at org.apache.jena.fuseki.servlets.SPARQL_Update.doPost(SPARQL_Update.java:81)
>>>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>>>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>>>        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
>>>>        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:457)
>>>>        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
>>>>        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>>>>        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>>>        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>>>        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>>>>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>>>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>>>        at org.eclipse.jetty.server.Server.handle(Server.java:370)
>>>>        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>>>>        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>>>        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960)
>>>>        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021)
>>>>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>>>> 00:55:45 INFO  [79929] 500 Java heap space (5.597 s)
>>>> 00:55:51 INFO  [79930] POST http://localhost:3030/km4sp/update
>>>> 00:56:59 INFO  [79928] exec/select
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 
