That’s a good idea. One improvement I’ve already made was to relocate the DB to local disk - having it on a shared filesystem is an even worse idea.
The updates tend to be on the order of 5-20 triples at a time.

I believe I identified the worst culprit: using OWLFBRuleReasoner rather than RDFSExptRuleReasoner or TransitiveReasoner. My guess is that the longish query chain over a large triplestore, using the OWL reasoner, was leading to very long query times and lots of memory consumption. Do you think that’s a reasonable guess?

I reached that conclusion by killing the non-responsive Fuseki (it was unresponsive even for a small query) and restarting it with RDFSExptRuleReasoner (same DB, with many triples). After that, both the small query and the multi-join query responded quite quickly.

If necessary, I’ll try to throttle the posts, since I’m in complete control of the submissions.

Thanks,

Mark

On May 30, 2014, at 12:52 PM, Andy Seaborne <[email protected]> wrote:

> Mark,
>
> How big are the updates?
>
> An SSD for the database and the journal will help.
>
> Every transaction is a commit, and a commit is a disk operation to ensure the
> commit record is permanent. That is not cheap with a rotational disk (seek
> time), and much better with an SSD.
>
> If you are driving Fuseki as hard as possible, something will break - the
> proposal in JENA-703 amounts to slowing the clients down as well as being
> more defensive.
>
> Andy
>
> On 30/05/14 15:39, Rob Vesse wrote:
>> Mark
>>
>> This sounds like the same problem described in
>> https://issues.apache.org/jira/browse/JENA-689
>>
>> TL;DR
>>
>> For a system with no quiescent periods continually receiving updates the
>> in-memory journal continues to expand until such time as an OOM occurs.
>> There will be little/no data loss because the journal is a write ahead log
>> and is first written to disk (you will lose at most the data from the
>> transaction that encountered the OOM). Therefore once the system is
>> restarted the journal will be replayed and flushed.
>>
>> See https://issues.apache.org/jira/browse/JENA-567 for an experimental
>> feature that may mitigate this and see
>> https://issues.apache.org/jira/browse/JENA-703 for the issue tracking the
>> work to remove this limitation
>>
>> Rob
>>
>>
>> On 30/05/2014 14:46, "Mark Feblowitz" <[email protected]> wrote:
>>
>>>
>>> The update rates look to be around 40 updates per second (at least during
>>> catch-up, after restarting Fuseki). For regular processing, I’m seeing
>>> bursts of many such flurries of updates, at the same time I’m seeing
>>> perhaps a dozen queries.
>>>
>>> Begin forwarded message:
>>>
>>>> From: Mark Feblowitz <[email protected]>
>>>> Subject: Jena/Fuseki/TDB Java heap space and OutOfMemory errors
>>>> Date: May 30, 2014 at 9:36:31 AM EDT
>>>> To: "[email protected]" <[email protected]>
>>>>
>>>> I have a setup where there can be many, rapid-fire updates sent to Jena
>>>> TDB via UPDATE posts to Fuseki.
>>>>
>>>> After some amount of time I see a series of messages after update posts
>>>>
>>>> WARN [xxxxxx] RC = 500 : Java heap space
>>>>
>>>> And I’m seeing “java.lang.OutOfMemoryError: Java heap space” errors.
>>>>
>>>> I’ve bumped heap space to 3072M, on a machine that has 32 GB of memory.
>>>>
>>>> Can this be dealt with simply by doubling heap space? Or is there
>>>> something about how I’m posting that could be leading to the problem?
>>>>
>>>> What I’m doing is creating an OntModel in the client, which then posts
>>>> that model as an update to Fuseki. Fuseki is configured with TDB and
>>>> OWLFBRuleReasoner.
>>>>
>>>> As I mentioned in my prior posts on retries, there are flurries of
>>>> posts to Fuseki, and roughly concurrent queries - often on things that
>>>> are in the process of being posted. All of the posts come from a single
>>>> client, while the queries are initiated by multiple separately run
>>>> clients (in separate JVMs).
>>>>
>>>>
>>>> Here’s a snippet of the console log after a series of other
>>>> interactions and errors. I’ll have to save out the entire log (but it
>>>> will be huge) to identify the circumstances of the first errors.
>>>>
>>>>
>>>> 00:55:33 INFO [79927] 500 Java heap space (5.393 s)
>>>> 00:55:39 INFO [79929] POST http://localhost:3030/km4sp/update
>>>> 00:55:43 WARN [79929] RC = 500 : Java heap space
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.jena.atlas.io.CharStreamBuffered.<init>(CharStreamBuffered.java:53)
>>>>     at org.apache.jena.atlas.io.PeekReader.make(PeekReader.java:81)
>>>>     at org.apache.jena.atlas.io.PeekReader.make(PeekReader.java:75)
>>>>     at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:92)
>>>>     at com.hp.hpl.jena.sparql.lang.UpdateParser.parse(UpdateParser.java:51)
>>>>     at com.hp.hpl.jena.update.UpdateFactory.make(UpdateFactory.java:278)
>>>>     at com.hp.hpl.jena.update.UpdateFactory.read(UpdateFactory.java:267)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_Update.execute(SPARQL_Update.java:228)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_Update.executeBody(SPARQL_Update.java:194)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_Update.perform(SPARQL_Update.java:106)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeLifecycle(SPARQL_ServletBase.java:171)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeAction(SPARQL_ServletBase.java:152)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.execCommonWorker(SPARQL_ServletBase.java:140)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.doCommon(SPARQL_ServletBase.java:69)
>>>>     at org.apache.jena.fuseki.servlets.SPARQL_Update.doPost(SPARQL_Update.java:81)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>>>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
>>>>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:457)
>>>>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>>>>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>>>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>>>     at org.eclipse.jetty.server.Server.handle(Server.java:370)
>>>>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>>>>     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>>>     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960)
>>>>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021)
>>>>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>>>> 00:55:45 INFO [79929] 500 Java heap space (5.597 s)
>>>> 00:55:51 INFO [79930] POST http://localhost:3030/km4sp/update
>>>> 00:56:59 INFO [79928] exec/select
>>>>
>>>
>>
>>
>>
>>
>>
>
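
On the throttling idea and Andy's point that every transaction is a commit: one possible client-side approach is to group several small updates into a single request, so that a burst of 5-20 triple updates costs fewer commits. The sketch below is only an illustration, not anything from Jena or Fuseki. The `UpdateBatcher` class and its `maxBatchSize` parameter are hypothetical names. It relies on SPARQL 1.1 Update allowing multiple operations in one request separated by `;`, and it assumes (not verified in this thread) that the server handles one request as one transaction.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Jena/Fuseki): collects small SPARQL
 * updates and combines them into one request body, so that several
 * updates can share a single POST.
 */
final class UpdateBatcher {
    private final int maxBatchSize;
    private final List<String> pending = new ArrayList<>();

    UpdateBatcher(int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
    }

    /**
     * Queues one update. Returns the combined request body once the batch
     * is full, otherwise null. Assumes each update is a complete operation
     * that does not itself end with ';'.
     */
    String add(String update) {
        pending.add(update.trim());
        return pending.size() >= maxBatchSize ? flush() : null;
    }

    /**
     * Joins the queued updates with ';' (the SPARQL 1.1 Update operation
     * separator) and clears the queue. Returns null if nothing is queued.
     */
    String flush() {
        if (pending.isEmpty()) {
            return null;
        }
        String body = String.join(" ;\n", pending);
        pending.clear();
        return body;
    }
}
```

The client would POST each non-null body to the update endpoint (http://localhost:3030/km4sp/update in the log above) and call `flush()` at the end of a burst. A short pause between posts could be added on top of this if the server still falls behind.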
