Re: Required Heap size for Fuseki ?

Andy Seaborne Sat, 15 Sep 2012 09:18:20 -0700

On 15/09/12 08:17, Paolo Castagna wrote:

Good catch Stephen!

+1


Ouch! That's nasty.

I've fixed it by removing the event notification. If the SPARQL update touches one of the graphs and the dataset is a kind that is a collection of graph objects, there will be graph events for the graph changed.

In TDB, a dataset is not a collection of graph objects and TDB does not know if there are any graph views, so the graph would not be getting graph-generated events anyway.


Either way round it's currently inconsistent.

There was a trigger in TDB to do a sync() after an update on the basic storage DatasetGraphTDB - debugging, it seems it isn't even accessible in TDB 0.9.X.

So overall, I think the functionality can be removed now (which fortunately isn't documented).

I removed the previously deprecated GraphStore operations and deprecated GraphStoreEvents prior to removal. This needs sorting out properly; the first step is to prepare to remove the broken and pointless contract.


        Andy


Paolo
On Sep 14, 2012 9:39 PM, "Stephen Allen" <[email protected]> wrote:

Tracked in JENA-321.

On Fri, Sep 14, 2012 at 11:05 AM, Stephen Allen <[email protected]> wrote:

Michael,

The log actually is very helpful as the stacktrace seems to be the
point where it is using up all the memory (this is not always the
case!).  From what I see, I am guessing you have a very large number
of named graphs in your store.

What appears to be happening is that before the update starts,
UpdateEngineMain attempts to fire notification events to listeners
that an update is about to occur.  Unfortunately, it tries to fire an
event for each named graph in the system.  Because TDB represents
named graphs as quads, the only way to get a list of all the named
graphs to fire an event for is to perform an entire table scan,
project just the graph part of the quad and then perform a distinct
operation.

There are a few problems with this approach:
   1) This is pretty dang inefficient, as the entire database is
scanned on every update query
   2) With a large number of named graphs, you have to fire a lot of
events, which is also inefficient
   3) If you have a lot of named graphs, the distinct operation has to
store every graph name in an in-memory hashset
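
The scan, project and distinct steps described above can be sketched roughly as follows. This is a minimal illustration, not the TDB code: the Quad class and the distinctGraphNames method are hypothetical stand-ins, and the quads here live in a plain list rather than in TDB's indexes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GraphNameScan {
    // Hypothetical stand-in for a stored quad; this is not Jena's Quad class.
    static final class Quad {
        final String graph, subject, predicate, object;

        Quad(String graph, String subject, String predicate, String object) {
            this.graph = graph;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Visit every quad (the full table scan), keep only the graph field
    // (the projection), and deduplicate in an in-memory set (the distinct).
    // The set ends up holding one entry per named graph, so its memory use
    // grows with the number of graphs.
    static Set<String> distinctGraphNames(Iterable<Quad> allQuads) {
        Set<String> seen = new HashSet<>();
        for (Quad q : allQuads) {
            seen.add(q.graph);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
            new Quad("http://example/g1", "s1", "p", "o"),
            new Quad("http://example/g1", "s2", "p", "o"),
            new Quad("http://example/g2", "s3", "p", "o"));
        Set<String> names = distinctGraphNames(quads);
        System.out.println(names.size() + " distinct graph names from "
            + quads.size() + " quads");
    }
}
```

Every quad is visited once per call, and the set cannot be released until the scan finishes, which is why both the time and the heap cost scale with the size of the store.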

You are running into issue 3).  The underlying cause seems to be a
mismatch in the design of the graph notification.  This needs to be
redesigned to fire a single event for the entire graphstore.
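
To make the contrast concrete, here is a rough sketch of per-graph notification next to a single store-level event. All of the names (UpdateNotifier, UpdateListener, WHOLE_STORE) are made up for illustration and are not part of Jena's API; it only shows how the number of events fired differs between the two designs.

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateNotifier {
    // Hypothetical listener interface; not part of Jena's API.
    interface UpdateListener {
        void updateStarting(String scope);
    }

    // Hypothetical marker meaning "the whole graph store".
    static final String WHOLE_STORE = "*";

    private final List<UpdateListener> listeners = new ArrayList<>();

    void addListener(UpdateListener listener) {
        listeners.add(listener);
    }

    // Per-graph design: one event per graph name per listener, so the
    // caller must first enumerate every graph name in the store.
    int notifyPerGraph(Iterable<String> graphNames) {
        int fired = 0;
        for (String graphName : graphNames) {
            for (UpdateListener listener : listeners) {
                listener.updateStarting(graphName);
                fired++;
            }
        }
        return fired;
    }

    // Store-level design: one event per listener, independent of how
    // many named graphs exist, so no enumeration is needed.
    int notifyStore() {
        int fired = 0;
        for (UpdateListener listener : listeners) {
            listener.updateStarting(WHOLE_STORE);
            fired++;
        }
        return fired;
    }

    public static void main(String[] args) {
        UpdateNotifier notifier = new UpdateNotifier();
        notifier.addListener(scope -> System.out.println("update starting: " + scope));
        int perGraph = notifier.notifyPerGraph(List.of("g1", "g2", "g3"));
        int perStore = notifier.notifyStore();
        System.out.println(perGraph + " events per-graph, " + perStore + " store-level");
    }
}
```

The store-level variant never needs the list of graph names, so the scan and the in-memory set are avoided entirely.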

-Stephen

P.S.  Problematic code is in DatasetGraphTDB.java (line 262).


On Fri, Sep 14, 2012 at 5:12 AM, Michael Brunnbauer <[email protected]> wrote:


Hello Andy,

On Fri, Sep 14, 2012 at 12:11:41PM +0100, Andy Seaborne wrote:

What I don't understand is where the garbage is coming from.
It may be the queries, and not the update.


The queries are on another TDB. I do nothing with the updated TDB except the
DROP.

So does the log provide any clues? (Running with -v provides more
details - including the updates).


See the attached log. The exception trace at the end may provide hints.

Regards,

Michael Brunnbauer
