On 15/09/12 08:17, Paolo Castagna wrote:
Good catch Stephen!
+1

Ouch! That's nasty.

I've fixed it by removing the event notification. If the SPARQL update touches one of the graphs, and the dataset is the kind that is a collection of graph objects, there will be graph events for the changed graph.

In TDB, a dataset is not a collection of graph objects and TDB does not know if there are any graph views so the graph would not be getting graph-generated events anyway.

Either way round it's currently inconsistent.

There was a trigger in TDB to do a sync() after an update on the basic storage, DatasetGraphTDB. From debugging, it seems it isn't even reachable in TDB 0.9.X.

So overall, I think the functionality (which, fortunately, isn't documented) can be removed now.

I removed the previously deprecated GraphStore operations and have deprecated GraphStoreEvents ahead of its removal. This needs sorting out properly; the first step is to prepare to remove the broken and pointless contract.

        Andy




Paolo
On Sep 14, 2012 9:39 PM, "Stephen Allen" wrote:

Tracked in JENA-321.

On Fri, Sep 14, 2012 at 11:05 AM, Stephen Allen wrote:
Michael,

The log actually is very helpful, as the stack trace seems to be at the
point where it is using up all the memory (this is not always the
case!).  From what I see, I am guessing you have a very large number
of named graphs in your store.

What appears to be happening is that before the update starts,
UpdateEngineMain attempts to fire notification events to listeners
that an update is about to occur.  Unfortunately, it tries to fire an
event for each named graph in the system.  Because TDB represents
named graphs as quads, the only way to get a list of all the named
graphs to fire an event for is to perform an entire table scan,
project just the graph part of the quad and then perform a distinct
operation.
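To make the scan-project-distinct pattern concrete, here is a minimal Java sketch for illustration only. It is not TDB's code: `GraphNameScan`, its `Quad` record and `distinctGraphNames` are hypothetical names, and a quad is reduced to four strings (Java 16+ for the record). It shows why the cost is one pass over every stored quad plus one in-memory set entry per distinct graph name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not TDB code: discovering named graphs by scanning quads. */
public class GraphNameScan {

    /** Simplified stand-in for a stored quad: graph, subject, predicate, object. */
    public record Quad(String graph, String subject, String predicate, String object) {}

    /**
     * Visits every quad (full table scan), keeps only the graph field
     * (projection) and drops duplicates in a HashSet (distinct). The set
     * holds one entry per distinct graph name, all in memory at once.
     */
    public static Set<String> distinctGraphNames(Iterable<Quad> allQuads) {
        Set<String> seen = new HashSet<>();   // the in-memory "distinct" step
        for (Quad q : allQuads) {             // touches every quad in the store
            seen.add(q.graph());              // project just the graph part
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
            new Quad("http://example/g1", "s1", "p", "o1"),
            new Quad("http://example/g1", "s2", "p", "o2"),
            new Quad("http://example/g2", "s3", "p", "o3"));
        // In the per-graph design, one "update starting" event is fired per name.
        for (String g : distinctGraphNames(quads)) {
            System.out.println("notify listeners: update starting on " + g);
        }
    }
}
```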

There are a few problems with this approach:
   1) This is pretty dang inefficient, as the entire database is
scanned on every update query
   2) With a large number of named graphs, you have to fire a lot of
events, which is also inefficient
   3) If you have a lot of named graphs, the distinct operation has to
store every graph name in an in-memory hashset

You are running into issue 3).  The underlying cause seems to be a
mismatch in the design of the graph notification.  This needs to be
redesigned to fire a single event for the entire graphstore.
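A rough sketch of what "a single event for the entire graph store" could look like. Everything here is an assumption for illustration: `StoreLevelEvents` and `UpdateListener` are hypothetical names, not Jena APIs. Listeners hear exactly once before and once after each update request, however many named graphs exist, so no scan of the quad table is needed to decide whom to notify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not a Jena API: one notification per update, not per graph. */
public class StoreLevelEvents {

    /** Hypothetical listener: told about the store as a whole, never about individual graphs. */
    public interface UpdateListener {
        void updateStarting(String storeId);
        void updateFinished(String storeId);
    }

    private final String storeId;
    private final List<UpdateListener> listeners = new ArrayList<>();

    public StoreLevelEvents(String storeId) {
        this.storeId = storeId;
    }

    public void addListener(UpdateListener listener) {
        listeners.add(listener);
    }

    /** Runs the update, notifying each listener once before and once after (even on failure). */
    public void execute(Runnable update) {
        for (UpdateListener l : listeners) {
            l.updateStarting(storeId);
        }
        try {
            update.run();
        } finally {
            for (UpdateListener l : listeners) {
                l.updateFinished(storeId);
            }
        }
    }

    public static void main(String[] args) {
        StoreLevelEvents store = new StoreLevelEvents("tdb-store");
        store.addListener(new UpdateListener() {
            public void updateStarting(String id) { System.out.println("starting: " + id); }
            public void updateFinished(String id) { System.out.println("finished: " + id); }
        });
        store.execute(() -> System.out.println("running update"));
    }
}
```

The cost of notification here is proportional to the number of listeners, not to the size of the store; a listener that really needs per-graph detail would have to get it some other way.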

-Stephen

P.S.  Problematic code is in DatasetGraphTDB.java (line 262).


On Fri, Sep 14, 2012 at 5:12 AM, Michael Brunnbauer wrote:

Hello Andy,

On Fri, Sep 14, 2012 at 12:11:41PM +0100, Andy Seaborne wrote:
What I don't understand is where the garbage is coming from.
It may be the queries, and not the update.

The queries are on another TDB. I do nothing with the updated TDB
except the DROP.

So does the log provide any clues? (Running with -v provides more
details, including the updates.)

See the attached log. The exception trace at the end may provide hints.

Regards,

Michael Brunnbauer
