Re: SDB Performance Issue

Rob Vesse Tue, 19 Mar 2013 16:00:11 -0700

Hi Aaron

Here you are encountering a common misconception among users, one which we
as developers clearly need to do a better job of covering in the
documentation: that calling execBlah() on a query execution fully executes
the query.  In fact, what execBlah() does varies according to the exact
variant called.


In any of the cases where you receive some form of iterator in response,
that iterator is essentially just a plan for how to execute the query.  No
work gets done until you start iterating over it.
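
To make this concrete, here is a toy Java sketch of that behaviour.  It
does not use Jena or SDB at all: SimulatedBackend, fetchRow() and
execLazy() are hypothetical names standing in for the database and for an
exec call that hands back an iterator.  In this model the exec call returns
immediately with zero backend calls, and each next() then costs one round
trip, roughly the pattern your SQL log shows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyResultDemo {

    /** Hypothetical stand-in for the database; counts simulated round trips. */
    static final class SimulatedBackend {
        private int calls = 0;

        /** Simulates fetching one result row; every call is one round trip. */
        String fetchRow(int i) {
            calls++;
            return "triple-" + i;
        }

        int calls() {
            return calls;
        }
    }

    /**
     * Analogue of an exec call that returns an iterator: it only builds the
     * "plan" and performs no backend work up front.
     */
    static Iterator<String> execLazy(SimulatedBackend db, int rowCount) {
        return new Iterator<String>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < rowCount;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // The real work happens here, one round trip per element.
                return db.fetchRow(position++);
            }
        };
    }

    public static void main(String[] args) {
        SimulatedBackend db = new SimulatedBackend();
        Iterator<String> it = execLazy(db, 1000);
        System.out.println("backend calls after exec: " + db.calls());

        // Draining the iterator is where the cost actually shows up.
        List<String> copy = new ArrayList<>();
        while (it.hasNext()) {
            copy.add(it.next());
        }
        System.out.println("backend calls after iterating: " + db.calls());
    }
}
```

Running main() should report 0 backend calls after the exec call and 1000
after the loop, i.e. the time you are measuring in the loop is the query
execution itself, not overhead on top of it.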


Depending on the SDB layout, backend, and SPARQL query used, SDB may have
to translate your SPARQL query into arbitrarily many SQL queries, because
much of the work often cannot be pushed down to the database level.  It
sounds like this is what you are seeing in your scenario.
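
As a rough illustration of why the number of SQL queries can grow with the
data, here is a second toy Java sketch.  It is not SDB's actual SQL
generation, and all class and method names are hypothetical.  When a join
cannot be pushed down, the outer side is fetched with one statement and the
inner side is re-queried once per outer row (1 + N statements); when the
join can be pushed down, a single statement does all the work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryFanOutDemo {

    /** Hypothetical stand-in for the relational backend; counts SQL statements. */
    static final class SimulatedSqlBackend {
        private final Map<String, List<String>> objectsBySubject;
        private int statements = 0;

        SimulatedSqlBackend(Map<String, List<String>> objectsBySubject) {
            this.objectsBySubject = objectsBySubject;
        }

        /** Simulates "SELECT subject FROM ..." as one statement. */
        List<String> selectSubjects() {
            statements++;
            return new ArrayList<>(objectsBySubject.keySet());
        }

        /** Simulates "SELECT object FROM ... WHERE subject = ?" as one statement. */
        List<String> selectObjects(String subject) {
            statements++;
            return objectsBySubject.getOrDefault(subject, Collections.emptyList());
        }

        /** Simulates one statement that performs the whole join in the database. */
        List<String> selectJoined() {
            statements++;
            List<String> rows = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : objectsBySubject.entrySet()) {
                for (String object : e.getValue()) {
                    rows.add(e.getKey() + " " + object);
                }
            }
            return rows;
        }

        int statements() {
            return statements;
        }
    }

    /** Join evaluated in Java: 1 statement for the outer side plus 1 per outer row. */
    static List<String> joinInMemory(SimulatedSqlBackend db) {
        List<String> rows = new ArrayList<>();
        for (String subject : db.selectSubjects()) {
            for (String object : db.selectObjects(subject)) {
                rows.add(subject + " " + object);
            }
        }
        return rows;
    }

    /** Join pushed down to the database: one statement regardless of data size. */
    static List<String> joinPushedDown(SimulatedSqlBackend db) {
        return db.selectJoined();
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            data.put("s" + i, Collections.singletonList("o" + i));
        }

        SimulatedSqlBackend inMemory = new SimulatedSqlBackend(data);
        joinInMemory(inMemory);
        System.out.println("in-memory join: " + inMemory.statements() + " statements");

        SimulatedSqlBackend pushedDown = new SimulatedSqlBackend(data);
        joinPushedDown(pushedDown);
        System.out.println("pushed-down join: " + pushedDown.statements() + " statement");
    }
}
```

With the 100 subjects in main(), the in-memory join should report 101
statements and the pushed-down join 1, even though both produce the same
rows.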

Rob

On 3/19/13 1:51 PM, "Aaron Jackson" <[email protected]> wrote:

>Hi,
>
>I have been working on a client project on which we have used the Jena SDB
>implementation (1.3.5 snapshot backed by Oracle) as our triple store.
>
>The basic issue we have is as follows:
>
>During implementation we encountered some fairly severe performance
>restrictions during querying -- not during the actual query execution, but
>during the subsequent iteration over the resulting triples (we are
>primarily using CONSTRUCT).  It seems that the iterator is reaching out to
>the underlying Oracle instance on nearly every iteration, which, when we
>have potentially thousands of triples in the results, is extremely
>prohibitive.
>
>The solution we implemented was to pre-load all the triples into a "live"
>in-memory model up front, which gave us the performance we needed.
>However, we are now approaching a size where loading the entire model is
>no longer feasible, in terms of footprint and initial load time.  I realize
>the models can be broken up in many ways, allowing only partial loads, but
>the problem is that we have no real way of knowing what data the system
>might be interested in beforehand -- any historical data needs to be
>available for querying at any time.
>
>My question is whether anyone has encountered this issue before, how they
>may have handled it, and whether there is a setting we are missing or
>another way to handle this.  Without digging into the weeds I can't be
>sure, but it seems like the iterator's implementation could be optimized
>to significantly reduce the number of calls to the database.
>
>The actual query is irrelevant -- this happens for any construct query.
>
>Here is the basic code that is running slowly (it calls the db in most
>iterations of the loop).  The Iterator is returned very quickly from the
>QueryExecution.execConstructTriples method.
>
>protected void iterate(Iterator<com.hp.hpl.jena.graph.Triple> it)
>{
>    while (it.hasNext())
>    {
>        com.hp.hpl.jena.graph.Triple jenaTrip = it.next();
>        // this call to next() is resulting in a call to Oracle through JDBC
>        // add the triple to another list
>    }
>}
>
>If SQL logging is turned on, you can easily see the large number of
>independent calls as the loop executes.
>
>Thanks,
>
>Aaron
>
>
>-- 
>Aaron Jackson
>Lead Solution Architect
>Blue Slate Solutions | Phone: 518.810.0372 | Cell: 845.392.6923
>Email: [email protected] | www.blueslate.net
