Re: SDB Performance Issue

Aaron Jackson Tue, 19 Mar 2013 19:45:53 -0700

Thank you Rob.  So, the results of subsequent calls to next() are dependent
on previous calls?  Is there any way to execute "in bulk" or as a batch to
reduce the number of db calls?
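
To make the batching question concrete, here is a minimal sketch in plain Java of an iterator that does no work until it is iterated and then fetches rows in pages rather than one at a time. It is not SDB or Jena API: FakeStore, BatchingIterator and the round-trip counter are hypothetical names invented for illustration, and whether SDB can be made to behave this way for a given query is a separate matter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchingIteratorSketch {

    /** Hypothetical stand-in for a database; it only counts round trips. */
    static class FakeStore {
        private final List<String> rows = new ArrayList<>();
        int roundTrips = 0;

        FakeStore(int size) {
            for (int i = 0; i < size; i++) {
                rows.add("triple-" + i);
            }
        }

        /** One simulated round trip returning up to 'limit' rows from 'offset'. */
        List<String> fetch(int offset, int limit) {
            roundTrips++;
            int end = Math.min(rows.size(), offset + limit);
            if (offset >= end) {
                return new ArrayList<>();
            }
            return new ArrayList<>(rows.subList(offset, end));
        }
    }

    /** Lazy iterator: nothing is fetched until hasNext() or next() is called. */
    static class BatchingIterator implements Iterator<String> {
        private final FakeStore store;
        private final int batchSize;
        private final Deque<String> buffer = new ArrayDeque<>();
        private int offset = 0;
        private boolean exhausted = false;

        BatchingIterator(FakeStore store, int batchSize) {
            this.store = store;
            this.batchSize = batchSize;
        }

        /** Refill the buffer with one page, but only when it has run dry. */
        private void fill() {
            if (!buffer.isEmpty() || exhausted) {
                return;
            }
            List<String> page = store.fetch(offset, batchSize);
            offset += page.size();
            if (page.size() < batchSize) {
                exhausted = true; // a short page means there is nothing left
            }
            buffer.addAll(page);
        }

        @Override
        public boolean hasNext() {
            fill();
            return !buffer.isEmpty();
        }

        @Override
        public String next() {
            fill();
            if (buffer.isEmpty()) {
                throw new NoSuchElementException();
            }
            return buffer.poll();
        }
    }

    /** Consume the iterator and return how many items it produced. */
    static int drain(Iterator<String> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        FakeStore perRow = new FakeStore(1000);
        drain(new BatchingIterator(perRow, 1));    // one row per round trip

        FakeStore batched = new FakeStore(1000);
        drain(new BatchingIterator(batched, 100)); // one hundred rows per round trip

        System.out.println("round trips, batch size 1:   " + perRow.roundTrips);
        System.out.println("round trips, batch size 100: " + batched.roundTrips);
    }
}
```

Constructing the iterator costs nothing; the round trips only happen inside the loop, and a larger page size reduces how many are needed.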


I'm trying to find a balance between loading the entire triple store into
memory and leaving everything in the database.  We have implemented an LRU
cache to further reduce the number of queries, but there are still cases
where a fairly complex query runs in good time in memory but takes an
hour or longer to process against the db.
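
For readers unfamiliar with the idea, a minimal LRU cache can be sketched in plain Java on top of LinkedHashMap. This is only an illustration of the technique, not the cache described above; the class name LruCacheSketch and the string keys and values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch: evicts the least recently accessed entry. */
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    public LruCacheSketch(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("query-a", "result-a");
        cache.put("query-b", "result-b");
        cache.get("query-a");             // touch query-a so query-b becomes eldest
        cache.put("query-c", "result-c"); // exceeds capacity, evicts query-b
        System.out.println(cache.keySet());
    }
}
```

A cache like this only helps when the same query, or the same lookup key, recurs; it does nothing for the first execution of an expensive query.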

On another note, the problem could be significantly eased by being able to
load the entire store more efficiently.  Right now we have about 96 MB of
data (maybe about 100,000 triples?).  We are loading them into a Model by
just calling execConstruct() on a simple match-everything query
("?s ?p ?o"), which returns a Model.  It takes about 7 minutes with the
current data set.  Is
that call resulting in numerous calls to the database as well?  Do you have
any recommendations for the fastest way to load all triples in a named
model from the database into memory?

Thanks again,

Aaron

On Tue, Mar 19, 2013 at 6:49 PM, Rob Vesse <[email protected]> wrote:

> Hi Aaron
>
> Here you are encountering a common misconception among users, one which we
> as developers clearly need to cover better in the documentation: that
> calling execBlah() on a query execution fully executes the query.  In
> fact, what execBlah() does varies according to the exact variant called.
>
> In any of the cases where you receive some form of iterator in response,
> that iterator is essentially just a plan for how to execute the query;
> only when you start iterating over it does any work get done.
>
>
> Depending on the SDB layout, backend and SPARQL query used, SDB may have to
> translate your SPARQL query into arbitrarily many SQL queries because much
> of the work often cannot be pushed off to the database level.  It sounds
> like this is what you are seeing in your scenario.
>
> Rob
>
> On 3/19/13 1:51 PM, "Aaron Jackson" <[email protected]> wrote:
>
> >Hi,
> >
> >I have been working on a client project on which we have used the Jena SDB
> >implementation (1.3.5 snapshot backed by Oracle) as our triple store.
> >
> >The basic issue we have is as follows:
> >
> >During implementation we encountered some fairly severe performance
> >restrictions during querying -- not during the actual query execution, but
> >during the subsequent iteration over the resulting triples (we are
> >primarily using CONSTRUCT).  It seems that the iterator is reaching out to
> >the underlying Oracle instance on nearly every iteration, which, when we
> >have potentially thousands of triples in the results, is extremely
> >prohibitive.
> >
> >The solution we implemented was to pre-load all the triples into a "live"
> >in-memory model up front, which gave us the performance we needed.
> >However, we are now approaching a size where loading the entire model is
> >no longer feasible, in terms of footprint and initial load time.  I realize
> >the models can be broken up in many ways, allowing only partial loads, but
> >the problem is that we have no real way of knowing what data the system
> >might be interested in beforehand -- any historical data needs to be
> >available for querying at any time.
> >
> >My question is whether anyone has encountered this issue before, how they
> >may have handled it, and whether there is a setting we are missing or
> >another way to handle this.  Without digging into the weeds I can't be
> >sure, but it seems like the iterator's implementation could be optimized
> >to significantly reduce the number of calls to the database.
> >
> >The actual query is irrelevant -- this happens for any construct query.
> >
> >Here is the basic code that is running slowly (it calls the db in most
> >iterations of the loop).  The Iterator is returned very quickly from the
> >QueryExecution.execConstructTriples method.
> >
> >protected void iterate(Iterator<com.hp.hpl.jena.graph.Triple> it)
> >    {
> >        while (it.hasNext())
> >        {
> >            // this call to next() results in a call to Oracle through JDBC
> >            com.hp.hpl.jena.graph.Triple jenaTrip = it.next();
> >            // add the triple to another list
> >        }
> >    }
> >
> >If SQL logging is turned on, you can easily see the large number of
> >independent calls as the loop executes.
> >
> >Thanks,
> >
> >Aaron
> >
> >
> >--
> >Aaron Jackson
> >Lead Solution Architect
> >Blue Slate Solutions | Phone: 518.810.0372 | Cell: 845.392.6923
> >Email: [email protected] | www.blueslate.net
>
>


-- 
Aaron Jackson
Lead Solution Architect
Blue Slate Solutions | Phone: 518.810.0372 | Cell: 845.392.6923
Email: [email protected] | www.blueslate.net
