Hi,
I have been working on a client project on which we have used the Jena SDB
implementation (1.3.5 snapshot backed by Oracle) as our triple store.
The basic issue we have is as follows:
During implementation we ran into severe performance problems when
querying -- not during the actual query execution, but during the
subsequent iteration over the resulting triples (we are primarily using
CONSTRUCT). The iterator appears to go back to the underlying Oracle
instance on nearly every iteration, which becomes prohibitively slow when
a result contains thousands of triples.
The solution we implemented was to pre-load all the triples into a "live"
in-memory model up front, which gave us the performance we needed.
However, we are now approaching a size where loading the entire model is no
longer feasible, in terms of both memory footprint and initial load time. I
realize the models can be broken up in many ways, allowing only partial
loads, but the problem is that we have no real way of knowing beforehand
what data the system might be interested in -- any historical data needs to
be available for querying at any time.
My question is whether anyone has encountered this issue before, how they
may have handled it, and whether there is a setting we are missing or
another way to handle this. Without digging into the weeds I can't be
sure, but it seems like the iterator's implementation could be optimized to
significantly reduce the number of calls to the database.
The actual query is irrelevant -- this happens for any CONSTRUCT query.
Here is the basic code that is running slowly (it calls the database in
most iterations of the loop). The Iterator itself is returned very quickly
from the QueryExecution.execConstructTriples method.
    protected void iterate(Iterator<com.hp.hpl.jena.graph.Triple> it)
    {
        while (it.hasNext())
        {
            // this call to next is resulting in a call to Oracle through jdbc
            com.hp.hpl.jena.graph.Triple jenaTrip = it.next();
            // add the triple to another list
        }
    }
If SQL logging is turned on, you can easily see the large number of
independent calls as the loop executes.
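For anyone who wants to put numbers on this without turning on SQL logging,
here is a minimal sketch of one way the per-call cost could be measured. It
wraps any Iterator and records how long hasNext() and next() take.
TimedIterator and TimedIteratorDemo are hypothetical names of my own, not
part of Jena or SDB, and plain strings stand in for the triples so the
example runs without a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper (not a Jena or SDB class): wraps any Iterator and
 * records how much wall-clock time is spent inside hasNext() and next().
 */
class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long calls;       // number of next() calls made so far
    private long totalNanos;  // time spent in hasNext() plus next()
    private long maxNanos;    // slowest single next() call

    TimedIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        boolean result = delegate.hasNext();
        totalNanos += System.nanoTime() - start;
        return result;
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        T value = delegate.next();
        long elapsed = System.nanoTime() - start;
        totalNanos += elapsed;
        maxNanos = Math.max(maxNanos, elapsed);
        calls++;
        return value;
    }

    @Override
    public void remove() {
        delegate.remove();
    }

    long getCalls() { return calls; }
    long getTotalNanos() { return totalNanos; }
    long getMaxNanos() { return maxNanos; }
}

class TimedIteratorDemo {
    public static void main(String[] args) {
        // Stand-in data; in the real code this would be the iterator
        // returned by the CONSTRUCT query.
        List<String> data = Arrays.asList("t1", "t2", "t3");
        TimedIterator<String> it = new TimedIterator<String>(data.iterator());

        List<String> copy = new ArrayList<String>();
        while (it.hasNext()) {
            copy.add(it.next());
        }

        System.out.println(it.getCalls() + " elements copied, "
                + (it.getTotalNanos() / 1000) + " us spent in the iterator, "
                + "slowest next() took " + (it.getMaxNanos() / 1000) + " us");
    }
}
```

In our case the wrapper would go around the Iterator passed to iterate()
above. A total that grows roughly linearly with the number of triples, with
each call costing about one database round trip, would be consistent with
what the SQL log shows.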
Thanks,
Aaron
--
Aaron Jackson
Lead Solution Architect
Blue Slate Solutions | Phone: 518.810.0372 | Cell: 845.392.6923
Email: [email protected] | www.blueslate.net