On 09/06/16 16:28, Zen 98052 wrote:
Hi,
I have a SPARQL query below, which doesn't seem efficient.
When running it, I noticed that Jena calls execute(OpBGP opBGP,
QueryIterator ...) many times.
The default execution strategy (i.e. the one for in-memory use) is just
that: a default.
If your storage layer has different characteristics, e.g. there is a
certain amount of overhead to go and get data, then the default execution
strategy may be the wrong one. Choosing a better one is the job of the
optimizer and of OpExecutor.
What does your storage layer look like?
I have my own implementation of that function (it overrides the base class
OpExecutor), which makes calls to our back-end storage.
From the qparse output (attached below), it looks like the culprit is
that the query has BGPs inside the FILTER, which would explain the
behavior I am seeing.
Possibly - there are several points where costs may arise.
?o rdf:type ?type .
FILTER NOT EXISTS
{
  { ?o rdf:type v:Dynamic }
  UNION
  { ?o rdf:type v:Static }
}
FILTER NOT EXISTS {} can usually be written as MINUS or, in this case, as
an expression FILTER on ?type, since you have already fetched the rdf:type:
FILTER ( ?type != v:Dynamic && ?type != v:Static )
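As a rough sketch of those two rewrites, the queries below show the MINUS form and the form that filters on ?type. The SELECT clause and the namespace bound to v: are assumptions, since the full query and its prefixes are not shown in this thread; http://example.org/v# is only a placeholder.

```sparql
# Variant 1: MINUS. Removes every ?o that has type v:Dynamic or v:Static.
# Assumption: the v: namespace below is a placeholder, not the real one.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX v:   <http://example.org/v#>

SELECT ?o ?type
WHERE {
  ?o rdf:type ?type .
  MINUS {
    { ?o rdf:type v:Dynamic }
    UNION
    { ?o rdf:type v:Static }
  }
}
```

```sparql
# Variant 2: filter on the ?type that has already been fetched.
# Assumption: the v: namespace below is a placeholder, not the real one.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX v:   <http://example.org/v#>

SELECT ?o ?type
WHERE {
  ?o rdf:type ?type .
  FILTER ( ?type NOT IN ( v:Dynamic, v:Static ) )
}
```

The two variants are not interchangeable in general. Variant 1 drops an ?o entirely if any of its types is v:Dynamic or v:Static, as the original FILTER NOT EXISTS does. Variant 2 only drops the rows where ?type itself is one of those two, so an ?o that also has some other type still appears with that other type. Variant 2 needs no extra pattern matching against the store, so it is worth considering if each ?o has a single rdf:type in your data.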
The (sequence) is flowing results one-by-one into the next step.
Depending on the storage, it may be better to switch that rewrite off
and use the built-in hash join, or do your own (parallel hash join maybe?).
Do you implement solving BGPs in your store, rather than relying on the
iterative solver that is used by default?
Is there a better way to rewrite the query below to achieve the same
result more efficiently (and so get better performance)?
If you could give some details of the store it would help. It's hard to
make many suggestions because it is all about the details.
Andy
Thanks,
Z