Re: Suggestions for learning more about SPARQL query performance?

Andy Seaborne Mon, 05 Apr 2021 06:34:46 -0700

Hi Steve,

QueryExecutionFactory lets me create a query on either a Model or a Dataset. What is the difference? Which one does ARQ actually operate on? Will ARQ create a Dataset and DataGraph if given (say) an in-memory Model to be queried?

All SPARQL execution is on a dataset (actually a DatasetGraph, which is lower level than Dataset; Dataset maps to Models and Resources built from the building blocks of Graphs and DatasetGraph).

Model is a presentation API - it has no state and the RDF is stored in Graphs or DatasetGraphs.
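
As a rough illustration of "presentation API with no state", here is a minimal sketch. ToyGraph and ToyModel are hypothetical stand-ins written for this example, not Jena's Graph or Model classes. The only point is that the presentation object holds a reference to the storage and nothing else, so two views over the same graph always agree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are NOT Jena classes.
public class ToyPresentationLayer {

    record Triple(String s, String p, String o) {}

    // Storage layer: the only place triples live.
    static class ToyGraph {
        final List<Triple> triples = new ArrayList<>();
    }

    // Presentation layer: a reference to the graph and no state of its own.
    static class ToyModel {
        private final ToyGraph graph;

        ToyModel(ToyGraph graph) { this.graph = graph; }

        void addStatement(String s, String p, String o) {
            graph.triples.add(new Triple(s, p, o));
        }

        int size() { return graph.triples.size(); }
    }

    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        ToyModel view1 = new ToyModel(graph);
        ToyModel view2 = new ToyModel(graph);

        view1.addStatement(":alice", ":knows", ":bob");

        // Both views report the same size because neither stores anything itself.
        System.out.println(view1.size() + " " + view2.size());
    }
}
```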


Execution is of the algebra using class OpExecutor.

The algebra is extended to have quads and blocks of quads. OpExecutor has both triple/basic graph pattern and quad/quad pattern steps. TDB rewrites the algebra to quad form, then executes that.

So actually touching the data comes down to "execute OpQuadPattern" or "execute OpBGP" in OpExecutor. TDB executes QuadPattern natively; it has its own subclass of OpExecutor. (TDB also has an adapter to execute over a single graph as well.)

When executing in triple form, as for an in-memory, general purpose dataset (a collection of graphs), the "execute OpBGP" step calls a registered StageGenerator, which is an interface and can be set per-dataset.

StageGenerator receives a multi-triple BGP and returns the solutions - it gets to see the whole BGP.

(Currently so does TIM, the native in-memory DatasetGraph, even though it is a quad store.)
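
To make "gets to see the whole BGP" concrete, here is a minimal sketch of the idea in plain Java. The types ToyGraph, ToyStageGenerator and FindBasedStageGenerator are invented for this example and are not Jena's interfaces; real signatures differ. The sketch assumes a graph that only supports single-pattern find, and shows one simple way a stage generator can answer a multi-triple BGP by issuing one find call per pattern per partial solution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are NOT Jena's Graph or StageGenerator.
public class ToyBgpExecution {

    // A triple or triple pattern; terms starting with '?' are variables.
    record Triple(String s, String p, String o) {}

    // Minimal graph offering only single-pattern lookup; null is a wildcard.
    static class ToyGraph {
        private final List<Triple> triples = new ArrayList<>();

        void add(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

        List<Triple> find(String s, String p, String o) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : triples) {
                if ((s == null || s.equals(t.s()))
                        && (p == null || p.equals(t.p()))
                        && (o == null || o.equals(t.o()))) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    // The stage generator is handed the whole BGP, not one pattern at a time.
    interface ToyStageGenerator {
        List<Map<String, String>> execute(List<Triple> bgp, ToyGraph graph);
    }

    // Simple strategy: one find call per pattern per partial solution.
    static class FindBasedStageGenerator implements ToyStageGenerator {
        @Override
        public List<Map<String, String>> execute(List<Triple> bgp, ToyGraph graph) {
            List<Map<String, String>> solutions = new ArrayList<>();
            solutions.add(new HashMap<>()); // start with one empty solution
            for (Triple pattern : bgp) {
                List<Map<String, String>> next = new ArrayList<>();
                for (Map<String, String> binding : solutions) {
                    String s = resolve(pattern.s(), binding);
                    String p = resolve(pattern.p(), binding);
                    String o = resolve(pattern.o(), binding);
                    for (Triple t : graph.find(s, p, o)) {
                        Map<String, String> extended = new HashMap<>(binding);
                        if (bind(pattern.s(), t.s(), extended)
                                && bind(pattern.p(), t.p(), extended)
                                && bind(pattern.o(), t.o(), extended)) {
                            next.add(extended);
                        }
                    }
                }
                solutions = next;
            }
            return solutions;
        }

        // Concrete term, or the bound value of a variable, or null (wildcard).
        private static String resolve(String term, Map<String, String> binding) {
            return term.startsWith("?") ? binding.get(term) : term;
        }

        // Record a variable binding; reject a conflicting one (e.g. ?x ?p ?x).
        private static boolean bind(String term, String value, Map<String, String> binding) {
            if (!term.startsWith("?")) return true;
            String existing = binding.putIfAbsent(term, value);
            return existing == null || existing.equals(value);
        }
    }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        g.add(":alice", ":knows", ":bob");
        g.add(":bob", ":knows", ":carol");
        g.add(":bob", ":name", "\"Bob\"");

        List<Triple> bgp = List.of(
                new Triple(":alice", ":knows", "?x"),
                new Triple("?x", ":name", "?n"));

        System.out.println(new FindBasedStageGenerator().execute(bgp, g));
    }
}
```

Because the whole BGP arrives in one call, an implementation is free to do something smarter than this nested loop, such as reordering the patterns or pushing the whole pattern down to the storage.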


To extend for a custom graph implementation:

* do nothing - the default StageGenerator will turn the BGP execution into graph.find calls.


If the graph can do better for a BGP:

* Provide your own StageGenerator.

and if the dataset is concerned with quads, or you want to see more of the execution beyond BGPs:


* Provide your own OpExecutor subclass.

Everything maps back and forth between triples and quads so full SPARQL functionality is available uniformly, but maybe not as efficiently as possible.
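
The mapping between triples and quads can be sketched in both directions. This is a toy illustration only: GraphCollectionDataset and QuadStore are hypothetical classes made up for this example, not Jena's DatasetGraph implementations. The first answers quad lookups by scanning each selected graph; the second answers per-graph triple lookups by fixing the graph slot of a quad lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Jena classes.
public class ToyTripleQuadMapping {

    record Triple(String s, String p, String o) {}
    record Quad(String g, String s, String p, String o) {}

    // null acts as a wildcard in every find method below.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Direction 1: a dataset that is a collection of graphs answers quad
    // lookups by running a triple lookup on each selected graph.
    static class GraphCollectionDataset {
        final Map<String, List<Triple>> graphs = new LinkedHashMap<>();

        void add(String g, String s, String p, String o) {
            graphs.computeIfAbsent(g, k -> new ArrayList<>()).add(new Triple(s, p, o));
        }

        List<Quad> findQuads(String g, String s, String p, String o) {
            List<Quad> out = new ArrayList<>();
            for (Map.Entry<String, List<Triple>> e : graphs.entrySet()) {
                if (!matches(g, e.getKey())) continue;
                for (Triple t : e.getValue()) {
                    if (matches(s, t.s()) && matches(p, t.p()) && matches(o, t.o())) {
                        out.add(new Quad(e.getKey(), t.s(), t.p(), t.o()));
                    }
                }
            }
            return out;
        }
    }

    // Direction 2: a quad store answers triple lookups for one graph by
    // fixing the graph slot and dropping it from the results.
    static class QuadStore {
        final List<Quad> quads = new ArrayList<>();

        void add(String g, String s, String p, String o) {
            quads.add(new Quad(g, s, p, o));
        }

        List<Triple> findInGraph(String g, String s, String p, String o) {
            List<Triple> out = new ArrayList<>();
            for (Quad q : quads) {
                if (q.g().equals(g) && matches(s, q.s())
                        && matches(p, q.p()) && matches(o, q.o())) {
                    out.add(new Triple(q.s(), q.p(), q.o()));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        GraphCollectionDataset ds = new GraphCollectionDataset();
        ds.add(":g1", ":alice", ":knows", ":bob");
        ds.add(":g2", ":bob", ":knows", ":carol");
        System.out.println(ds.findQuads(null, null, ":knows", null));

        QuadStore store = new QuadStore();
        store.add(":g1", ":alice", ":knows", ":bob");
        store.add(":g2", ":bob", ":knows", ":carol");
        System.out.println(store.findInGraph(":g2", null, ":knows", null));
    }
}
```

Either direction gives the same answers, but the adapter path does extra work (a scan per graph in one case, a discarded graph slot in the other), which is the "maybe not as efficiently as possible" part.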

When/where are indexes created and used?

Indexes are a feature of the storage, not the general execution strategy.

optimizing the SPARQL algebra

Optimization has two parts. The first is rewriting the algebra into "better" algebra. This may include moving filters about.

The second is reordering BGPs/QuadPatterns, which is done at the storage or the default StageGenerator.
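
As an illustration of why reordering matters, here is a minimal sketch of one very simple heuristic: run the patterns with the fewest variables first, since they tend to match fewer triples. This is an assumption chosen for the example, not a description of the reordering ARQ or TDB actually perform; ToyBgpReorder and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy heuristic only: NOT the reordering algorithm used by ARQ or TDB.
public class ToyBgpReorder {

    // A triple pattern; terms starting with '?' are variables.
    record Triple(String s, String p, String o) {}

    // Count how many of the three slots are variables.
    static int variableCount(Triple t) {
        int n = 0;
        if (t.s().startsWith("?")) n++;
        if (t.p().startsWith("?")) n++;
        if (t.o().startsWith("?")) n++;
        return n;
    }

    // Return a copy of the BGP with the most constrained patterns first.
    // List.sort is stable, so ties keep their original relative order.
    static List<Triple> reorder(List<Triple> bgp) {
        List<Triple> out = new ArrayList<>(bgp);
        out.sort(Comparator.comparingInt(ToyBgpReorder::variableCount));
        return out;
    }

    public static void main(String[] args) {
        List<Triple> bgp = List.of(
                new Triple("?x", "?p", "?o"),
                new Triple("?x", ":knows", "?y"),
                new Triple(":alice", ":knows", "?x"));

        for (Triple t : reorder(bgp)) {
            System.out.println(t);
        }
    }
}
```

With this input, the pattern anchored on :alice moves to the front, so later patterns are evaluated with ?x already bound. A realistic reorderer would also take into account which variables are already bound by earlier patterns and any statistics the storage has.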


    Andy







On 05/04/2021 16:29, Steve Vestal wrote:

Thanks for the pointers to the excellent tutorial materials on how SPARQL queries are processed and how various things affect performance. After a little bit of digging and experimenting with ARQ, I’m not clear on how things go from an optimized SPARQL algebra expression to evaluation of leaf BGPs and when/where indexes are created and used.
I assumed ARQ uses the Graph interface to access an underlying Model, which can be backed by any of an in-memory model, a TDB, my own class that implements the Graph interface, or an inference or union model backed by any of these. The Graph interface does not have a “find” method that accepts a multi-triple BGP as an input (as one of the tutorials described), just finds for single-triple patterns.
QueryExecutionFactory lets me create a query on either a Model or a Dataset. What is the difference? Which one does ARQ actually operate on? Will ARQ create a Dataset and DataGraph if given (say) an in-memory Model to be queried? Graph, Dataset, and DatasetGraph all support only single-triple query patterns. The TDB documentation talks about optimizing the SPARQL algebra, but it is the ARQ API that has optimization configuration options. Some initial experiments with a couple of ARQ Context settings resulted in little impact on a test query issued to a series of increasingly larger in-memory models. When/where are indexes created and used?
Thanks for any insight.

On 3/18/2021 9:45 AM, Steve Vestal wrote:
Thanks. I'm looking to get smarter in general about formulating queries, particularly those with non-trivial graph structure, e.g., more than just a shallow tree of properties rooted in one resource, maybe DAGs, maybe with cycles. I am open to post-processing query results. (I do that already; generating and post-processing queries are steps in the overall algorithm.)
On 3/18/2021 9:19 AM, Andy Seaborne wrote:
On 17/03/2021 22:45, Steve Vestal wrote:
I'd like to dig a bit deeper into SPARQL query performance, better understand how different query formulations affect that, and how ARQ configuration parameters might be used to tune that. Can anyone recommend a place to start reading beyond the SPARQL book and language definition?
Hi Steve,

It's a bit "it depends on the query".
There was a presentation recently and, while it's not about ARQ, the fundamental point about getting the basic graph pattern matching working efficiently applies.
http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov
Do you have specific queries in mind or is this a general enquiry?

    Andy
