On 15/10/15 11:39, Giorgio Stefanoni wrote:
Hello,
I recently read the paper ‘SPARQL Basic Graph Pattern Optimisation Using
Selectivity Estimation’ by Markus Stocker et al. from WWW 2008. This paper
proposes a new method based on graph statistics and heuristics for estimating
the result size of a simple SPARQL query (simple = conjunction of BGPs). As far
as I know, this approach was implemented and was part of ARQ.
Markus used and implemented his work based on ARQ; it is not in the
distribution.
I have a couple of questions regarding static query optimisation in JENA/ARQ:
Does JENA/ARQ query optimiser still follow the approach by Markus Stocker et
al? If no, is there a place where I can read about query optimisation in
JENA/ARQ?
Given a SPARQL query q, is it possible to obtain the estimated size of the result set
of q? I tried following this example
(https://jena.apache.org/documentation/query/explain.html
<https://jena.apache.org/documentation/query/explain.html>), however, the
result of explaining the query does not include the cardinality estimation.
Best regards,
Giorgio
Query optimization happens in two stages:
1/ Rewrite the algebra to better algebra (called the "high level") such
as filter placement. See Algebra.optimize
2/ Reordering basic graph patterns is done as the query executes. You
will see the effect as the query executes. ReorderWeighted, ReorderFixed.
https://jena.apache.org/documentation/tdb/optimizer.html
The "fixed" style is applied in-memory as well. In practice, the fixed
algorithm does a reliable and fairly good job in the majority of cases.
Stats based optimization is only really needed when that fails.
Optimization does not try to guess the size of the result set. It tries
to find a faster way to execute the query mainly by replacing, fairly
conservatively, one way of executing with another.
Andy