Re: SPARQL performance question

Andy Seaborne Tue, 25 Feb 2020 09:30:30 -0800



On 25/02/2020 16:25, Steve Vestal wrote:

I read that chapter in DuCharme's book and have some things to try, such
as moving the rdf:type triples around and fragmenting that single filter
into pieces distributed throughout the query, and just doing my own
post-processing to get disjoint variables.  I'll report back when time
permits.

My reading did raise the question of what ARQ does for optimization,
which the book suggested can vary quite a bit between different SPARQL
engines.   I took an admittedly very hasty peek at some sections of the
online ARQ documentation, and it mentions optimization in a number of
places, but is there a tutorial overview on do's and don'ts when
formulating the queries?  A specific question is, will user ordering of
triples have a significant effect and should always be considered
because that's the order in which search will be done, or is the
optimizer going to do its own reordering regardless?  Your suggestion
implies the former.

ARQ does do some reordering, but the issue here is complicated by the fact that filter placement and reordering interact.


Putting in {} sometimes helps as well, because

{ triple patterns FILTERs }
{ triple patterns FILTERs }

is actually a different query and can push the optimizer to make better choices.


Optimization is a lot of "it depends".

(BTW which version are you running?)

    Andy


On 2/25/2020 8:54 AM, Andy Seaborne wrote:

It might be worth reordering the triple patterns and/or putting in some
clustering: there is a large amount of cross product being done, which
means many, many unwanted or duplicate pieces of work.

For example, move the rdf:type to the end (do you need them at all?)

     Andy

(Replaced long URIs for email:)

?leftA    <#simplexConnectTo>  ?connectionAA .
?connectionAA <#simplexConnectTo>  ?rightA .

?leftA    <#simplexConnectTo>  ?connectionAB .
?connectionAB <#simplexConnectTo>  ?rightB .

?leftB    <#simplexConnectTo>  ?connectionBA .
?connectionBA <#simplexConnectTo>  ?rightA .

?leftB    <#simplexConnectTo>  ?connectionBB .
?connectionBB <#simplexConnectTo>  ?rightB .

?connectionAA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
?connectionBA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .

?connectionAA rdf:type <#portConnection> .
?connectionAB rdf:type <#portConnection> .
?connectionBA rdf:type <#portConnection> .
?connectionBB rdf:type <#portConnection> .

?leftA    rdf:type              <#thread> .
?leftB    rdf:type              <#thread> .
?rightA   rdf:type              <#thread> .
?rightB   rdf:type              <#thread> .
?singleHardware rdf:type              <#platform> .
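
To illustrate why pattern order can matter, here is a minimal sketch in plain Java. It is not ARQ code and does not model ARQ's optimizer or indexes: it is a naive nested-loop matcher, and the class name, the tiny synthetic data set, and the shortened predicate names ("connect", "type") are all assumptions made for the example. It counts how many intermediate bindings are built when the type patterns come first (which forces a cross product of ?left and ?right) versus when the connecting patterns come first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinOrderDemo {
    // Hypothetical patterns; terms starting with '?' are variables.
    static final String[] LEFT_TYPE  = {"?left", "type", "thread"};
    static final String[] RIGHT_TYPE = {"?right", "type", "thread"};
    static final String[] LEFT_CONN  = {"?left", "connect", "?conn"};
    static final String[] CONN_RIGHT = {"?conn", "connect", "?right"};

    // Build n chains L_i -connect-> C_i -connect-> R_i, with L_i and R_i typed as thread.
    static List<String[]> buildData(int n) {
        List<String[]> data = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            data.add(new String[] {"L" + i, "connect", "C" + i});
            data.add(new String[] {"C" + i, "connect", "R" + i});
            data.add(new String[] {"L" + i, "type", "thread"});
            data.add(new String[] {"R" + i, "type", "thread"});
        }
        return data;
    }

    // Extend a binding with one triple, or return null if they conflict.
    static Map<String, String> match(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> out = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                String bound = out.get(term);
                if (bound == null) {
                    out.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return out;
    }

    // Evaluate patterns strictly in the given order.
    // Returns {total intermediate bindings built, number of final solutions}.
    static long[] evaluate(List<String[]> data, List<String[]> patterns) {
        List<Map<String, String>> current = new ArrayList<>();
        current.add(new HashMap<>());
        long intermediate = 0;
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : current) {
                for (String[] triple : data) {
                    Map<String, String> extended = match(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            intermediate += next.size();
            current = next;
        }
        return new long[] {intermediate, current.size()};
    }

    public static void main(String[] args) {
        List<String[]> data = buildData(10);
        long[] typesFirst = evaluate(data, Arrays.asList(LEFT_TYPE, RIGHT_TYPE, LEFT_CONN, CONN_RIGHT));
        long[] typesLast = evaluate(data, Arrays.asList(LEFT_CONN, CONN_RIGHT, LEFT_TYPE, RIGHT_TYPE));
        System.out.println("types first: " + typesFirst[0] + " intermediate, " + typesFirst[1] + " solutions");
        System.out.println("types last:  " + typesLast[0] + " intermediate, " + typesLast[1] + " solutions");
    }
}
```

With 10 chains, the types-first order builds 630 intermediate bindings and the types-last order builds 50, for the same 10 solutions. A real engine may reorder or use indexes, so the numbers only show the shape of the problem, not what ARQ will actually do.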





On 24/02/2020 10:01, Rob Vesse wrote:

To add to what else has been said

Query execution in Apache Jena ARQ is based upon lazy evaluation
wherever possible.  Calling execSelect() simply prepares a ResultSet
that is capable of delivering the results but doesn't actually
evaluate the query and produce any results until you call
hasNext()/next().  When you call either of these methods then ARQ
does the minimum amount of work to return the next result (or batch
of results) depending on the underlying algebra of the query.

Rob
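
The lazy behaviour described above can be sketched with a plain Java iterator. This is not Jena code: LazyRows is a hypothetical stand-in for a ResultSet, and the workDone counter is an assumption added only to make the point visible. Construction does no matching work; each new row is computed only when hasNext() is called.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyResultsDemo {
    /** Hypothetical stand-in for a lazily evaluated result set; not a Jena class. */
    static final class LazyRows implements Iterator<Integer> {
        private final int limit;
        private int produced = 0;
        private Integer pending = null;
        int workDone = 0; // counts simulated units of matching work

        LazyRows(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            // The (simulated) expensive search runs here, only when a row is needed.
            if (pending == null && produced < limit) {
                workDone++;
                pending = produced++;
            }
            return pending != null;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer row = pending;
            pending = null;
            return row;
        }
    }

    public static void main(String[] args) {
        LazyRows rows = new LazyRows(3); // analogous to execSelect(): nothing evaluated yet
        System.out.println("work after construction: " + rows.workDone);
        while (rows.hasNext()) {
            int row = rows.next();
            System.out.println("row " + row + ", work so far: " + rows.workDone);
        }
    }
}
```

This is why timing execSelect() alone says little about query cost: in a design like this, the time shows up in the iteration loop.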

On 23/02/2020, 18:58, "Steve Vestal"
<[email protected]> wrote:

      I'm looking for suggestions on a SPARQL performance issue.  My test
      model has ~800 sentences, and processing of one select query takes
      about 25 minutes.  The query is a basic graph pattern with 9
      variables and 20 triples, plus a filter that forces distinct
      variables to have distinct solutions using pair-wise not-equals
      constraints.  No option clause or anything else fancy.

      I am issuing the query against an inference model.  Most of the
      asserted sentences are in imported models.  If I iterate over all
      the statements in the OntModel, I get ~1500 almost instantly.  I
      experimented with several of the reasoners.

      Below is the basic control flow.  The thing I found curious is that
      the execSelect() method finishes almost instantly.  It is the
      iteration over the ResultSet that is taking all the time, it seems
      in the call to selectResult.hasNext(). The result has 192 rows, 9
      columns.  The results are provided in bursts of 8 rows each, with
      ~1 minute between bursts.

              OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
              String selectQuery = getMySelectQuery();
              QueryExecution selectExec =
                  QueryExecutionFactory.create(selectQuery, ontologyModel);
              ResultSet selectResult = selectExec.execSelect();
              // Time seems to be spent in hasNext
              while (selectResult.hasNext()) {
                  QuerySolution selectSolution = selectResult.next();
                  for (String var : getMyVariablesOfInterest()) {
                      RDFNode varValue = selectSolution.get(var);
                      // process varValue
                  }
              }

      Any suggestions would be appreciated.
