Re: SPARQL performance question

Steve Vestal Tue, 25 Feb 2020 08:26:32 -0800

I read that chapter in DuCharme's book and have some things to try, such
as moving the rdf:type triples around and fragmenting that single filter
into pieces distributed throughout the query, and just doing my own
post-processing to get disjoint variables.  I'll report back when time
permits.
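Here is a minimal sketch of that post-processing step. It assumes each result row has been copied into a Map from variable name to a string form of its value. With Jena, the values would come from QuerySolution.get, and the RDFNode values could be compared directly. DistinctRows, allDistinct and keepDistinct are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: keeps only rows whose chosen variables are pairwise distinct. */
final class DistinctRows {
    private DistinctRows() {}

    /** True if every listed variable is bound and no two of them share a value. */
    static boolean allDistinct(Map<String, String> row, List<String> vars) {
        Set<String> seen = new HashSet<>();
        for (String var : vars) {
            String value = row.get(var);
            if (value == null || !seen.add(value)) {
                return false; // unbound, or the value is already used by another variable
            }
        }
        return true;
    }

    /** One pass with a set per row instead of n*(n-1)/2 pairwise != tests. */
    static List<Map<String, String>> keepDistinct(List<Map<String, String>> rows, List<String> vars) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (allDistinct(row, vars)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

Dropping the FILTER means the query itself may return more rows, which this pass then discards. Only measurement will show whether that is faster overall.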


My reading did raise the question of what ARQ does for optimization,
which the book suggested can vary quite a bit between SPARQL engines. I
took an admittedly very hasty look at some sections of the online ARQ
documentation, and it mentions optimization in a number of places, but
is there a tutorial overview of do's and don'ts for formulating
queries? A specific question: will the user's ordering of triple
patterns have a significant effect, and so always need to be
considered, because that's the order in which the search will be done?
Or will the optimizer do its own reordering regardless? Your
suggestion implies the former.
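Whether ARQ keeps or changes the written order depends on the engine and storage in use, so the sketch below does not answer that question. It only shows why order matters when a basic graph pattern is joined as written. PatternOrderDemo is a hypothetical toy evaluator, not ARQ's algorithm. It is plain Java and needs version 16+ for records. It joins patterns left to right with nested loops and counts the partial solutions it builds. The sample data and predicate names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy nested-loop BGP evaluator (not ARQ): terms starting with '?' are variables. */
final class PatternOrderDemo {
    record Triple(String s, String p, String o) {}
    record Result(int partialSolutions, int solutions) {}

    static boolean isVar(String term) { return term.startsWith("?"); }

    /** Extends binding b so that term matches value; returns null on a mismatch. */
    static Map<String, String> bind(Map<String, String> b, String term, String value) {
        if (b == null) return null;                        // an earlier term already failed
        if (!isVar(term)) return term.equals(value) ? b : null;
        String bound = b.get(term);
        if (bound != null) return bound.equals(value) ? b : null;
        Map<String, String> extended = new HashMap<>(b);   // copy: bindings are shared
        extended.put(term, value);
        return extended;
    }

    /** Joins the patterns strictly in the order given, counting partial solutions. */
    static Result evaluate(List<Triple> data, List<Triple> patterns) {
        List<Map<String, String>> current = List.of(Map.of()); // one empty solution
        int partial = 0;
        for (Triple pat : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : current) {
                for (Triple t : data) {
                    Map<String, String> nb =
                        bind(bind(bind(b, pat.s(), t.s()), pat.p(), t.p()), pat.o(), t.o());
                    if (nb != null) next.add(nb);
                }
            }
            partial += next.size();
            current = next;
        }
        return new Result(partial, current.size());
    }

    static final List<Triple> DATA = List.of(
        new Triple("a", "type", "thread"), new Triple("b", "type", "thread"),
        new Triple("c", "type", "thread"),
        new Triple("a", "connectTo", "b"), new Triple("b", "connectTo", "c"));

    // The same three patterns in two orders: type patterns first vs. last.
    static final List<Triple> TYPES_FIRST = List.of(
        new Triple("?x", "type", "thread"), new Triple("?y", "type", "thread"),
        new Triple("?x", "connectTo", "?y"));
    static final List<Triple> TYPES_LAST = List.of(
        new Triple("?x", "connectTo", "?y"),
        new Triple("?x", "type", "thread"), new Triple("?y", "type", "thread"));

    public static void main(String[] args) {
        System.out.println("types first: " + evaluate(DATA, TYPES_FIRST));
        System.out.println("types last:  " + evaluate(DATA, TYPES_LAST));
    }
}
```

Both orders return the same 2 solutions. Putting the two type patterns first builds 14 partial solutions instead of 6. Those patterns share no variable, so they form a 3 x 3 cross product before the connectTo pattern prunes it. This is the kind of cross product Andy describes in the reply quoted below.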

On 2/25/2020 8:54 AM, Andy Seaborne wrote:
> It might be worth reordering the triple patterns and/or putting in some
> clustering: there is a large amount of cross product being done, which
> means many, many unwanted or duplicate pieces of work.
>
> For example, move the rdf:type patterns to the end (do you need them at all?)
>
>     Andy
>
> (Replaced long URIs for email:)
>
> ?leftA    <#simplexConnectTo>  ?connectionAA .
> ?connectionAA <#simplexConnectTo>  ?rightA .
>
> ?leftA    <#simplexConnectTo>  ?connectionAB .
> ?connectionAB <#simplexConnectTo>  ?rightB .
>
> ?leftB    <#simplexConnectTo>  ?connectionBA .
> ?connectionBA <#simplexConnectTo>  ?rightA .
>
> ?leftB    <#simplexConnectTo>  ?connectionBB .
> ?connectionBB <#simplexConnectTo>  ?rightB .
>
> ?connectionAA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
> ?connectionBA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
>
> ?connectionAA rdf:type <#portConnection> .
> ?connectionAB rdf:type <#portConnection> .
> ?connectionBA rdf:type <#portConnection> .
> ?connectionBB rdf:type <#portConnection> .
>
> ?leftA    rdf:type              <#thread> .
> ?leftB    rdf:type              <#thread> .
> ?rightA   rdf:type              <#thread> .
> ?rightB   rdf:type              <#thread> .
> ?singleHardware rdf:type              <#platform> .
>
> On 24/02/2020 10:01, Rob Vesse wrote:
>> To add to what else has been said
>>
>> Query execution in Apache Jena ARQ is based upon lazy evaluation
>> wherever possible.  Calling execSelect() simply prepares a ResultSet
>> that is capable of delivering the results but doesn't actually
>> evaluate the query and produce any results until you call
>> hasNext()/next().  When you call either of these methods then ARQ
>> does the minimum amount of work to return the next result (or batch
>> of results) depending on the underlying algebra of the query.
>>
>> Rob
>>
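Rob's point can be illustrated with a plain-Java analogy. This is not Jena code, and Jena's real iterators are more elaborate. LazyResults is a hypothetical iterator whose constructor does no work. Its hasNext() scans candidates only until it finds the next match, so the cost shows up in hasNext() rather than at creation time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

/** Hypothetical lazy "result set": work happens only when hasNext()/next() is called. */
final class LazyResults implements Iterator<Integer> {
    private final int limit;          // candidates are 0 .. limit-1
    private final IntPredicate accept;
    private int cursor = 0;           // next candidate to examine
    private Integer pending;          // a found result not yet handed out
    int candidatesTested = 0;         // work counter, for illustration only

    LazyResults(int limit, IntPredicate accept) {
        this.limit = limit;           // no searching here: construction is cheap
        this.accept = accept;
    }

    @Override
    public boolean hasNext() {
        // Do the minimum work needed to find the next result, if any.
        while (pending == null && cursor < limit) {
            int candidate = cursor++;
            candidatesTested++;
            if (accept.test(candidate)) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = pending;
        pending = null;
        return result;
    }
}
```

For example, `new LazyResults(100, c -> c % 10 == 9)` tests nothing until the first hasNext(), which examines candidates 0 through 9 before returning. This mirrors what Steve saw: execSelect() returning at once, with the time spent in selectResult.hasNext().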
>> On 23/02/2020, 18:58, "Steve Vestal" <[email protected]> wrote:
>>
>>      I'm looking for suggestions on a SPARQL performance issue. My test
>>      model has ~800 sentences, and processing one select query takes
>>      about 25 minutes. The query is a basic graph pattern with 9
>>      variables and 20 triples, plus a filter that forces distinct
>>      variables to have distinct solutions using pair-wise not-equals
>>      constraints. No OPTIONAL clause or anything else fancy.
>>
>>      I am issuing the query against an inference model. Most of the
>>      asserted sentences are in imported models. If I iterate over all
>>      the statements in the OntModel, I get ~1500 almost instantly. I
>>      experimented with several of the reasoners.
>>
>>      Below is the basic control flow. The thing I found curious is
>>      that the execSelect() method finishes almost instantly. It is the
>>      iteration over the ResultSet that is taking all the time,
>>      seemingly in the call to selectResult.hasNext(). The result has
>>      192 rows and 9 columns. The results are provided in bursts of 8
>>      rows each, with ~1 minute between bursts.
>>
>>          import org.apache.jena.ontology.OntModel;
>>          import org.apache.jena.query.*;
>>          import org.apache.jena.rdf.model.RDFNode;
>>
>>          OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>          String selectQuery = getMySelectQuery();
>>          // try-with-resources closes the QueryExecution when done
>>          try (QueryExecution selectExec =
>>                   QueryExecutionFactory.create(selectQuery, ontologyModel)) {
>>              ResultSet selectResult = selectExec.execSelect(); // returns at once
>>              while (selectResult.hasNext()) {  // Time seems to be spent in hasNext()
>>                  QuerySolution selectSolution = selectResult.next();
>>                  for (String var : getMyVariablesOfInterest()) {
>>                      RDFNode varValue = selectSolution.get(var);
>>                      // process varValue
>>                  }
>>              }
>>          }
>>
>>      Any suggestions would be appreciated.
>>
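For scale, suppose every pair among the 9 variables is constrained. A pairwise not-equals filter over n variables has n(n-1)/2 terms, so this query would need 36. DistinctFilter is a small hypothetical generator in plain Java, not part of Jena. It builds the constraints both as separate FILTERs, the form Steve mentions distributing through the query, and as one combined FILTER.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for pairwise "distinct variables" SPARQL constraints. */
final class DistinctFilter {
    private DistinctFilter() {}

    /** One "FILTER(?a != ?b)" per unordered pair of variables: n*(n-1)/2 of them. */
    static List<String> pairwiseFilters(List<String> vars) {
        List<String> filters = new ArrayList<>();
        for (int i = 0; i < vars.size(); i++) {
            for (int j = i + 1; j < vars.size(); j++) {
                filters.add("FILTER(?" + vars.get(i) + " != ?" + vars.get(j) + ")");
            }
        }
        return filters;
    }

    /** The same constraints folded into a single FILTER joined with &&. */
    static String singleFilter(List<String> vars) {
        List<String> terms = new ArrayList<>();
        for (String f : pairwiseFilters(vars)) {
            // strip the leading "FILTER(" and the trailing ")"
            terms.add(f.substring("FILTER(".length(), f.length() - 1));
        }
        return "FILTER(" + String.join(" && ", terms) + ")";
    }
}
```

In SPARQL a FILTER constrains its whole group graph pattern wherever it is written. Whether splitting it up and moving the pieces changes the evaluation therefore depends on the engine's optimizer, not on the query's meaning.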
