Re: SPARQL performance question

Steve Vestal Tue, 25 Feb 2020 09:38:40 -0800

I'm currently using 3.8.0 jars.


On 2/25/2020 11:30 AM, Andy Seaborne wrote:
>
>
> On 25/02/2020 16:25, Steve Vestal wrote:
>> I read that chapter in DuCharme's book and have some things to try, such
>> as moving the rdf:type triples around and fragmenting that single filter
>> into pieces distributed throughout the query, and just doing my own
>> post-processing to get disjoint variables.  I'll report back when time
>> permits.
>>
>> My reading did raise the question of what ARQ does for optimization,
>> which the book suggested can vary quite a bit between different SPARQL
>> engines.   I took an admittedly very hasty peek at some sections of the
>> online ARQ documentation, and it mentions optimization in a number of
>> places, but is there a tutorial overview on do's and don'ts when
>> formulating the queries?  A specific question is, will user ordering of
>> triples have a significant effect and should always be considered
>> because that's the order in which search will be done, or is the
>> optimizer going to do its own reordering regardless?  Your suggestion
>> implies the former.
>
> ARQ does do some reordering but the issue here is made complicated by
> the fact that filter placement and reordering interact.
>
> Putting in {} sometimes helps as well because
>
> { triple patterns FILTERs }
> { triple patterns FILTERs }
>
> is actually a different query and can push the optimizer to make
> better choices.
>
> Optimization is a lot of "it depends".
>
> (BTW which version are you running?)
>
>     Andy
>
>>
>> On 2/25/2020 8:54 AM, Andy Seaborne wrote:
>>> It might be worth reordering the triple patterns and/or putting in some
>>> clustering: there is a large amount of cross product being done, which
>>> means many, many unwanted or duplicate pieces of work.
>>>
>>> For example, move the rdf:type patterns to the end (do you need them at all?)
>>>
>>>      Andy
>>>
>>> (Replaced long URIs for email:)
>>>
>>> ?leftA    <#simplexConnectTo>  ?connectionAA .
>>> ?connectionAA <#simplexConnectTo>  ?rightA .
>>>
>>> ?leftA    <#simplexConnectTo>  ?connectionAB .
>>> ?connectionAB <#simplexConnectTo>  ?rightB .
>>>
>>> ?leftB    <#simplexConnectTo>  ?connectionBA .
>>> ?connectionBA <#simplexConnectTo>  ?rightA .
>>>
>>> ?leftB    <#simplexConnectTo>  ?connectionBB .
>>> ?connectionBB <#simplexConnectTo>  ?rightB .
>>>
>>> ?connectionAA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
>>> ?connectionBA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
>>>
>>> ?connectionAA rdf:type <#portConnection> .
>>> ?connectionAB rdf:type <#portConnection> .
>>> ?connectionBA rdf:type <#portConnection> .
>>> ?connectionBB rdf:type <#portConnection> .
>>>
>>> ?leftA    rdf:type              <#thread> .
>>> ?leftB    rdf:type              <#thread> .
>>> ?rightA   rdf:type              <#thread> .
>>> ?rightB   rdf:type              <#thread> .
>>> ?singleHardware rdf:type              <#platform> .
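The effect of clustering connected patterns can be sketched with a toy evaluator. This is a naive nested-loop join over a single predicate, not how ARQ executes queries; the class name, the sample edges, and the two pattern orders are all made up for illustration. It counts the intermediate bindings produced when two unconnected patterns are evaluated back to back (a cross product) versus when each pattern shares a variable with what came before.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinOrderSketch {
    // Hypothetical data: three senders, each with its own connection to one receiver.
    static String[][] sampleEdges() {
        return new String[][] {
            {"s1", "c1"}, {"c1", "r"},
            {"s2", "c2"}, {"c2", "r"},
            {"s3", "c3"}, {"c3", "r"}
        };
    }

    // Each pattern is {subjectVar, objectVar} over the same (implicit) predicate.
    // SCATTERED puts two patterns with no shared variable next to each other.
    static final String[][] SCATTERED = {
        {"leftA", "connA"}, {"leftB", "connB"}, {"connA", "right"}, {"connB", "right"}
    };
    // CLUSTERED keeps every pattern connected to an already bound variable.
    static final String[][] CLUSTERED = {
        {"leftA", "connA"}, {"connA", "right"}, {"connB", "right"}, {"leftB", "connB"}
    };

    /** Returns {number of final solutions, total intermediate bindings produced}. */
    static long[] evaluate(String[][] edges, String[][] patterns) {
        List<Map<String, String>> bindings = new ArrayList<>();
        bindings.add(new HashMap<>());           // start with one empty binding
        long intermediate = 0;
        for (String[] p : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : bindings) {
                String s = b.get(p[0]);          // null means "not bound yet"
                String o = b.get(p[1]);
                for (String[] e : edges) {
                    if ((s == null || s.equals(e[0])) && (o == null || o.equals(e[1]))) {
                        Map<String, String> ext = new HashMap<>(b);
                        ext.put(p[0], e[0]);
                        ext.put(p[1], e[1]);
                        next.add(ext);
                    }
                }
            }
            intermediate += next.size();
            bindings = next;
        }
        return new long[] { bindings.size(), intermediate };
    }

    public static void main(String[] args) {
        long[] a = evaluate(sampleEdges(), SCATTERED);
        long[] b = evaluate(sampleEdges(), CLUSTERED);
        System.out.println("scattered: solutions=" + a[0] + " intermediate=" + a[1]);
        System.out.println("clustered: solutions=" + b[0] + " intermediate=" + b[1]);
    }
}
```

On this sample data both orders give the same 9 solutions, but the scattered order produces 69 intermediate bindings against 27 for the clustered one. The gap grows quickly with more data and more variables.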
>>>
>>>
>>>
>>>
>>>
>>> On 24/02/2020 10:01, Rob Vesse wrote:
>>>> To add to what else has been said
>>>>
>>>> Query execution in Apache Jena ARQ is based upon lazy evaluation
>>>> wherever possible.  Calling execSelect() simply prepares a ResultSet
>>>> that is capable of delivering the results but doesn't actually
>>>> evaluate the query and produce any results until you call
>>>> hasNext()/next().  When you call either of these methods then ARQ
>>>> does the minimum amount of work to return the next result (or batch
>>>> of results) depending on the underlying algebra of the query.
>>>>
>>>> Rob
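The behaviour described above (setup returns at once, the cost shows up in hasNext()) can be mimicked with a plain Java iterator. This is only a toy analogy using the standard library, not ARQ code; the class and field names are invented, and squaring a number stands in for the expensive per-result work.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyResultsSketch {
    /** Toy stand-in for a lazily evaluated result set. */
    static final class LazySquares implements Iterator<Integer> {
        private final int limit;
        private int nextInput = 0;
        private Integer pending;   // computed but not yet handed out
        int workDone = 0;          // how many "expensive" steps have run so far

        LazySquares(int limit) {
            this.limit = limit;    // construction does no evaluation at all
        }

        @Override
        public boolean hasNext() {
            // The work needed to find the next result happens here, on demand.
            if (pending == null && nextInput < limit) {
                workDone++;
                pending = nextInput * nextInput;
                nextInput++;
            }
            return pending != null;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer result = pending;
            pending = null;
            return result;
        }
    }

    public static void main(String[] args) {
        LazySquares results = new LazySquares(3);
        System.out.println("work after construction: " + results.workDone);
        while (results.hasNext()) {
            System.out.println(results.next() + " (work so far: " + results.workDone + ")");
        }
    }
}
```

Timing only the constructor here would show almost nothing, just as timing only execSelect() does; the cost becomes visible once the loop starts pulling results.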
>>>>
>>>> On 23/02/2020, 18:58, "Steve Vestal"
>>>> <[email protected]> wrote:
>>>>
>>>>       I'm looking for suggestions on a SPARQL performance issue.  My test
>>>>       model has ~800 sentences, and processing of one select query takes
>>>>       about 25 minutes.  The query is a basic graph pattern with 9
>>>>       variables and 20 triples, plus a filter that forces distinct
>>>>       variables to have distinct solutions using pair-wise not-equals
>>>>       constraints.  No option clause or anything else fancy.
>>>>
>>>>       I am issuing the query against an inference model.  Most of the
>>>>       asserted sentences are in imported models.  If I iterate over all
>>>>       the statements in the OntModel, I get ~1500 almost instantly.  I
>>>>       experimented with several of the reasoners.
>>>>
>>>>       Below is the basic control flow.  The thing I found curious is that
>>>>       the execSelect() method finishes almost instantly.  It is the
>>>>       iteration over the ResultSet that is taking all the time, it seems
>>>>       in the call to selectResult.hasNext(). The result has 192 rows, 9
>>>>       columns.  The results are provided in bursts of 8 rows each, with
>>>>       ~1 minute between bursts.
>>>>
>>>>           OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>>>           String selectQuery = getMySelectQuery();
>>>>           QueryExecution selectExec =
>>>>               QueryExecutionFactory.create(selectQuery, ontologyModel);
>>>>           ResultSet selectResult = selectExec.execSelect();
>>>>           while (selectResult.hasNext()) {  // Time seems to be spent in hasNext
>>>>               QuerySolution selectSolution = selectResult.next();
>>>>               for (String var : getMyVariablesOfInterest()) {
>>>>                   RDFNode varValue = selectSolution.get(var);
>>>>                   // process varValue
>>>>               }
>>>>           }
>>>>
>>>>       Any suggestions would be appreciated.
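For reference, the pair-wise not-equals constraint mentioned in this message can be generated as one small FILTER per pair of variables, which also makes it easy to place each one next to the patterns that bind both variables. This is a sketch under assumptions: the helper name is made up, and it assumes every pair of variables must differ, which the message does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairwiseFilterSketch {
    /** Builds one "FILTER(?x != ?y)" string for every unordered pair of variables. */
    static List<String> pairwiseNotEquals(List<String> vars) {
        List<String> filters = new ArrayList<>();
        for (int i = 0; i < vars.size(); i++) {
            for (int j = i + 1; j < vars.size(); j++) {
                filters.add("FILTER(?" + vars.get(i) + " != ?" + vars.get(j) + ")");
            }
        }
        return filters;
    }

    public static void main(String[] args) {
        // Variable names are illustrative only.
        List<String> vars = Arrays.asList("leftA", "leftB", "rightA");
        for (String f : pairwiseNotEquals(vars)) {
            System.out.println(f);
        }
    }
}
```

With n variables this yields n(n-1)/2 filters, so 9 variables give 36 constraints.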
>>>>          
>>>>
>>>>
