I'm currently using 3.8.0 jars.
On 2/25/2020 11:30 AM, Andy Seaborne wrote:
>
>
> On 25/02/2020 16:25, Steve Vestal wrote:
>> I read that chapter in DuCharme's book and have some things to try, such
>> as moving the rdf:type triples around and fragmenting that single filter
>> into pieces distributed throughout the query, and just doing my own
>> post-processing to get disjoint variables. I'll report back when time
>> permits.
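For illustration only, the post-processing idea mentioned above could look roughly like the plain-Java sketch below. It has no Jena dependency; the names DistinctRows, allDistinct and keepDistinct are invented here, and each row is assumed to be the list of values bound to the query variables (for example the string forms of the RDFNodes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jena.
final class DistinctRows {
    // True when no value occurs twice in the row,
    // i.e. all variables are bound to pairwise distinct values.
    static boolean allDistinct(List<String> row) {
        Set<String> seen = new HashSet<>();
        for (String value : row) {
            if (!seen.add(value)) {   // add() returns false for a repeated value
                return false;
            }
        }
        return true;
    }

    // Keeps only the rows whose values are pairwise distinct.
    static List<List<String>> keepDistinct(List<List<String>> rows) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> row : rows) {
            if (allDistinct(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("t1", "t2", "c1"),
            Arrays.asList("t1", "t1", "c1"));   // repeated value, dropped
        System.out.println(keepDistinct(rows)); // prints [[t1, t2, c1]]
    }
}
```

Dropping the filter from the query and checking rows afterwards trades a simpler query for a larger intermediate result, so whether it is faster would need measuring.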
>>
>> My reading did raise the question of what ARQ does for optimization,
>> which the book suggested can vary quite a bit between different SPARQL
>> engines. I took an admittedly very hasty peek at some sections of the
>> online ARQ documentation, and it mentions optimization in a number of
>> places, but is there a tutorial overview on do's and don'ts when
>> formulating the queries? A specific question is, will user ordering of
>> triples have a significant effect and should always be considered
>> because that's the order in which search will be done, or is the
>> optimizer going to do its own reordering regardless? Your suggestion
>> implies the former.
>
> ARQ does do some reordering but the issue here is made complicated by
> the fact that filter placement and reordering interact.
>
> Putting in {} sometimes helps as well because
>
> { triple patterns FILTERs }
> { triple patterns FILTERs }
>
> is actually a different query and can push the optimizer to make
> better choices.
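To make the grouped form concrete, here is a small sketch that assembles such a query body as a Java string. It is only an illustration: GroupedPattern and group() are invented names, the triples reuse the shortened URIs from further down the thread (so a BASE declaration would be needed in a real query), and the choice of which patterns and filters go in which group is an assumption, not a recommendation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for building grouped query text; not a Jena API.
final class GroupedPattern {
    // Wraps triple patterns and the filters that apply to them in one { } group.
    static String group(List<String> triples, List<String> filters) {
        StringBuilder sb = new StringBuilder("{\n");
        for (String triple : triples) {
            sb.append("  ").append(triple).append(" .\n");
        }
        for (String filter : filters) {
            sb.append("  FILTER(").append(filter).append(")\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        String groupA = group(
            Arrays.asList("?leftA <#simplexConnectTo> ?connectionAA",
                          "?connectionAA <#simplexConnectTo> ?rightA"),
            Arrays.asList("?leftA != ?rightA"));
        String groupB = group(
            Arrays.asList("?leftB <#simplexConnectTo> ?connectionBA",
                          "?connectionBA <#simplexConnectTo> ?rightA"),
            Arrays.asList("?leftB != ?rightA"));
        // A filter that compares variables from both groups is kept
        // in the enclosing group, where both variables are in scope.
        System.out.println("WHERE {\n" + groupA + groupB
            + "  FILTER(?leftA != ?leftB)\n}");
    }
}
```

A FILTER only applies to the group it is written in, so each filter here mentions only variables bound by the triples next to it. Whether this actually leads to a better plan would have to be checked against the real data.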
>
> Optimization is a lot of "it depends".
>
> (BTW which version are you running?)
>
> Andy
>
>>
>> On 2/25/2020 8:54 AM, Andy Seaborne wrote:
>>> It might be worth reordering the triple patterns and/or putting in some
>>> clustering: there is a large amount of cross product being done which
>>> means many, many unwanted or duplicate pieces of work.
>>>
>>> For example, move the rdf:type to the end (do you need them at all?)
>>>
>>> Andy
>>>
>>> (Replaced long URIs for email:)
>>>
>>> ?leftA <#simplexConnectTo> ?connectionAA .
>>> ?connectionAA <#simplexConnectTo> ?rightA .
>>>
>>> ?leftA <#simplexConnectTo> ?connectionAB .
>>> ?connectionAB <#simplexConnectTo> ?rightB .
>>>
>>> ?leftB <#simplexConnectTo> ?connectionBA .
>>> ?connectionBA <#simplexConnectTo> ?rightA .
>>>
>>> ?leftB <#simplexConnectTo> ?connectionBB .
>>> ?connectionBB <#simplexConnectTo> ?rightB .
>>>
>>> ?connectionAA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>>> ?connectionBA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>>>
>>> ?connectionAA rdf:type <#portConnection> .
>>> ?connectionAB rdf:type <#portConnection> .
>>> ?connectionBA rdf:type <#portConnection> .
>>> ?connectionBB rdf:type <#portConnection> .
>>>
>>> ?leftA rdf:type <#thread> .
>>> ?leftB rdf:type <#thread> .
>>> ?rightA rdf:type <#thread> .
>>> ?rightB rdf:type <#thread> .
>>> ?singleHardware rdf:type <#platform> .
>>>
>>>
>>>
>>>
>>>
>>> On 24/02/2020 10:01, Rob Vesse wrote:
>>>> To add to what else has been said
>>>>
>>>> Query execution in Apache Jena ARQ is based upon lazy evaluation
>>>> wherever possible. Calling execSelect() simply prepares a ResultSet
>>>> that is capable of delivering the results but doesn't actually
>>>> evaluate the query and produce any results until you call
>>>> hasNext()/next(). When you call either of these methods then ARQ
>>>> does the minimum amount of work to return the next result (or batch
>>>> of results) depending on the underlying algebra of the query.
>>>>
>>>> Rob
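The behaviour Rob describes can be illustrated without Jena. The sketch below is plain Java and only an analogy, not ARQ code: LazyRows and its made-up per-row "work" are invented for illustration. Constructing the iterator does nothing; the cost is paid inside hasNext(), one row at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazily evaluated result set (not a Jena class).
final class LazyRows implements Iterator<Integer> {
    private final int limit;    // number of rows this iterator will produce
    private int produced = 0;   // rows handed out so far
    private Integer pending;    // row computed by hasNext() but not yet returned
    int workDone = 0;           // counts how often the "expensive" step ran

    LazyRows(int limit) {
        this.limit = limit;     // nothing is computed here
    }

    @Override
    public boolean hasNext() {
        if (pending == null && produced < limit) {
            workDone++;                     // the expensive step happens here
            pending = produced * produced;  // placeholder for real query work
        }
        return pending != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer row = pending;
        pending = null;
        produced++;
        return row;
    }

    public static void main(String[] args) {
        LazyRows rows = new LazyRows(3);
        System.out.println("work after construction: " + rows.workDone);
        while (rows.hasNext()) {
            System.out.println("row " + rows.next() + ", work so far: " + rows.workDone);
        }
    }
}
```

This matches the observation in the original report: the call that creates the result returns at once, and the time shows up in the hasNext() calls of the loop.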
>>>>
>>>> On 23/02/2020, 18:58, "Steve Vestal"
>>>> <[email protected]> wrote:
>>>>
>>>> I'm looking for suggestions on a SPARQL performance issue. My test
>>>> model has ~800 sentences, and processing of one select query takes
>>>> about 25 minutes. The query is a basic graph pattern with 9 variables
>>>> and 20 triples, plus a filter that forces distinct variables to have
>>>> distinct solutions using pair-wise not-equals constraints. No option
>>>> clause or anything else fancy.
>>>>
>>>> I am issuing the query against an inference model. Most of the
>>>> asserted sentences are in imported models. If I iterate over all the
>>>> statements in the OntModel, I get ~1500 almost instantly. I
>>>> experimented with several of the reasoners.
>>>>
>>>> Below is the basic control flow. The thing I found curious is that the
>>>> execSelect() method finishes almost instantly. It is the iteration
>>>> over the ResultSet that is taking all the time, it seems in the call
>>>> to selectResult.hasNext(). The result has 192 rows, 9 columns. The
>>>> results are provided in bursts of 8 rows each, with ~1 minute between
>>>> bursts.
>>>>
>>>> OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>>> String selectQuery = getMySelectQuery();
>>>> QueryExecution selectExec =
>>>>     QueryExecutionFactory.create(selectQuery, ontologyModel);
>>>> ResultSet selectResult = selectExec.execSelect();
>>>> while (selectResult.hasNext()) { // Time seems to be spent in hasNext
>>>>     QuerySolution selectSolution = selectResult.next();
>>>>     for (String var : getMyVariablesOfInterest()) {
>>>>         RDFNode varValue = selectSolution.get(var);
>>>>         // process varValue
>>>>     }
>>>> }
>>>>
>>>> Any suggestions would be appreciated.
>>>>
>>>>
>>>>