To answer your question, Andy,

== The old query, some names abbreviated:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
SELECT ?connectionAA ?connectionAB ?connectionBA ?connectionBB ?leftA ?leftB ?rightA ?rightB ?singleHardware
WHERE {
?connectionAA rdf:type <#portConnection>.
?connectionAB rdf:type <#portConnection>.
?connectionBA rdf:type <#portConnection>.
?connectionBB rdf:type <#portConnection>.
?leftA rdf:type <#thread>.
?leftB rdf:type <#thread>.
?rightA rdf:type <#thread>.
?rightB rdf:type <#thread>.
?singleHardware rdf:type <#platform>.
?leftA <#simplexConnectTo> ?connectionAA.
?connectionAA <#simplexConnectTo> ?rightA.
?leftA <#simplexConnectTo> ?connectionAB.
?connectionAB <#simplexConnectTo> ?rightB.
?leftB <#simplexConnectTo> ?connectionBA.
?connectionBA <#simplexConnectTo> ?rightA.
?leftB <#simplexConnectTo> ?connectionBB.
?connectionBB <#simplexConnectTo> ?rightB.
?connectionAA <#boundTo> ?singleHardware.
?connectionBA <#boundTo> ?singleHardware.
FILTER (?connectionAA!=?connectionAB && ?connectionAA!=?connectionBA && ?connectionAA!=?connectionBB &&
        ?connectionAA!=?leftA && ?connectionAA!=?leftB && ?connectionAA!=?rightA && ?connectionAA!=?rightB &&
        ?connectionAA!=?singleHardware &&
        ?connectionAB!=?connectionBA && ?connectionAB!=?connectionBB &&
        ?connectionAB!=?leftA && ?connectionAB!=?leftB && ?connectionAB!=?rightA && ?connectionAB!=?rightB &&
        ?connectionAB!=?singleHardware &&
        ?connectionBA!=?connectionBB &&
        ?connectionBA!=?leftA && ?connectionBA!=?leftB && ?connectionBA!=?rightA && ?connectionBA!=?rightB &&
        ?connectionBA!=?singleHardware &&
        ?connectionBB!=?leftA && ?connectionBB!=?leftB && ?connectionBB!=?rightA && ?connectionBB!=?rightB &&
        ?connectionBB!=?singleHardware &&
        ?leftA!=?leftB && ?leftA!=?rightA && ?leftA!=?rightB && ?leftA!=?singleHardware &&
        ?leftB!=?rightA && ?leftB!=?rightB && ?leftB!=?singleHardware &&
        ?rightA!=?rightB && ?rightA!=?singleHardware &&
        ?rightB!=?singleHardware)
}

== The new query:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
SELECT ?connectionAA ?connectionAB ?connectionBA ?connectionBB ?leftA ?leftB ?rightA ?rightB ?singleHardware
WHERE {
?leftA rdf:type <#thread>.
?connectionAA rdf:type <#portConnection>.
?leftA <#simplexConnectTo> ?connectionAA.
FILTER(!sameTerm(?leftA,?connectionAA)).
?rightA rdf:type <#thread>.
?connectionAA <#simplexConnectTo> ?rightA.
FILTER(!sameTerm(?leftA,?rightA) && !sameTerm(?connectionAA,?rightA)).
?connectionAB rdf:type <#portConnection>.
?leftA <#simplexConnectTo> ?connectionAB.
FILTER(!sameTerm(?rightA,?connectionAB) && !sameTerm(?leftA,?connectionAB) &&
       !sameTerm(?connectionAA,?connectionAB)).
?rightB rdf:type <#thread>.
?connectionAB <#simplexConnectTo> ?rightB.
FILTER(!sameTerm(?rightA,?rightB) && !sameTerm(?leftA,?rightB) &&
       !sameTerm(?connectionAA,?rightB) && !sameTerm(?connectionAB,?rightB)).
?leftB rdf:type <#thread>.
?connectionBA rdf:type <#portConnection>.
?leftB <#simplexConnectTo> ?connectionBA.
FILTER(!sameTerm(?rightA,?leftB) && !sameTerm(?rightB,?leftB) && !sameTerm(?leftA,?leftB) &&
       !sameTerm(?connectionAA,?leftB) && !sameTerm(?connectionAB,?leftB) &&
       !sameTerm(?rightA,?connectionBA) && !sameTerm(?rightB,?connectionBA) &&
       !sameTerm(?leftB,?connectionBA) && !sameTerm(?leftA,?connectionBA) &&
       !sameTerm(?connectionAA,?connectionBA) && !sameTerm(?connectionAB,?connectionBA)).
?connectionBA <#simplexConnectTo> ?rightA.
?connectionBB rdf:type <#portConnection>.
?leftB <#simplexConnectTo> ?connectionBB.
FILTER(!sameTerm(?rightA,?connectionBB) && !sameTerm(?rightB,?connectionBB) &&
       !sameTerm(?leftB,?connectionBB) && !sameTerm(?leftA,?connectionBB) &&
       !sameTerm(?connectionBA,?connectionBB) && !sameTerm(?connectionAA,?connectionBB) &&
       !sameTerm(?connectionAB,?connectionBB)).
?connectionBB <#simplexConnectTo> ?rightB.
?singleHardware rdf:type <#platform>.
?connectionAA <#boundTo> ?singleHardware.
FILTER(!sameTerm(?rightA,?singleHardware) && !sameTerm(?rightB,?singleHardware) &&
       !sameTerm(?leftB,?singleHardware) && !sameTerm(?leftA,?singleHardware) &&
       !sameTerm(?connectionBA,?singleHardware) && !sameTerm(?connectionAA,?singleHardware) &&
       !sameTerm(?connectionBB,?singleHardware) && !sameTerm(?connectionAB,?singleHardware)).
?connectionBA <#boundTo> ?singleHardware.
}

On 2/26/2020 8:06 AM, Andy Seaborne wrote:
>
>
> On 26/02/2020 11:26, Steve Vestal wrote:
>> Reporting back as requested to close this issue.
>
> Thank you - knowing usage and experiences is always helpful, as is
> whether suggestions did indeed have a useful effect.
>
>> Recall the original select query took ~25 minutes on a small test case,
>> where the query was issued against an OntModel with four imports, trying
>> various reasoners since reasoning is necessary to get any results in
>> this test.
>>
>> Number of asserted sentences: 712
>> Number of forward-chained entailments at OntModel creation (making
>> assumptions about reasoning and import handling): 1130
>> Size of entailment closure: 4421, which took 136 ms to compute (all
>> times wall-clock times on a laptop)
>>
>> Experiments indicated a bottleneck occurred due to a FILTER at the end,
>> a conjunction of many varA!=varB to anti-alias solutions.  VisualVM
>> profiling indicated over 99% of the time was spent in cycles of
>> recursive calls involving
>> org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding,
>> makeNextStep, *.QueryIteratorBase.hasNext, and
>> *.QueryIterProcessBinding.hasNextBinding.  (I have no idea what if
>> anything that means, but in case it is of interest to someone...)
>
> It does to me.
>
> These are the method calls from moving from one intermediate result
> to another, suggesting there are a lot of intermediate rows being
> processed.
>
>> The first exercise was to omit the anti-aliasing filter from the select
>> query itself and post-process the result to ignore solution rows with
>> aliased variable solutions.  That increased the size of the select
>> result from 192 to 576 rows but reduced the time from ~25 minutes to
>> ~450 ms.
>>
>> The initial query listed all rdf:type triples first, triples that
>> specified properties between nodes next, and a final big-bang filter at
>> the end.  The second exercise was to shuffle these triples into an order
>> intended to progressively narrow down the search space under the
>> assumption triples are processed in the order they are listed in the
>> query (as suggested in "Learning SPARQL," noting that Andy's earlier
>> post said ARQ does do some reordering for optimization).  The original
>> filter was fragmented into multiple smaller pieces that were also
>> shuffled among the other triples for this exercise.  This resulted in a
>> time of 111 ms, further reduced to 99 ms by switching from "!=" to
>> "!sameTerm" in the filters.
>
> Good news!
>
> What is the final query?
>
>>
>> I'm back in the saddle.  Thanks again for everyone's help.
>
>     Andy
>
>>
>> On 2/25/2020 12:33 PM, Andy Seaborne wrote:
>>> Current is 3.14.0.
>>>
>>> On 25/02/2020 17:38, Steve Vestal wrote:
>>>> I'm currently using 3.8.0 jars.
>>>>
>>>> On 2/25/2020 11:30 AM, Andy Seaborne wrote:
>>>>>
>>>>>
>>>>> On 25/02/2020 16:25, Steve Vestal wrote:
>>>>>> I read that chapter in DuCharme's book and have some things to try,
>>>>>> such as moving the rdf:type triples around and fragmenting that
>>>>>> single filter into pieces distributed throughout the query, and just
>>>>>> doing my own post-processing to get disjoint variables.  I'll report
>>>>>> back when time permits.
>>>>>>
>>>>>> My reading did raise the question of what ARQ does for optimization,
>>>>>> which the book suggested can vary quite a bit between different SPARQL
>>>>>> engines.
>>>>>> I took an admittedly very hasty peek at some sections of the
>>>>>> online ARQ documentation, and it mentions optimization in a number of
>>>>>> places, but is there a tutorial overview on do's and don'ts when
>>>>>> formulating the queries?  A specific question is, will user ordering of
>>>>>> triples have a significant effect and should always be considered
>>>>>> because that's the order in which search will be done, or is the
>>>>>> optimizer going to do its own reordering regardless?  Your suggestion
>>>>>> implies the former.
>>>>>
>>>>> ARQ does do some reordering but the issue here is made complicated by
>>>>> the fact that filter placement and reordering interact.
>>>>>
>>>>> Putting in {} sometimes helps as well because
>>>>>
>>>>> { triple patterns FILTERs }
>>>>> { triple patterns FILTERs }
>>>>>
>>>>> is actually a different query and can push the optimizer to make
>>>>> better choices.
>>>>>
>>>>> Optimization is a lot of "it depends".
>>>>>
>>>>> (BTW which version are you running?)
>>>>>
>>>>>     Andy
>>>>>
>>>>>>
>>>>>> On 2/25/2020 8:54 AM, Andy Seaborne wrote:
>>>>>>> It might be worth reordering the triple patterns and/or putting in
>>>>>>> some clustering: there is a large amount of cross product being done
>>>>>>> which means many, many unwanted or duplicate pieces of work.
>>>>>>>
>>>>>>> For example, move the rdf:type to the end (do you need them at all?)
>>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>> (Replaced long URIs for email:)
>>>>>>>
>>>>>>> ?leftA <#simplexConnectTo> ?connectionAA .
>>>>>>> ?connectionAA <#simplexConnectTo> ?rightA .
>>>>>>>
>>>>>>> ?leftA <#simplexConnectTo> ?connectionAB .
>>>>>>> ?connectionAB <#simplexConnectTo> ?rightB .
>>>>>>>
>>>>>>> ?leftB <#simplexConnectTo> ?connectionBA .
>>>>>>> ?connectionBA <#simplexConnectTo> ?rightA .
>>>>>>>
>>>>>>> ?leftB <#simplexConnectTo> ?connectionBB .
>>>>>>> ?connectionBB <#simplexConnectTo> ?rightB .
>>>>>>>
>>>>>>> ?connectionAA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>>>>>>> ?connectionBA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>>>>>>>
>>>>>>> ?connectionAA rdf:type <#portConnection> .
>>>>>>> ?connectionAB rdf:type <#portConnection> .
>>>>>>> ?connectionBA rdf:type <#portConnection> .
>>>>>>> ?connectionBB rdf:type <#portConnection> .
>>>>>>>
>>>>>>> ?leftA rdf:type <#thread> .
>>>>>>> ?leftB rdf:type <#thread> .
>>>>>>> ?rightA rdf:type <#thread> .
>>>>>>> ?rightB rdf:type <#thread> .
>>>>>>> ?singleHardware rdf:type <#platform> .
>>>>>>>
>>>>>>>
>>>>>>> On 24/02/2020 10:01, Rob Vesse wrote:
>>>>>>>> To add to what else has been said
>>>>>>>>
>>>>>>>> Query execution in Apache Jena ARQ is based upon lazy evaluation
>>>>>>>> wherever possible.  Calling execSelect() simply prepares a ResultSet
>>>>>>>> that is capable of delivering the results but doesn't actually
>>>>>>>> evaluate the query and produce any results until you call
>>>>>>>> hasNext()/next().  When you call either of these methods then ARQ
>>>>>>>> does the minimum amount of work to return the next result (or batch
>>>>>>>> of results) depending on the underlying algebra of the query.
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> On 23/02/2020, 18:58, "Steve Vestal" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>     I'm looking for suggestions on a SPARQL performance issue.  My test
>>>>>>>>     model has ~800 sentences, and processing of one select query
>>>>>>>>     takes about 25 minutes.  The query is a basic graph pattern with
>>>>>>>>     9 variables and 20 triples, plus a filter that forces distinct
>>>>>>>>     variables to have distinct solutions using pair-wise not-equals
>>>>>>>>     constraints.  No option clause or anything else fancy.
>>>>>>>>
>>>>>>>>     I am issuing the query against an inference model.
>>>>>>>>     Most of the asserted sentences are in imported models.  If I
>>>>>>>>     iterate over all the statements in the OntModel, I get ~1500
>>>>>>>>     almost instantly.  I experimented with several of the reasoners.
>>>>>>>>
>>>>>>>>     Below is the basic control flow.  The thing I found curious is
>>>>>>>>     that the execSelect() method finishes almost instantly.  It is the
>>>>>>>>     iteration over the ResultSet that is taking all the time, it seems
>>>>>>>>     in the call to selectResult.hasNext().  The result has 192 rows,
>>>>>>>>     9 columns.  The results are provided in bursts of 8 rows each,
>>>>>>>>     with ~1 minute between bursts.
>>>>>>>>
>>>>>>>>         OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>>>>>>>         String selectQuery = getMySelectQuery();
>>>>>>>>         QueryExecution selectExec =
>>>>>>>>                 QueryExecutionFactory.create(selectQuery, ontologyModel);
>>>>>>>>         ResultSet selectResult = selectExec.execSelect();
>>>>>>>>         while (selectResult.hasNext()) {  // Time seems to be spent in hasNext
>>>>>>>>             QuerySolution selectSolution = selectResult.next();
>>>>>>>>             for (String var : getMyVariablesOfInterest()) {
>>>>>>>>                 RDFNode varValue = selectSolution.get(var);
>>>>>>>>                 // process varValue
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>
>>>>>>>>     Any suggestions would be appreciated.
>>>>>>>>
>>
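
Rob's point about lazy evaluation explains why execSelect() came back almost instantly while hasNext() took all the time. Here is a minimal sketch of that behaviour in plain Java, with no Jena involved. The names LazyRows, lazyFilter and examined are made up for illustration, and the integer range merely stands in for candidate solution rows. Creating the iterator does no work; the search only runs inside hasNext().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical illustration only; this is not how ARQ is implemented.
class LazyRows {
    /** Number of candidates examined so far, to make the deferred work visible. */
    static int examined = 0;

    /** Lazily yields the integers in [0, limit) that satisfy the predicate. */
    static Iterator<Integer> lazyFilter(int limit, Predicate<Integer> keep) {
        return new Iterator<Integer>() {
            private int cursor = 0;         // next candidate to look at
            private Integer pending = null; // result found but not yet handed out

            @Override
            public boolean hasNext() {
                // All searching happens here, not when the iterator is created.
                while (pending == null && cursor < limit) {
                    int candidate = cursor++;
                    examined++;
                    if (keep.test(candidate)) {
                        pending = candidate;
                    }
                }
                return pending != null;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Integer result = pending;
                pending = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = lazyFilter(100, n -> n % 10 == 9);
        System.out.println("examined after creation: " + examined); // 0
        rows.hasNext();
        System.out.println("examined after hasNext(): " + examined); // 10
        System.out.println("first result: " + rows.next());          // 9
    }
}
```

If the predicate rejects most candidates, almost all the wall-clock time lands in hasNext(), which matches what I saw in the profiler.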

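
The first exercise from the report (drop the anti-aliasing FILTER and discard aliased rows afterwards) could look roughly like this. It is a sketch only, assuming each solution row has first been copied into a Map from variable name to the string form of the bound node. DistinctBindings, allDistinct and dropAliased are hypothetical names, not Jena API. Comparing string forms is an exact comparison of what was bound, so it is closer in spirit to !sameTerm than to the value-based !=.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; rows are assumed to be variable-name -> node-string maps.
class DistinctBindings {
    /** True when no two variables in the row are bound to the same value. */
    static boolean allDistinct(Map<String, String> row) {
        Set<String> seen = new HashSet<>();
        for (String value : row.values()) {
            if (!seen.add(value)) {
                return false; // value already bound to another variable
            }
        }
        return true;
    }

    /** Keeps only the rows in which every variable has its own distinct value. */
    static List<Map<String, String>> dropAliased(List<Map<String, String>> rows) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (allDistinct(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("leftA", "#t1", "leftB", "#t2"),  // distinct, kept
                Map.of("leftA", "#t1", "leftB", "#t1")); // aliased, dropped
        List<Map<String, String>> kept = dropAliased(rows);
        System.out.println("kept " + kept.size() + " of " + rows.size() + " rows");
    }
}
```

One pass with a hash set per row is linear in the number of variables, so the 36 pairwise comparisons are not needed when post-processing.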