I read that chapter in DuCharme's book and have some things to try, such as moving the rdf:type triples around and fragmenting that single filter into pieces distributed throughout the query, and just doing my own post-processing to get disjoint variables. I'll report back when time permits.
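The post-processing idea mentioned above could look roughly like the following. This is only a minimal sketch in plain Java, assuming each solution row has already been reduced to a list of string values (for example, URIs); the class name `DistinctBindings` and the method `allDistinct` are hypothetical and not part of Jena. It drops the pairwise not-equals filter from the query and instead discards any row in which two variables are bound to the same value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jena: keeps only rows whose values are pairwise distinct.
public class DistinctBindings {

    /** Returns true when no two values in the row are equal. */
    public static boolean allDistinct(List<String> row) {
        Set<String> seen = new HashSet<>();
        for (String value : row) {
            // Set.add returns false if the value was already present.
            if (!seen.add(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for query solutions.
        List<List<String>> rows = List.of(
                List.of("t1", "t2", "c1"),
                List.of("t1", "t1", "c1"));
        for (List<String> row : rows) {
            if (allDistinct(row)) {
                System.out.println(row); // only the first row is printed
            }
        }
    }
}
```

Whether this is actually faster than keeping the filter in the query depends on how many extra rows the unfiltered query produces, so it would need to be measured.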
My reading did raise the question of what ARQ does for optimization, which the book suggested can vary quite a bit between different SPARQL engines. I took an admittedly very hasty peek at some sections of the online ARQ documentation, and it mentions optimization in a number of places, but is there a tutorial overview of do's and don'ts when formulating queries? A specific question: will the user's ordering of triples have a significant effect, so that it should always be considered because that is the order in which the search will be done, or is the optimizer going to do its own reordering regardless? Your suggestion implies the former.

On 2/25/2020 8:54 AM, Andy Seaborne wrote:
> It might be worth reordering the triple patterns and/or putting in some
> clustering: there is a large amount of cross product being done which
> means many, many unwanted or duplicate pieces of work.
>
> For example, move the rdf:type to the end (do you need them at all?)
>
>     Andy
>
> (Replaced long URIs for email:)
>
>     ?leftA <#simplexConnectTo> ?connectionAA .
>     ?connectionAA <#simplexConnectTo> ?rightA .
>
>     ?leftA <#simplexConnectTo> ?connectionAB .
>     ?connectionAB <#simplexConnectTo> ?rightB .
>
>     ?leftB <#simplexConnectTo> ?connectionBA .
>     ?connectionBA <#simplexConnectTo> ?rightA .
>
>     ?leftB <#simplexConnectTo> ?connectionBB .
>     ?connectionBB <#simplexConnectTo> ?rightB .
>
>     ?connectionAA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>     ?connectionBA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
>
>     ?connectionAA rdf:type <#portConnection> .
>     ?connectionAB rdf:type <#portConnection> .
>     ?connectionBA rdf:type <#portConnection> .
>     ?connectionBB rdf:type <#portConnection> .
>
>     ?leftA rdf:type <#thread> .
>     ?leftB rdf:type <#thread> .
>     ?rightA rdf:type <#thread> .
>     ?rightB rdf:type <#thread> .
>     ?singleHardware rdf:type <#platform> .
>
> On 24/02/2020 10:01, Rob Vesse wrote:
>> To add to what else has been said
>>
>> Query execution in Apache Jena ARQ is based upon lazy evaluation
>> wherever possible. Calling execSelect() simply prepares a ResultSet
>> that is capable of delivering the results but doesn't actually
>> evaluate the query and produce any results until you call
>> hasNext()/next(). When you call either of these methods then ARQ
>> does the minimum amount of work to return the next result (or batch
>> of results) depending on the underlying algebra of the query.
>>
>> Rob
>>
>> On 23/02/2020, 18:58, "Steve Vestal"
>> <[email protected]> wrote:
>>
>>     I'm looking for suggestions on a SPARQL performance issue. My test
>>     model has ~800 sentences, and processing of one select query takes
>>     about 25 minutes. The query is a basic graph pattern with 9
>>     variables and 20 triples, plus a filter that forces distinct
>>     variables to have distinct solutions using pair-wise not-equals
>>     constraints. No option clause or anything else fancy.
>>
>>     I am issuing the query against an inference model. Most of the
>>     asserted sentences are in imported models. If I iterate over all
>>     the statements in the OntModel, I get ~1500 almost instantly. I
>>     experimented with several of the reasoners.
>>
>>     Below is the basic control flow. The thing I found curious is that
>>     the execSelect() method finishes almost instantly. It is the
>>     iteration over the ResultSet that is taking all the time, it seems
>>     in the call to selectResult.hasNext(). The result has 192 rows,
>>     9 columns. The results are provided in bursts of 8 rows each, with
>>     ~1 minute between bursts.
>>
>>     OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>     String selectQuery = getMySelectQuery();
>>     QueryExecution selectExec =
>>         QueryExecutionFactory.create(selectQuery, ontologyModel);
>>     ResultSet selectResult = selectExec.execSelect();
>>     while (selectResult.hasNext()) { // Time seems to be spent in hasNext
>>         QuerySolution selectSolution = selectResult.next();
>>         for (String var : getMyVariablesOfInterest()) {
>>             RDFNode varValue = selectSolution.get(var);
>>             // process varValue
>>         }
>>     }
>>
>>     Any suggestions would be appreciated.
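
The lazy-evaluation behaviour Rob describes in the quoted message (execSelect() returning at once, with the time showing up in hasNext()) can be illustrated with a plain-Java analogy. This is only a sketch and not how ARQ is implemented: the class `LazySquares` and its `workDone` counter are hypothetical stand-ins for a result set and for the cost of evaluating the query. Constructing the iterator does no work; each hasNext() call does just enough to produce one more result.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyResultsDemo {

    /** Hypothetical stand-in for a lazily evaluated result set (not Jena code). */
    public static class LazySquares implements Iterator<Integer> {
        private final int limit;
        private int nextInput = 0;
        private Integer pending = null; // result computed but not yet handed out
        public int workDone = 0;        // counts simulated "expensive" evaluation steps

        public LazySquares(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            // Compute the next result only if none is waiting and input remains.
            if (pending == null && nextInput < limit) {
                workDone++;
                pending = nextInput * nextInput;
                nextInput++;
            }
            return pending != null;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer result = pending;
            pending = null;
            return result;
        }
    }

    public static void main(String[] args) {
        LazySquares results = new LazySquares(3);
        // Nothing has been evaluated yet, analogous to execSelect() returning immediately.
        System.out.println("work after construction: " + results.workDone);
        while (results.hasNext()) {
            System.out.println(results.next() + " (work so far: " + results.workDone + ")");
        }
    }
}
```

In this analogy, timing the loop rather than the constructor is what reveals the cost, which matches the observation that the time is spent in selectResult.hasNext().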
