On 25/02/2020 16:25, Steve Vestal wrote:
I read that chapter in DuCharme's book and have some things to try, such
as moving the rdf:type triples around and fragmenting that single filter
into pieces distributed throughout the query, and just doing my own
post-processing to get disjoint variables. I'll report back when time
permits.
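The post-processing idea above (dropping the pairwise not-equals FILTER and discarding rows with repeated bindings afterwards) might look roughly like the sketch below. It is only an illustration: rows are modeled as plain maps from variable name to node string, and the names DistinctBindings, allDistinct and keepDistinct are made up here. With Jena the values would come from each QuerySolution instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Jena: filters result rows in plain Java.
final class DistinctBindings {

    // True when no two variables in the row are bound to the same value.
    static boolean allDistinct(Map<String, String> row) {
        Set<String> seen = new HashSet<>(row.values());
        return seen.size() == row.size();
    }

    // Keeps only the rows whose bindings are pairwise different.
    static List<Map<String, String>> keepDistinct(List<Map<String, String>> rows) {
        return rows.stream()
                .filter(DistinctBindings::allDistinct)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows; the second binds both variables to the same node.
        List<Map<String, String>> rows = List.of(
                Map.of("leftA", "urn:ex:t1", "leftB", "urn:ex:t2"),
                Map.of("leftA", "urn:ex:t1", "leftB", "urn:ex:t1"));
        System.out.println(keepDistinct(rows));
    }
}
```

Whether this beats keeping the constraints inside the query depends on how many unwanted rows the engine produces before the check is applied, so it is worth timing both variants.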
My reading did raise the question of what ARQ does for optimization,
which the book suggested can vary quite a bit between different SPARQL
engines. I took an admittedly very hasty peek at some sections of the
online ARQ documentation, and it mentions optimization in a number of
places, but is there a tutorial overview on do's and don'ts when
formulating the queries? A specific question is, will user ordering of
triples have a significant effect and should always be considered
because that's the order in which search will be done, or is the
optimizer going to do its own reordering regardless? Your suggestion
implies the former.
ARQ does do some reordering but the issue here is made complicated by
the fact that filter placement and reordering interact.
Putting in {} sometimes helps as well because
{ triple patterns FILTERs }
{ triple patterns FILTERs }
is actually a different query and can push the optimizer to make better
choices.
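As a rough illustration of that grouping, the sketch below builds a query string with two {} groups, each holding its own triple patterns and FILTER, plus one FILTER across the groups. The choice of which patterns and filters go in which group is an assumption made for the example, the class name GroupedQuerySketch is hypothetical, and the IRIs are abbreviated the same way as elsewhere in this thread.

```java
// Hypothetical sketch: assembles a SPARQL string with two explicit groups.
final class GroupedQuerySketch {

    static String build() {
        return String.join("\n",
                "SELECT * WHERE {",
                "  { ?leftA <#simplexConnectTo> ?connectionAA .",
                "    ?connectionAA <#simplexConnectTo> ?rightA .",
                "    FILTER(?leftA != ?rightA) }",
                "  { ?leftB <#simplexConnectTo> ?connectionBA .",
                "    ?connectionBA <#simplexConnectTo> ?rightA .",
                "    FILTER(?leftB != ?rightA) }",
                // This filter needs variables from both groups, so it sits outside them.
                "  FILTER(?leftA != ?leftB)",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

A FILTER applies to the group it is written in, so each inner filter only mentions variables bound inside its own braces. Whether this form actually runs faster depends on the data and the ARQ version.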
Optimization is a lot of "it depends".
(BTW which version are you running?)
Andy
On 2/25/2020 8:54 AM, Andy Seaborne wrote:
It might be worth reordering the triple patterns and/or putting in some
clustering: there is a large amount of cross product being done, which
means many, many unwanted or duplicate pieces of work.
For example, move the rdf:type triples to the end (do you need them at all?)
Andy
(Replaced long URIs for email:)
?leftA <#simplexConnectTo> ?connectionAA .
?connectionAA <#simplexConnectTo> ?rightA .
?leftA <#simplexConnectTo> ?connectionAB .
?connectionAB <#simplexConnectTo> ?rightB .
?leftB <#simplexConnectTo> ?connectionBA .
?connectionBA <#simplexConnectTo> ?rightA .
?leftB <#simplexConnectTo> ?connectionBB .
?connectionBB <#simplexConnectTo> ?rightB .
?connectionAA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
?connectionBA <fhowl/singlepointfailpattern#boundTo> ?singleHardware .
?connectionAA rdf:type <#portConnection> .
?connectionAB rdf:type <#portConnection> .
?connectionBA rdf:type <#portConnection> .
?connectionBB rdf:type <#portConnection> .
?leftA rdf:type <#thread> .
?leftB rdf:type <#thread> .
?rightA rdf:type <#thread> .
?rightB rdf:type <#thread> .
?singleHardware rdf:type <#platform> .
On 24/02/2020 10:01, Rob Vesse wrote:
To add to what else has been said
Query execution in Apache Jena ARQ is based upon lazy evaluation
wherever possible. Calling execSelect() simply prepares a ResultSet
that is capable of delivering the results but doesn't actually
evaluate the query and produce any results until you call
hasNext()/next(). When you call either of these methods then ARQ
does the minimum amount of work to return the next result (or batch
of results) depending on the underlying algebra of the query.
Rob
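The behaviour described above can be mimicked with a plain Java iterator. This is not ARQ code, just a sketch of the general pattern under the assumption that the expensive matching is deferred: LazyMatches and its even-number "match" test are hypothetical stand-ins, and the examined counter exists only to show when the work happens.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazily evaluated result set.
final class LazyMatches implements Iterator<Integer> {
    private final int[] candidates;
    private int position = 0;
    private Integer pending = null; // next match, once found
    int examined = 0;               // how many candidates have been looked at

    // Construction does no searching, much like preparing a result set.
    LazyMatches(int[] candidates) {
        this.candidates = candidates;
    }

    // The search runs here: scan forward until the next match or the end.
    @Override
    public boolean hasNext() {
        while (pending == null && position < candidates.length) {
            int candidate = candidates[position++];
            examined++;
            if (candidate % 2 == 0) { // placeholder for "pattern matches"
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        LazyMatches matches = new LazyMatches(new int[] {1, 3, 4, 5, 6});
        System.out.println("examined after construction: " + matches.examined);
        while (matches.hasNext()) {
            System.out.println(matches.next() + " (examined so far: " + matches.examined + ")");
        }
    }
}
```

With this shape, time measured around the constructor says nothing about the cost of the search. All of it shows up inside hasNext(), which is consistent with execSelect() returning quickly while the iteration is slow.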
On 23/02/2020, 18:58, "Steve Vestal"
<[email protected]> wrote:
I'm looking for suggestions on a SPARQL performance issue. My test
model has ~800 sentences, and processing of one select query takes
about 25 minutes. The query is a basic graph pattern with 9 variables
and 20 triples, plus a filter that forces distinct variables to have
distinct solutions using pair-wise not-equals constraints. No OPTIONAL
clause or anything else fancy.
I am issuing the query against an inference model. Most of the
asserted sentences are in imported models. If I iterate over all the
statements in the OntModel, I get ~1500 almost instantly. I
experimented with several of the reasoners.
Below is the basic control flow. The thing I found curious is that the
execSelect() method finishes almost instantly. It is the iteration
over the ResultSet that is taking all the time, apparently in the call
to selectResult.hasNext(). The result has 192 rows, 9 columns. The
results are provided in bursts of 8 rows each, with ~1 minute between
bursts.
OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
String selectQuery = getMySelectQuery();
QueryExecution selectExec =
    QueryExecutionFactory.create(selectQuery, ontologyModel);
ResultSet selectResult = selectExec.execSelect();
while (selectResult.hasNext()) { // Time seems to be spent in hasNext
    QuerySolution selectSolution = selectResult.next();
    for (String var : getMyVariablesOfInterest()) {
        RDFNode varValue = selectSolution.get(var);
        // process varValue
    }
}
Any suggestions would be appreciated.