On 23/02/2020 23:11, Steve Vestal wrote:
If I comment out the FILTER clause that prevents variable aliasing, the
query is processed almost immediately. The number of rows goes from 192
to 576, but it's fast.
Interesting. That does suggest it might actually be SPARQL rather than
inference that's the bottleneck. The materialization experiment will be
a test of that.
Though looking at your query I wonder if you need inference at all - we
can't see your data to be sure since the list doesn't allow attachments.
Have you tried without any inference? Do you know what inference you are
relying on?
What is the proper way to write a query when you
want a particular set of variables to have distinct solution values?
Not sure there is a better way in general. However, I wonder if you can
partition your query into subgroups, filter within the groups, then do a
simpler join on the results; that might reduce the combinatorics.
However, I don't understand your query nor the modelling (especially
around simplexConnect, which looks odd) so might be wrong about that.
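As a rough illustration of the shape I mean (the variable and property
names here are made up, since we can't see your data):

    // Hypothetical restructuring: enforce distinctness inside two
    // smaller groups, then join the groups and add the remaining test.
    String partitioned = String.join("\n",
        "PREFIX : <http://example/>",
        "SELECT ?a ?b ?c WHERE {",
        "  { SELECT ?a ?b WHERE { ?a :p ?x . ?b :p ?x . FILTER(?a != ?b) } }",
        "  { SELECT ?b ?c WHERE { ?b :q ?y . ?c :q ?y . FILTER(?b != ?c) } }",
        "  FILTER(?a != ?c)",
        "}");

Each subgroup then only enforces distinctness among its own variables,
so the pair-wise not-equals tests run over smaller intermediate results
before the final join.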
I speculated that when I iterated over the statements in the OntModel,
and the number went from a model size() of ~1500 to ~4700 iterated
statements, I was materializing the entire inference closure (which
was fast). Is there some other set of calls needed to do that?
The Jena inference engines support a mix of forward and backward
inference rules. The forward inference rules will run once and store all
the results. That's the growth you are probably seeing. That's then
efficient to query.
The backward rules are run on demand. They generally (this is
controllable) cache the results of the particular triple patterns that
are requested. Because they only cache against the specific patterns
("goals") they see, then depending on what order the goals come in you
can get cases where there's redundancy in those caches. Those caches
aren't particularly well indexed either. You can certainly query one way
and fill up one set of caches, but then a different query asks for
different patterns and more rules still need to fire.
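For completeness: if you build your own rule reasoner, the tabling
behaviour can be tuned; a rough sketch, assuming a hypothetical rule
file my.rules:

    import java.util.List;
    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
    import org.apache.jena.reasoner.rulesys.Rule;

    List<Rule> rules = Rule.rulesFromURL("file:my.rules"); // hypothetical rule file
    GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
    reasoner.setMode(GenericRuleReasoner.HYBRID); // mix of forward and backward rules
    reasoner.tableAll();                          // cache ("table") all backward goals
    Model baseModel = ModelFactory.createDefaultModel(); // stand-in for your data
    InfModel inf = ModelFactory.createInfModel(reasoner, baseModel);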
*If* multiple overlapping caches in the backward rules are the issue
*then* materializing everything and not using inference after that can
help. It's a balance of whether you are going to query for most of the
data or just do a bunch of point probes. In the former case it's better
to work everything out once; in the latter case it's better to use
on-demand rules.
Your query pattern looks like it's going to touch everything.
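Materializing is just a copy into a plain model; a minimal sketch,
reusing the ontologyModel variable from your code below:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    // Copying pulls every (asserted and inferred) statement out of the
    // inference model once; later queries hit a plain in-memory model.
    Model snapshot = ModelFactory.createDefaultModel();
    snapshot.add(ontologyModel);
    // ... then run the SELECT against snapshot instead of ontologyModel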
Are there circumstances where it is faster to materialize the entire
closure and query a plain model than to query the inference model itself?
Yes, see earlier message, and above.
Dave
On 2/23/2020 3:33 PM, Dave Reynolds wrote:
The issue is not the performance of SPARQL but the performance of the
inference engines.
If you need some OWL inference then your best bet is OWLMicro.
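For example, a sketch (baseModel stands in for your data):

    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntModelSpec;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    Model baseModel = ModelFactory.createDefaultModel(); // stand-in for your data
    OntModel m = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_MEM_MICRO_RULE_INF, baseModel); // OWLMicro rules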
If that's too slow to query directly then one option to try is to
materialize the entire inference closure and then query that. You can
do that by simply copying the inference model to a plain model.
If that's too slow then you'll need a higher performance third party
reasoner.
Dave
On 23/02/2020 18:57, Steve Vestal wrote:
I'm looking for suggestions on a SPARQL performance issue. My test
model has ~800 sentences, and processing of one select query takes about
25 minutes. The query is a basic graph pattern with 9 variables and 20
triples, plus a filter that forces distinct variables to have distinct
solutions using pair-wise not-equals constraints. No OPTIONAL clause or
anything else fancy.
I am issuing the query against an inference model. Most of the asserted
sentences are in imported models. If I iterate over all the statements
in the OntModel, I get ~1500 almost instantly. I experimented with
several of the reasoners.
Below is the basic control flow. The thing I found curious is that the
execSelect() method finishes almost instantly. It is the iteration over
the ResultSet that is taking all the time, apparently in the call to
selectResult.hasNext(). The result has 192 rows and 9 columns. The
results are provided in bursts of 8 rows each, with ~1 minute between
bursts.
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.RDFNode;

    OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
    String selectQuery = getMySelectQuery();
    QueryExecution selectExec =
            QueryExecutionFactory.create(selectQuery, ontologyModel);
    ResultSet selectResult = selectExec.execSelect();
    while (selectResult.hasNext()) { // Time seems to be spent in hasNext
        QuerySolution selectSolution = selectResult.next();
        for (String var : getMyVariablesOfInterest()) {
            RDFNode varValue = selectSolution.get(var);
            // process varValue
        }
    }
Any suggestions would be appreciated.