With some advice from Dave, I made a copy of the OntModel that hopefully
materialized the full entailment closure:
// Copy every statement (asserted and inferred) out of the inference
// model into a plain in-memory model:
Model entailedModel = ModelFactory.createDefaultModel();
entailedModel.add(ontologyModel);
In less than one second, the results were:
Statements in ontology model: 1146
Entailed model org.apache.jena.rdf.model.impl.ModelCom size 4453
I ran the select query on this entailed model. It still takes about 25
minutes.
I see there is a chapter on Query Efficiency and Debugging in DuCharme's
book. Now seems like a good time for me to read that chapter.
Thanks for all the help.
On 2/24/2020 3:02 AM, Dave Reynolds wrote:
> On 23/02/2020 23:11, Steve Vestal wrote:
>> If I comment out the FILTER clause that prevents variable aliasing, the
>> query is processed almost immediately. The number of rows goes from 192
>> to 576, but it's fast.
>
> Interesting. That does suggest it might actually be SPARQL rather than
> inference that's the bottleneck. The materialization experiment will
> be a test of that.
>
> Though looking at your query I wonder if you need inference at all -
> we can't see your data to be sure since the list doesn't allow
> attachments.
> Have you tried without any inference? Do you know what inference you
> are relying on?
>
>> What is the proper way to write a query when you
>> want a particular set of variables to have distinct solution values?
>
> Not sure there is a better way in general. However, I wonder if you
> can partition your query into subgroups, filter within the groups,
> then do a simpler join on the results, along the lines of the sketch
> below. That might reduce the combinatorics.
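>
> For example, a rough sketch of the shape (the prefix, predicates, and
> variable names here are invented, since we can't see your query):
>
>     import org.apache.jena.query.*;
>     import org.apache.jena.rdf.model.*;
>
>     Model model = ModelFactory.createDefaultModel(); // your data here
>     String partitioned =
>         "PREFIX ex: <http://example.org/#>\n" +
>         "SELECT ?x ?y ?z WHERE {\n" +
>         "  { SELECT ?x ?y WHERE { ?x ex:p ?y . FILTER(?x != ?y) } }\n" +
>         "  { SELECT ?y ?z WHERE { ?y ex:q ?z . FILTER(?y != ?z) } }\n" +
>         "}";
>     try (QueryExecution qe = QueryExecutionFactory.create(partitioned, model)) {
>         ResultSet rs = qe.execSelect();
>         // each sub-select applies its own filters first; the outer
>         // pattern then joins the two smaller result sets on ?y
>     }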
>
> However, I don't understand your query or the modelling (especially
> around simplexConnect, which looks odd), so I might be wrong about
> that.
>
>> I speculated that when I iterated over the statements in the OntModel,
>> and the number went from a model size() of ~1500 to ~4700 iterated
>> statements, I was materializing the entire inference closure (which
>> was fast). Is there some other set of calls needed to do that?
>
> The Jena inference engines support a mix of forward and backward
> inference rules. The forward inference rules will run once and store
> all the results. That's the growth you are probably seeing. That's
> then efficient to query.
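>
> If you want to force that up-front work explicitly you can call
> prepare() on the InfModel, e.g. (a sketch, where baseModel stands in
> for your data):
>
>     import org.apache.jena.rdf.model.*;
>     import org.apache.jena.reasoner.ReasonerRegistry;
>
>     InfModel inf = ModelFactory.createInfModel(
>             ReasonerRegistry.getOWLMicroReasoner(), baseModel);
>     inf.prepare(); // runs the forward rules once and caches the results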
>
> The backward rules are run on demand. They generally (this is
> controllable) cache the results of the particular triple patterns that
> are requested. Because they only cache against the specific patterns
> ("goals") they see, then depending on what order the goals come in,
> you can get cases where there's redundancy in those caches. Those
> caches aren't particularly well indexed either. You can certainly
> query one way and fill up one set of caches, but then a different
> query asks for different patterns and more rules still need to fire.
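>
> (As an aside, if those tabled caches grow across queries, calling
> reset() on the InfModel will wipe them, at the cost of recomputing
> results later.)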
>
> *If* multiple overlapping caches in the backward rules is the issue
> *then* materializing everything and not using inference after that
> can help. It's a balance of whether you are going to query for most of
> the data or just do a bunch of point probes. In the former case it's
> better to work everything out once. In the latter case better to use
> on demand rules.
>
> Your query pattern looks like it's going to touch everything.
>
>> Are there circumstances where it is faster to materialize the entire
>> closure and query a plain model than to query the inference model
>> itself?
>
> Yes, see earlier message, and above.
>
> Dave
>
>> On 2/23/2020 3:33 PM, Dave Reynolds wrote:
>>> The issue is not the performance of SPARQL but the performance of
>>> the inference engines.
>>>
>>> If you need some OWL inference then your best bet is OWLMicro.
>>>
>>> If that's too slow to query directly then one option to try is to
>>> materialize the entire inference closure and then query that. You
>>> can do that by simply copying the inference model to a plain model,
>>> as sketched below.
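>>>
>>> A minimal sketch of that (baseModel stands in for your asserted
>>> data):
>>>
>>>     import org.apache.jena.ontology.*;
>>>     import org.apache.jena.rdf.model.*;
>>>
>>>     OntModel inf = ModelFactory.createOntologyModel(
>>>             OntModelSpec.OWL_MEM_MICRO_RULE_INF, baseModel);
>>>     Model plain = ModelFactory.createDefaultModel();
>>>     plain.add(inf); // copies asserted + inferred statements
>>>     // query 'plain' from here on; no reasoner in the loop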
>>>
>>> If that's too slow then you'll need a higher performance third party
>>> reasoner.
>>>
>>> Dave
>>>
>>> On 23/02/2020 18:57, Steve Vestal wrote:
>>>> I'm looking for suggestions on a SPARQL performance issue. My test
>>>> model has ~800 sentences, and processing of one select query takes
>>>> about 25 minutes. The query is a basic graph pattern with 9
>>>> variables and 20 triples, plus a filter that forces distinct
>>>> variables to have distinct solutions using pair-wise not-equals
>>>> constraints. No OPTIONAL clause or anything else fancy.
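>>>> (That is, one constraint per variable pair, e.g. FILTER(?v1 != ?v2),
>>>> FILTER(?v1 != ?v3), and so on - up to 36 pairs if all 9 variables
>>>> must be mutually distinct.)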
>>>>
>>>> I am issuing the query against an inference model. Most of the
>>>> asserted sentences are in imported models. If I iterate over all
>>>> the statements in the OntModel, I get ~1500 almost instantly. I
>>>> experimented with several of the reasoners.
>>>>
>>>> Below is the basic control flow. The thing I found curious is that
>>>> the execSelect() method finishes almost instantly. It is the
>>>> iteration over the ResultSet that is taking all the time, apparently
>>>> in the call to selectResult.hasNext(). The result has 192 rows and 9
>>>> columns. The results are provided in bursts of 8 rows each, with ~1
>>>> minute between bursts.
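>>>> (I assume the result set is evaluated lazily, so the actual pattern
>>>> matching and inference happen only as the iterator is pulled?)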
>>>>
>>>>     OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
>>>>     String selectQuery = getMySelectQuery();
>>>>     QueryExecution selectExec =
>>>>         QueryExecutionFactory.create(selectQuery, ontologyModel);
>>>>     ResultSet selectResult = selectExec.execSelect();
>>>>     while (selectResult.hasNext()) { // Time seems to be spent in hasNext
>>>>         QuerySolution selectSolution = selectResult.next();
>>>>         for (String var : getMyVariablesOfInterest()) {
>>>>             RDFNode varValue = selectSolution.get(var);
>>>>             // process varValue
>>>>         }
>>>>     }
>>>>
>>>> Any suggestions would be appreciated.
>>>>
>>