Marc

Yes, displaying the result set is the first operation that actually consumes 
the results and thus actually executes the query, hence my point.  When you 
call execSelect() you are just getting a placeholder object that knows how to 
execute the query at some future point in time.  Only when you actually start 
consuming that object does the query execution happen.
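
For illustration, a minimal sketch of separating the two timings (model and 
queryString here stand in for your own setup):

    import org.apache.jena.query.*;

    Query query = QueryFactory.create(queryString);
    try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
        long t0 = System.nanoTime();
        ResultSet rs = qexec.execSelect();  // lazy: just builds the iterator
        long t1 = System.nanoTime();
        // Copying the results forces the query to execute to completion
        ResultSetRewindable copy = ResultSetFactory.copyResults(rs);
        long t2 = System.nanoTime();
        System.out.printf("execSelect: %d ms, execution+consumption: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }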

The difference in numbers you see on the first call is simply a JVM 
ClassLoader effect on a cold VM: the first time you call execSelect() a whole 
bunch of classes have to be loaded in order to construct the underlying query 
iterator, whereas subsequent calls already have all the relevant classes 
loaded and are thus much faster.
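
If you want timings that exclude those one-off costs, run the query once as a 
warm-up and only time later executions, e.g. (same placeholders as above):

    // Warm-up run: triggers class loading so later timings are representative
    try (QueryExecution warm = QueryExecutionFactory.create(queryString, model)) {
        ResultSetFormatter.consume(warm.execSelect());  // drains the iterator fully
    }
    // ...then time subsequent executions as in the sketch above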

The timing that matters is the consumption of the result set, however you 
choose to consume it, whether by printing it out or otherwise.

In the case of inference, yes, there is also an upfront cost of computing the 
inferences.  Once computed these are typically cached (though this depends on 
the rule set), and any changes to the data may invalidate that cache.  You 
can call prepare() on the InfModel to incur the initial computation cost 
separately; otherwise it is incurred by whichever operation first accesses 
the InfModel.  And as your email shows, subsequent calls don't incur that 
cost and are much faster.
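
A rough sketch (reasoner and m as in your own code):

    import org.apache.jena.rdf.model.*;

    InfModel infModel = ModelFactory.createInfModel(reasoner, m);

    long t0 = System.nanoTime();
    infModel.prepare();  // pay the initial inference computation cost up front
    long t1 = System.nanoTime();
    System.out.printf("Inference preparation: %d ms%n", (t1 - t0) / 1_000_000);

    // Queries against infModel no longer incur the one-off preparation cost,
    // provided the data hasn't changed and invalidated the cache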

You are currently using OntModelSpec.OWL_MEM, which is likely not the most 
performant rule set; there are variants that trade off OWL features/coverage 
for improved performance.  That isn't my area of expertise, but other people 
on the list can probably suggest a suitable rule set if you can give some 
details of what kinds of inferences you require.
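
By way of example only (baseModel is a placeholder for your data, and whether 
a smaller rule set covers the inferences you need is exactly the question for 
the list):

    // org.apache.jena.ontology.OntModel / OntModelSpec
    // OWL_MEM_MICRO_RULE_INF uses a much smaller rule set than full OWL
    // inference, trading coverage for speed; OWL_MEM_MINI_RULE_INF sits
    // in between
    OntModel ontModel = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_MEM_MICRO_RULE_INF, baseModel);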

Rob

On 14/03/2018, 18:41, "Marc Agate" <agate.m...@gmail.com> wrote:

    Hi Rob,
    Did you notice that in each case I display the full ResultSet (and
    therefore consume it) in the console? In each case I gave two numbers:
    one that measures the time taken by execSelect() (this number is
    relevant since it can change a lot depending on the query) and a
    second that measures (precisely) the time to consume the ResultSet
    using ResultSetFormatter.asText(ResultSet res).
    I am not benchmarking Jena and therefore have no interest in the
    timing values per se. My issue is simply that I cannot use the API
    because it takes too long to return 86 results out of a model
    comprising approximately 4600 statements.
    You are also telling me that "InfModel infMod =
    ModelFactory.createInfModel(reasoner, m);" doesn't actually provide a
    usable Model and that inference rules are applied when the ResultSet
    is consumed. Well, it looks like that really is the case: execSelect()
    takes 1 ms in the case of an InfModel, and
    "System.out.println(ResultSetFormatter.asText(rs));" takes almost 19
    seconds to complete.
    Moreover, I actually ran the same test twice on the same InfModel
    object and, yes, the second time it took 2 ms for execSelect() and
    7 ms to consume the ResultSet.
    My conclusion is that one cannot use queries on an InfModel created on
    the fly (or: any InfModel must be used once before it is really
    usable).
    
    Marc
    
    On Wednesday 14 March 2018 at 16:41 +0000, Rob Vesse wrote:
    > You've made a common error that people trying to benchmark Jena
    > make.  execSelect() simply prepares a result set backed by an
    > iterator that is capable of answering the query; until you consume
    > that result set no execution actually takes place.  All query
    > execution in Jena is lazy, so if you want to time full execution of
    > the query, use a method that consumes/copies the returned iterator,
    > such as ResultSetFactory.copyResults(), thus forcing full execution
    > to happen.
    > 
    > So what you are timing as the results processing is actually results
    > processing + query execution.  Over an inference model the act of
    > executing a query will cause inference rules to be applied, which,
    > depending on the ontology and rules, may take a long time.
    > 
    > Rob
    > 
    > On 14/03/2018, 16:26, "agate.m...@gmail.com" <agate.m...@gmail.com>
    > wrote:
    > 
    >     Hi,
    >     
    >     I have included here
    >     (https://gist.github.com/MarcAgate/8bbe334fd852817977c909af107a9c6b)
    >     some code that illustrates the issue.
    >     It runs the same query against three different models (Model,
    > InfModel and OntModel) of the same ontology.
    >     There's obviously a problem with InfModel.
    >     
    >     Any idea?
    >     
    >     Thanks
    >     
    >     Marc