VS: Jena query performance

Jauhiainen Matti Tue, 12 Jun 2012 03:59:18 -0700

On 12/06/12 08:29, Dave Reynolds wrote:
> On 12/06/12 07:46, Jauhiainen Matti wrote:
>
>> I'm doing the queries over a single inferred in-memory model, which has
>> around half a million triples. Written as RDF/XML it takes around 5
>> MB on disk. I run the queries on desktop with 4 GB of RAM and Core 2 
>> Quad @ 2.66GHz. Am I missing something with the computational 
>> complexity of the first two queries? What makes the second and third 
>> query so different?
>
> Without having any details of your data it is extremely likely that 
> all the slow down you are seeing is in the inference, not the query 
> processing.
>
> To test this try materializing an inference closure and then run your 
> queries on that closure [i.e. create a plain memory model and add() 
> the inferred model or, preferably, just those inferences you need].
>
> Dave

On 12/06/12 07:46, Jauhiainen Matti wrote:
 > Hi,
 >
 > I have performance issues with certain types of SPARQL queries over Jena 
 > model. Things get slow when I try to query patterns with multiple relations 
 > between resources, for example:
 >
 > DESCRIBE ?var1 ?var2 ?var3 WHERE {
 >          ?var1 NS:type 'X' .
 >          ?var2 NS:type 'Y' .
 >          ?var3 NS:type 'Z' .

A three-way partial cross product.

 >          ?var1 NS:dependency ?var2 .
 >          ?var2 NS:dependency ?var3

Those last two can be very expensive as well.

You may find reordering the pattern helps.

Try a SELECT query and see what happens - DESCRIBE is doing an implicit SELECT 
DISTINCT - try without DISTINCT and see how many cases the query has to 
consider.

 > }
 >
 > or even just:
 >
 > DESCRIBE ?var1 ?var2 ?var3 WHERE {
 >          ?var1 NS:type 'X' .
 >          ?var2 NS:type 'Y' .
 >          ?var3 NS:type 'Z' .

This suggests it's the unconnected type patterns causing lots of work,
possibly stressing the JVM.

 > }
 >
 > These take longer to complete than I care to wait (over an hour at least)
 > while similar query will complete in seconds, e.g.
 >
 > DESCRIBE ?var1 ?var2 WHERE {
 >          ?var1 NS:type 'X' .
 >          ?var2 NS:type 'Y' .
 >          ?var1 NS:dependency ?var2 .

How many matches are there for ?var NS:type 'Z'?

There are around 2000 of X, 8000 of Y and 200 of Z. Anyhow, it seems the
inference was at least part of the problem. When I do the inference beforehand
and add the inferred model to an empty one, as Dave suggested, the first
example with dependency relations between var1, var2 and var3 completes in
less than a second, which is perfectly acceptable and solves the problem. I
didn't realize inference is done on demand with InfModel.

The second example, with three unconnected variables, still takes a long time
though. It was just an example and I don't really need that query; I'm just
interested in what is causing the degraded performance. If I run the queries
in separate query executions, it is a lot faster.
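
For a rough sense of scale, the arithmetic behind the unconnected case can be
sketched as below. This is only a back-of-the-envelope illustration using the
approximate counts given above; the class and method names are made up for the
example, it does not touch Jena at all, and it assumes the engine really does
enumerate every combination of the three unconnected patterns.

```java
// Back-of-the-envelope sketch: how many candidate solutions an unconnected
// three-pattern query has to enumerate, versus three separate queries.
// Names are hypothetical; counts are the approximate figures from this thread.
final class CrossProductSize {

    // Number of (?var1, ?var2, ?var3) combinations when the three patterns
    // share no variables: every X pairs with every Y and every Z.
    static long crossProduct(long xs, long ys, long zs) {
        return Math.multiplyExact(Math.multiplyExact(xs, ys), zs);
    }

    // Total number of rows when each pattern is run as its own query.
    static long separateQueries(long xs, long ys, long zs) {
        return Math.addExact(Math.addExact(xs, ys), zs);
    }

    public static void main(String[] args) {
        long xs = 2000, ys = 8000, zs = 200;
        System.out.println("joint query:      " + crossProduct(xs, ys, zs));
        System.out.println("separate queries: " + separateQueries(xs, ys, zs));
    }
}
```

With these counts the joint pattern has about 3.2 billion combinations,
against roughly 10,200 rows in total for three separate queries, which would
be consistent with the difference in running time.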

Matti Jauhiainen

Do you need to ask the NS:type at all?

 > }
 >
 > I'm doing the queries over a single inferred in-memory model, which has around
 > half a million triples. Written as RDF/XML it takes around 5 MB on disk. I
 > run the queries on desktop with 4 GB of RAM and Core 2 Quad @ 2.66GHz.  Am I 
 > missing something with the computational complexity of the first two 
 > queries? What makes the second and third query so different?

Inference does not look to be the root cause but it's going to add a cost.

 >
 > Regards,
 >
 > Matti Jauhiainen
 >
 >
