On 12/06/12 08:29, Dave Reynolds wrote:
> On 12/06/12 07:46, Jauhiainen Matti wrote:
>
>> I'm doing the queries over single inferred in memory model, which has
>> around half a million triplets. Written as RDF/XML it takes around 5
>> MB on disk. I run the queries on desktop with 4 GB of RAM and Core 2
>> Quad @ 2.66GHz. Am I missing something with the computational
>> complexity of the first two queries? What makes the second and third
>> query so different?
>
> Without having any details of your data it is extremely likely that
> all the slow down you are seeing is in the inference, not the query
> processing.
>
> To test this try materializing an inference closure and then run your
> queries on that closure [i.e. create a plain memory model and add()
> the inferred model or, preferably, just those inferences you need].
>
> Dave
On 12/06/12 07:46, Jauhiainen Matti wrote:
> Hi,
>
> I have performance issues with certain types of SPARQL queries over Jena
> model. Things get slow when I try to query patterns with multiple relations
> between resources, for example:
>
> DESCRIBE ?var1 ?var2 ?var3 WHERE {
> ?var1 NS:type 'X' .
> ?var2 NS:type 'Y' .
> ?var3 NS:type 'Z' .
That is a partial three-way cross product.
> ?var1 NS:dependency ?var2 .
> ?var2 NS:dependency ?var3
Those last two can be very expensive as well.
You may find reordering the pattern helps.
Try a SELECT query and see what happens - DESCRIBE is doing an implicit SELECT
DISTINCT - try without DISTINCT and see how many cases the query has to
consider.
> }
>
> or even just:
>
> DESCRIBE ?var1 ?var2 ?var3 WHERE {
> ?var1 NS:type 'X' .
> ?var2 NS:type 'Y' .
> ?var3 NS:type 'Z' .
Suggests it's the unconnected types causing lots of work, possibly stressing
the JVM.
> }
>
> These take longer to complete than I care to wait (over an hour at
> least) while similar query will complete in seconds, e.g.
>
> DESCRIBE ?var1 ?var2 WHERE {
> ?var1 NS:type 'X' .
> ?var2 NS:type 'Y' .
> ?var1 NS:dependency ?var2 .
How many ?var NS:type 'Z'?
There are around 2000 of X, 8000 of Y and 200 of Z. Anyhow, it seems like the
inference was at least part of the problem. When I do the inference beforehand
and add the model to an empty one like Dave suggested, the first example with
dependency relations between var1, var2 and var3 completes in less than a
second, which is perfectly acceptable and solves the problem. I didn't realize
inference is done on demand with InfModel.
The second example with three unconnected variables still takes a long time
though. That was just an example and I don't really need that query; I'm just
interested in what is causing the degraded performance. If I do the queries in
separate query executions it works a lot faster.
Matti Jauhiainen
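
For what it's worth, the counts above give a rough idea of why the unconnected
pattern is so much slower than three separate queries. The sketch below is only
back-of-the-envelope arithmetic, assuming the query is evaluated naively as a
full cross product of the three type patterns; it says nothing about what the
query engine actually does internally. The class and method names are made up
for illustration.

```java
public class CrossProductEstimate {
    // With no shared variables between the patterns, every match of one
    // pattern combines with every match of the others, so the number of
    // solution rows is the product of the individual match counts.
    static long crossProduct(long... counts) {
        long total = 1;
        for (long c : counts) {
            total = Math.multiplyExact(total, c); // throws on overflow
        }
        return total;
    }

    // Running each pattern as its own query only visits each match once,
    // so the work is the sum of the match counts instead.
    static long separateQueries(long... counts) {
        long total = 0;
        for (long c : counts) {
            total = Math.addExact(total, c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Approximate counts reported in this thread: X, Y and Z.
        long x = 2000, y = 8000, z = 200;
        System.out.println("Rows as one unconnected query: " + crossProduct(x, y, z));
        System.out.println("Rows as three separate queries: " + separateQueries(x, y, z));
    }
}
```

Under that assumption the single query has about 3.2 billion rows to consider
before DISTINCT is applied, against roughly ten thousand for the three separate
queries, which would be consistent with the difference seen here.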
Do you need to ask the NS:type at all?
> }
>
> I'm doing the queries over single inferred in memory model, which has around
> half a million triplets. Written as RDF/XML it takes around 5 MB on disk. I
> run the queries on desktop with 4 GB of RAM and Core 2 Quad @ 2.66GHz. Am I
> missing something with the computational complexity of the first two
> queries? What makes the second and third query so different?
Inference does not look to be the root cause but it's going to add a cost.
>
> Regards,
>
> Matti Jauhiainen
>
>