Hi Michael,
A few facts please:
How many birds are there?
What's
SELECT (count(*) AS ?C1)
{ ?d1 <http://www.wikidata.org/entity/P171s> ?v }
SELECT (count(*) AS ?C2)
{ ?s <http://www.wikidata.org/entity/P171v>
<http://www.wikidata.org/entity/Q5113> }
It's probably a bad execution plan. It's supposed to execute the path
backwards which, subject to the reverse-link fan-out rates, should be OK.
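To make the direction explicit, here is a sketch of the bird query written out by hand so that it starts from the fixed object and walks each step in reverse. This is an illustration only, not necessarily what the optimizer produces, and the variable name ?c is arbitrary:

```sparql
# Equivalent to  ?d1 (P171s / P171v)+ Q5113 : the inverse of a sequence a/b is ^b/^a
SELECT (count(*) AS ?c)
WHERE {
  <http://www.wikidata.org/entity/Q5113>
      ( ^<http://www.wikidata.org/entity/P171v> / ^<http://www.wikidata.org/entity/P171s> )+
      ?d1
}
```

Whether this form is any faster depends on the plan actually chosen and on the fan-out of the reverse links.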
On 02/06/15 14:58, Michael Brunnbauer wrote:
hi all,
I have performance problems with queries using property paths on a Fuseki
2.0.0 TDB with half a billion triples from Wikidata. Random disk access does
not seem to be the cause. I use an SSD and see low IO tps values during queries
but high CPU usage. I tried with and without the automatically generated
stats.opt.
Counting all birds takes ca. 8s if not called for the first time (no disk
access, everything in memory):
select count(*) where {
?d1 ( <http://www.wikidata.org/entity/P171s> / <http://www.wikidata.org/entity/P171v>
)+ <http://www.wikidata.org/entity/Q5113>
}
(That's not legal SPARQL :-) The joys of compatibility mode.
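For reference, a form that is legal SPARQL 1.1: the aggregate has to be a parenthesised expression with an AS alias. The alias ?count is just a name chosen for this sketch:

```sparql
# Same pattern as the bird query, with the count written as a projected expression
SELECT (count(*) AS ?count)
WHERE {
  ?d1 ( <http://www.wikidata.org/entity/P171s> / <http://www.wikidata.org/entity/P171v> )+
      <http://www.wikidata.org/entity/Q5113>
}
```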
Counting all beetles does not seem to finish:
select count(*) where {
?d1 ( <http://www.wikidata.org/entity/P171s> / <http://www.wikidata.org/entity/P171v>
)+ <http://www.wikidata.org/entity/Q22671>
}
I tried with and without stats.opt and also with inverse paths (^property)
without success.
I guess this is not the "Counting Beyond a Yottabyte" problem?
http://www.w3.org/blog/SW/2012/04/19/no-more-counting-beyond-a-yottabyte-or-why-the-w3c-process-works/
https://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0003.html
No. (The data isn't a clique unless the data is really bizarre)
That is a theoretical piece of work on a different design.
(The fact it uses hyperbole and ridicule to make a technical point is
merely annoying.)
If I do a count(distinct ?d1) in the Bird query, I get the same number so I
guess that the + makes the query "non-counting".
Yes.
Any idea if this slow performance is to be expected and why?
Regards,
Michael Brunnbauer