Hi Michael.
(Michael has sent me a copy of the database he's using - a bit big to
email, even though it's only 157,809,969 triples and 44 Gbytes.)
I can recreate this - I'm getting a total of 5006ms for the query set of
17 queries you're using.
With a fix, it's 150ms, made up of approx 50ms of server-side execution
and 100ms of HTTP networking and results transmission, for all 17 queries.
With the workaround, it's about 170ms.
Sorry - the fix will not make the next release, which is already built.
Also, my current fix does the right thing for your case but I want to
make sure it's not got any concurrency problems.
Workaround:
Remove the "stats.opt" from the database directory and create a file
"fixed.opt" in that directory. An empty "fixed.opt" is fine - it's not
actually read; it's the presence that matters. Caution - you need to
get rid of the stats.opt file as it's used in preference.
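
If it helps, the workaround steps could be scripted along these lines.
This is only a sketch: the function name and the optional backup
location are made up for illustration, and the backup is deliberately
kept outside the database directory so that no stats.opt is left there.

```python
from pathlib import Path
from typing import Optional
import shutil


def switch_to_fixed_optimizer(db_dir: str, backup_dir: Optional[str] = None) -> Path:
    """Remove stats.opt from db_dir and create an empty fixed.opt there.

    Hypothetical helper: if backup_dir is given, stats.opt is moved there
    instead of being deleted.
    """
    db = Path(db_dir)
    if not db.is_dir():
        raise NotADirectoryError(db_dir)

    stats = db / "stats.opt"
    if stats.exists():
        if backup_dir is not None:
            dest = Path(backup_dir)
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(stats), str(dest / "stats.opt"))
        else:
            stats.unlink()

    # Only the presence of fixed.opt matters, so an empty file is enough.
    fixed = db / "fixed.opt"
    fixed.touch()
    return fixed
```

Calling switch_to_fixed_optimizer("/path/to/DB") (path is a placeholder)
leaves the directory with a fixed.opt and no stats.opt.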
You'll need to see if the change affects other, more complex queries.
dbpedia is a very unusual dataset at the best of times (42K unique
properties). Depressingly, fixed.opt does a reasonable job of
optimizing. It simply looks for more tightly constrained triple
patterns and mildly avoids rdf:type.
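
To give a feel for that kind of heuristic, here is a toy sketch. It is
not the real fixed.opt code: the cost function, the 0.5 penalty and the
tuple representation of triple patterns are all assumptions chosen for
illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_var(term: str) -> bool:
    """Treat terms starting with '?' as variables."""
    return term.startswith("?")


def cost(triple) -> float:
    """Fewer variables means more tightly constrained, so lower cost.

    The rdf:type penalty is an arbitrary illustrative value.
    """
    s, p, o = triple
    unbound = sum(is_var(t) for t in (s, p, o))
    penalty = 0.5 if p == RDF_TYPE else 0.0
    return unbound + penalty


def reorder(bgp):
    """Return the patterns sorted by cost; ties keep their written order."""
    return sorted(bgp, key=cost)


bgp = [
    ("?s", RDF_TYPE, "?t"),
    ("?s", "http://example/p", "?o"),
    ("?s", "http://example/name", '"Alice"'),
]
ordered = reorder(bgp)
```

Here the pattern with the literal object comes first and the rdf:type
pattern last. The sketch ignores variables that earlier patterns would
already have bound, which a real reordering would want to account for.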
(The other optimizer option is "none.opt", where BGPs in queries are
executed in the order written. Good for control and experimentation.)
Explanation:
I said:
[[
But the performance is not to do with stats - this is all single quad
lookup.
]]
Not quite true :-) While the statistics themselves don't matter, the
system is reading stats.opt too often. Normally, this isn't too
important because the file is small, heavily cached and fast to parse
(it still shouldn't do it). But the dbpedia stats.opt is big at 2.1
Mbytes and 42K entries.
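
As a general illustration of avoiding repeated reads of a file like
this (and not a description of the actual fix, whose details aren't
given here), one could cache the parsed result and re-parse only when
the file changes. All names below are hypothetical, and the lock is
just one assumed way to keep the cache safe across threads.

```python
import os
import threading

_cache = {}
_lock = threading.Lock()


def load_rules(path, parse):
    """Return parse(file contents), re-parsing only if the file changed.

    The change check uses modification time and size from os.stat.
    """
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    with _lock:
        hit = _cache.get(path)
        if hit is not None and hit[0] == key:
            return hit[1]
        with open(path, encoding="utf-8") as f:
            parsed = parse(f.read())
        _cache[path] = (key, parsed)
        return parsed
```

With this shape, the cost of parsing a 2.1 Mbyte file is paid once per
change to the file rather than once per lookup.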
Andy