Re: Fuseki 0.2.3 performance / StatsMatcher warnings

Andy Seaborne Sat, 04 Aug 2012 09:50:15 -0700

Hi Michael.

(Michael has sent me a copy of the database he's using - a bit big to email, even though it's only 157,809,969 triples and 44 Gbytes.)

I can recreate this - I'm getting a total of 5006ms for the query set of 17 queries you're using.

With a fix, it's 150ms, made up of approx 50ms of server-side execution and 100ms of HTTP networking and results transmission, for all 17 queries.


With the workaround, it's about 170ms.

Sorry - the fix will not make the next release, which is already built. Also, my current fix does the right thing for your case but I want to make sure it hasn't got any concurrency problems.



Workaround:

Remove the "stats.opt" from the database directory and create a file "fixed.opt" in that directory. An empty "fixed.opt" is fine - it's not actually read; it's the presence that matters. Caution - you need to get rid of the stats.opt file as it's used in preference.
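The two steps can be scripted. What follows is only a minimal sketch: the function name and the backup location are made up for illustration, and moving stats.opt to a sibling path instead of deleting it is a choice made here so the change is easy to undo. It assumes you run it against the TDB database directory that Fuseki is serving.

```python
from pathlib import Path


def switch_to_fixed_optimizer(db_dir):
    """Move stats.opt out of the database directory and create an empty fixed.opt.

    Returns the path the old stats.opt was moved to, or None if there was none.
    The helper name and the backup file name are hypothetical.
    """
    db = Path(db_dir)
    if not db.is_dir():
        raise NotADirectoryError(str(db))

    backup = None
    stats = db / "stats.opt"
    if stats.exists():
        # stats.opt is used in preference to fixed.opt, so it must leave the
        # database directory. It is parked next to the directory, not deleted.
        backup = db.parent / (db.name + "-stats.opt.bak")
        stats.replace(backup)

    # Only the presence of fixed.opt matters, so an empty file is enough.
    (db / "fixed.opt").touch()
    return backup
```

Calling switch_to_fixed_optimizer("/path/to/DB") with your own database path and then re-running the query set should show whether the workaround helps. To go back, move the backup file into the directory again as stats.opt and delete fixed.opt.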


You'll need to see if the change affects other, more complex queries.

dbpedia is a very unusual dataset at the best of times (42K different unique properties). Depressingly, fixed.opt does a reasonable job of optimizing. It simply looks for more tightly constrained triple patterns and mildly avoids rdf:type.

(The other optimizer option is "none.opt", when BGPs in queries are executed in the order written. Good for control and experimentation.)



Explanation:

I said:
[[

But the performance is not to do with stats - this is all single quad lookup.

]]

Not quite true :-) While the statistics themselves don't matter, the system is reading stats.opt too often. Normally, this isn't too important because the file is small, heavily cached and fast to parse (it still shouldn't do it). But the dbpedia stats.opt is big at 2.1 Mbytes and 42K entries.


        Andy
