On 13/07/12 11:33, Michael Brunnbauer wrote:

Hello Andy,

On Thu, Jul 12, 2012 at 02:46:49PM +0100, Andy Seaborne wrote:
The things to try that occur to me are to simplify the situation:

1/ Run with only one dataset on the server

Does not help.

2/ Change the workload: do multiple calls of each query, so do
q1,q1,q1,q1,q2,q2,q2,q2, etc.

Immediately repeating a query does not make it fast.
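
For anyone wanting to reproduce the q1,q1,q1,q1,q2,q2,q2,q2 ordering, here is a minimal Python sketch. The names repeated_workload, time_workload and run_query are made up for illustration; run_query is whatever callable sends one query to the server, so this is not tied to sparql-wrapper or any particular client.

```python
import time


def repeated_workload(queries, repeats):
    """Return each query `repeats` times in a row: q1,q1,...,q2,q2,..."""
    return [q for q in queries for _ in range(repeats)]


def time_workload(workload, run_query):
    """Call run_query(q) for every entry and record wall-clock seconds per call.

    run_query is a caller-supplied function (hypothetical here) that
    executes one query and returns once the response has been consumed.
    """
    timings = []
    for q in workload:
        start = time.perf_counter()
        run_query(q)
        timings.append((q, time.perf_counter() - start))
    return timings
```

If the second to fourth timings of a query are no better than the first, caching or page-in effects are unlikely to be the explanation.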

I now have a set of queries which is slow against the dbpedia dataset
(156966301 quads) and very fast against the ontology dataset (12177 quads).

There are results for some of the queries in both datasets. Queries that
return no results do not seem to be faster.

There is no significant disk activity while I run the queries (the machine has
48GB RAM and everything relevant is in memory after the first time I run the
queries). I can see that Fuseki is under high CPU load (99%) while I run the
queries. Fuseki has a virtual size well above 4GB - so it seems to use memory
mapped IO for the TDB files.

Yes - files are memory mapped.

I'm sure you'll understand that it's hard to debug this without access to a replicated setup. I can try to set up something if you make the data available but I don't have access to a 48G machine. It is possible that the large memory is an issue - I have read elsewhere of strange things happening with mmap I/O. I don't see how the change in TDB versions is going to make a difference because IIRC nothing has changed except the read transaction wrapper which I've tested.

sparql-wrapper can be eliminated by running

wget -O /dev/null 'http://ts.foaf-search.net:3030/foaf/query?query=....'

for each query. That does not maintain the connection, but I don't think sparql-wrapper's use of the Python HTTP code does either.
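
The same check can be scripted from Python without sparql-wrapper. This is only a sketch: build_query_url and fetch_and_discard are illustrative names, and the endpoint is assumed to accept the query as a "query" parameter on a GET request, as in the wget line above. As far as I know, urllib.request opens a fresh connection for each call, so it behaves like wget here.

```python
import urllib.parse
import urllib.request


def build_query_url(endpoint, sparql):
    """Append the URL-encoded SPARQL text as the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})


def fetch_and_discard(url, timeout=300):
    """GET the URL and throw the body away, like wget -O /dev/null.

    Returns the number of bytes read so the caller can check that a
    result actually came back. The timeout value is an arbitrary choice.
    """
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            chunk = response.read(65536)
            if not chunk:
                return total
            total += len(chunk)
```

Wrapping fetch_and_discard(build_query_url(endpoint, query)) in a timer gives a per-query figure that does not involve the wrapper library at all.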

The snapshot build of Fuseki 0.2.4-SNAPSHOT has the improvements I discovered in passing.

The queries will all use the POSG index, which may not have been paged in, but repeated use should have eliminated that possibility.

Running in direct mode (-Dtdb:fileMode=direct, and an info level logging event happens) would show whether it's some weird mmap issue to do with the different Jetty versions.

I have checked that the DB isn't being reopened each time - it's open once on start up and not within a run.

        Andy


Regards,

Michael Brunnbauer


