Hello everyone,

As an exercise in time-wasting I undertook to complete some performance tests on the Tracker RDF database (Tracker store).
The standard tests I decided to use were those of the Berlin SPARQL Benchmark (BSBM). The specification for these tests can be found at:

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html

Results
-------

The headline result is that Tracker is roughly 9x faster than Virtuoso at performing the queries in these tests (2637.19 versus 298.10 query mixes per hour).

Tracker:

Queries 9 and 12 were not run due to missing SPARQL features in Tracker.

Scale factor: 1000
Number of warmup runs: 128
Number of clients: 4
Seed: 808080
Number of query mix runs (without warmups): 1024 times
min/max Querymix runtime: 3.8705s / 7.4565s
Total runtime (sum): 5573.465 seconds
Total actual runtime: 1397.849 seconds
QMpH: 2637.19 query mixes per hour
CQET: 5.44284 seconds average runtime of query mix
CQET (geom.): 5.42221 seconds geometric mean runtime of query mix

Metrics for Query: 1
Count: 1024 times executed in whole run
AQET: 0.166173 seconds (arithmetic mean)
AQET(geom.): 0.144697 seconds (geometric mean)
QPS: 23.99 Queries per second
minQET/maxQET: 0.06346957s / 1.97053821s
Average result count: 0.57
min/max result count: 0 / 5
Number of timeouts: 0

Metrics for Query: 2
Count: 6144 times executed in whole run
AQET: 0.207744 seconds (arithmetic mean)
AQET(geom.): 0.134743 seconds (geometric mean)
QPS: 19.19 Queries per second
minQET/maxQET: 0.04699750s / 2.55192438s
Average result count: 20.47
min/max result count: 8 / 41
Number of timeouts: 0

Metrics for Query: 3
Count: 1024 times executed in whole run
AQET: 0.241528 seconds (arithmetic mean)
AQET(geom.): 0.162570 seconds (geometric mean)
QPS: 16.51 Queries per second
minQET/maxQET: 0.06215394s / 2.08519456s
Average result count: 0.34
min/max result count: 0 / 4
Number of timeouts: 0

Metrics for Query: 4
Count: 1024 times executed in whole run
AQET: 0.252148 seconds (arithmetic mean)
AQET(geom.): 0.192841 seconds (geometric mean)
QPS: 15.81 Queries per second
minQET/maxQET: 0.10444709s / 2.21926131s
Average result count: 0.00
min/max result count: 0 / 1
Number of timeouts: 0

Metrics for Query: 5
Count: 1024 times executed in whole run
AQET: 1.238432 seconds (arithmetic mean)
AQET(geom.): 1.124482 seconds (geometric mean)
QPS: 3.22 Queries per second
minQET/maxQET: 0.36639510s / 4.12935953s
Average result count: 3.25
min/max result count: 0 / 5
Number of timeouts: 0

Metrics for Query: 6
Count: 1024 times executed in whole run
AQET: 0.230515 seconds (arithmetic mean)
AQET(geom.): 0.167703 seconds (geometric mean)
QPS: 17.30 Queries per second
minQET/maxQET: 0.07234605s / 2.18394154s
Average result count: 1.00
min/max result count: 1 / 1
Number of timeouts: 0

Metrics for Query: 7
Count: 4096 times executed in whole run
AQET: 0.286743 seconds (arithmetic mean)
AQET(geom.): 0.176770 seconds (geometric mean)
QPS: 13.91 Queries per second
minQET/maxQET: 0.05408344s / 2.96657009s
Average result count: 10.15
min/max result count: 1 / 28
Number of timeouts: 0

Metrics for Query: 8
Count: 2048 times executed in whole run
AQET: 0.224936 seconds (arithmetic mean)
AQET(geom.): 0.119174 seconds (geometric mean)
QPS: 17.73 Queries per second
minQET/maxQET: 0.01116535s / 2.23874045s
Average result count: 0.00
min/max result count: 0 / 0
Number of timeouts: 0

Metrics for Query: 9
Count: 0 times executed in whole run
AQET: 0.000000 seconds (arithmetic mean)
AQET(geom.): NaN seconds (geometric mean)
QPS: Infinity Queries per second
minQET/maxQET: 17976931348623157s
Average result (Bytes): 0.00
min/max result (Bytes): 2147483647 / -2147483648
Number of timeouts: 0

Metrics for Query: 10
Count: 2048 times executed in whole run
AQET: 0.173706 seconds (arithmetic mean)
AQET(geom.): 0.124314 seconds (geometric mean)
QPS: 22.95 Queries per second
minQET/maxQET: 0.02448088s / 1.99127881s
Average result count: 1.18
min/max result count: 0 / 6
Number of timeouts: 0

Metrics for Query: 11
Count: 1024 times executed in whole run
AQET: 0.123324 seconds (arithmetic mean)
AQET(geom.): 0.097755 seconds (geometric mean)
QPS: 32.33 Queries per second
minQET/maxQET: 0.01224387s / 1.33136896s
Average result count: 13.00
min/max result count: 13 / 13
Number of timeouts: 0

Metrics for Query: 12
Count: 0 times executed in whole run
AQET: 0.000000 seconds (arithmetic mean)
AQET(geom.): NaN seconds (geometric mean)
QPS: Infinity Queries per second
minQET/maxQET: 17976931348623157s
Average result (Bytes): 0.00
min/max result (Bytes): 2147483647 / -2147483648
Number of timeouts: 0

Virtuoso:

Benchmark run completed in 12366.220356159s

Scale factor: 1000
Number of warmup runs: 128
Number of clients: 4
Seed: 808080
Number of query mix runs (without warmups): 1024 times
min/max Querymix runtime: 25.0356s / 60.9157s
Total runtime (sum): 49303.512 seconds
Total actual runtime: 12366.220 seconds
QMpH: 298.10 query mixes per hour
CQET: 48.14796 seconds average runtime of query mix
CQET (geom.): 47.96361 seconds geometric mean runtime of query mix

Metrics for Query: 1
Count: 1024 times executed in whole run
AQET: 0.050511 seconds (arithmetic mean)
AQET(geom.): 0.029894 seconds (geometric mean)
QPS: 78.93 Queries per second
minQET/maxQET: 0.00594442s / 0.35991040s
Average result count: 0.57
min/max result count: 0 / 5
Number of timeouts: 0

Metrics for Query: 2
Count: 6144 times executed in whole run
AQET: 5.591946 seconds (arithmetic mean)
AQET(geom.): 5.438305 seconds (geometric mean)
QPS: 0.71 Queries per second
minQET/maxQET: 2.04874958s / 10.81485338s
Average result count: 20.47
min/max result count: 8 / 41
Number of timeouts: 0

Metrics for Query: 3
Count: 1024 times executed in whole run
AQET: 0.052267 seconds (arithmetic mean)
AQET(geom.): 0.031281 seconds (geometric mean)
QPS: 76.28 Queries per second
minQET/maxQET: 0.00697979s / 1.19080248s
Average result count: 0.34
min/max result count: 0 / 4
Number of timeouts: 0

Metrics for Query: 4
Count: 1024 times executed in whole run
AQET: 0.106704 seconds (arithmetic mean)
AQET(geom.): 0.061361 seconds (geometric mean)
QPS: 37.36 Queries per second
minQET/maxQET: 0.00904308s / 0.99024453s
Average result count: 0.00
min/max result count: 0 / 1
Number of timeouts: 0

Metrics for Query: 5
Count: 1024 times executed in whole run
AQET: 3.650117 seconds (arithmetic mean)
AQET(geom.): 3.250134 seconds (geometric mean)
QPS: 1.09 Queries per second
minQET/maxQET: 0.63302203s / 11.57777184s
Average result count: 3.25
min/max result count: 0 / 5
Number of timeouts: 0

Metrics for Query: 6
Count: 1024 times executed in whole run
AQET: 0.175229 seconds (arithmetic mean)
AQET(geom.): 0.159130 seconds (geometric mean)
QPS: 22.75 Queries per second
minQET/maxQET: 0.03786253s / 0.49990789s
Average result count: 1.05
min/max result count: 1 / 8
Number of timeouts: 0

Metrics for Query: 7
Count: 4096 times executed in whole run
AQET: 1.301037 seconds (arithmetic mean)
AQET(geom.): 1.074876 seconds (geometric mean)
QPS: 3.06 Queries per second
minQET/maxQET: 0.03227636s / 5.98356504s
Average result count: 10.15
min/max result count: 1 / 28
Number of timeouts: 0

Metrics for Query: 8
Count: 2048 times executed in whole run
AQET: 1.472874 seconds (arithmetic mean)
AQET(geom.): 1.167677 seconds (geometric mean)
QPS: 2.71 Queries per second
minQET/maxQET: 0.01117960s / 5.30360235s
Average result count: 4.75
min/max result count: 0 / 15
Number of timeouts: 0

Metrics for Query: 9
Count: 4096 times executed in whole run
AQET: 0.064159 seconds (arithmetic mean)
AQET(geom.): 0.050390 seconds (geometric mean)
QPS: 62.14 Queries per second
minQET/maxQET: 0.01071517s / 0.73873880s
Average result (Bytes): 8299.27
min/max result (Bytes): 2578 / 13300
Number of timeouts: 0

Metrics for Query: 10
Count: 2048 times executed in whole run
AQET: 0.800822 seconds (arithmetic mean)
AQET(geom.): 0.513786 seconds (geometric mean)
QPS: 4.98 Queries per second
minQET/maxQET: 0.02532729s / 2.70724267s
Average result count: 1.18
min/max result count: 0 / 6
Number of timeouts: 0

Metrics for Query: 11
Count: 1024 times executed in whole run
AQET: 0.082581 seconds (arithmetic mean)
AQET(geom.): 0.073997 seconds (geometric mean)
QPS: 48.28 Queries per second
minQET/maxQET: 0.02482279s / 0.23430865s
Average result count: 10.00
min/max result count: 10 / 10
Number of timeouts: 0

Metrics for Query: 12
Count: 1024 times executed in whole run
AQET: 0.470700 seconds (arithmetic mean)
AQET(geom.): 0.454558 seconds (geometric mean)
QPS: 8.47 Queries per second
minQET/maxQET: 0.17751580s / 1.10537926s
Average result (Bytes): 2608.32
min/max result (Bytes): 2565 / 2650
Number of timeouts: 0

Method
------

Benchmark data:

A number of changes had to be made to the benchmark data before the tests could be run on top of Tracker. Unfortunately the benchmark does not provide a machine-readable ontology, so I had to create one myself.

The form of the data for the tests was difficult for Tracker to handle. If you look at the specification you will see that there is a class called ProductType. ProductType, they say, forms an 'irregular subsumption hierarchy', although I think this is a posh way of saying that the product types themselves form a class hierarchy. Put another way, the data for the tests contains resources of the form:

bsbm-inst:ProductType011432
    rdf:type bsbm:ProductType ;
    rdfs:label "Digital Camera" ;
    rdfs:subClassOf bsbm-inst:ProductType011000 ;
    dc:publisher bsbm-inst:StandardizationInstitution01 ;
    dc:date "2008-02-13"^^xsd:date .

This is fundamentally incompatible with Tracker, which has no way of defining ontology classes at runtime. To get around this I modified the dataset generator to output the product types as a separate file to be placed in the ontology folder.

The other issue is that, although the RDFS specification says that the triple 'bsbm-inst:ProductType011432 rdfs:subClassOf bsbm-inst:ProductType011000' implicitly declares the subject to be an rdfs:Class, this inference is not supported in Tracker. The triple 'bsbm-inst:ProductType011432 a rdfs:Class' had to be added for the ontology to function.
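
The rdfs:Class workaround described above can be sketched in a few lines of Python. This is only an illustration, assuming the data is held as plain (subject, predicate, object) string tuples; the function name add_explicit_class_triples and that representation are assumptions made for the example, not part of the modified dataset generator. It adds the explicit class statement for both ends of each rdfs:subClassOf triple, since RDFS gives that property a domain and range of rdfs:Class.

```python
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def add_explicit_class_triples(triples):
    """Return the triples plus an explicit 'X rdf:type rdfs:Class'
    for every subject and object of an rdfs:subClassOf triple.

    Hypothetical helper: it spells out what an RDFS reasoner would
    infer, for a store that does not do the inference itself.
    """
    # Resources that are already declared as classes.
    declared = {s for s, p, o in triples if p == RDF_TYPE and o == RDFS_CLASS}
    result = list(triples)
    for s, p, o in triples:
        if p != RDFS_SUBCLASS_OF:
            continue
        for node in (s, o):
            if node not in declared:
                declared.add(node)
                result.append((node, RDF_TYPE, RDFS_CLASS))
    return result


triples = [
    ("bsbm-inst:ProductType011432", "rdf:type", "bsbm:ProductType"),
    ("bsbm-inst:ProductType011432", "rdfs:subClassOf",
     "bsbm-inst:ProductType011000"),
]
expanded = add_explicit_class_triples(triples)
for triple in expanded:
    print(" ".join(triple), ".")
```

Running the function a second time on its own output adds nothing further, so it is safe to apply to data that already carries some explicit class statements.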

Linked external data is also used heavily in this data set. Tracker was not capable of this. The external URIs were converted into strings and the queries changed accordingly.

FOAF was another major issue. FOAF is not one of the ontologies included with Tracker, and the data contained FOAF elements. Ontologies were created that somewhat represented foaf and geo; enough to get the BSBM data into Tracker.

Other smaller issues with the format of the data and Tracker were:

1) dc:publisher is not well defined. The data set includes URIs for the publisher. This should possibly be allowed, as the dc specification does not say anything about the data type of the range.

2) xsd:date is not a supported data type in Tracker. These values, including those of dc:date, had to be converted to xsd:dateTime.

3) Instances of rdfs:Class are not implicitly rdfs:Resources. Looking at the example Turtle above, there is no 'rdf:type rdfs:Resource' statement. I believe that this is implicit in the type statement given, but I may be wrong; I could find nothing in the RDFS specification to confirm it. I simply added rdfs:Resource triples to the data set.

SPARQL end point:

The tests required an HTTP SPARQL end point. This was written in a very simple manner for Tracker using django. The very basic django HTTP server was used for serving the requests. Apache and mod_python were attempted, but this led to extremely irregular results and some failures.

Conclusion
----------

Tracker, despite being a much more lightweight RDF database than Virtuoso, has much better query performance in the Berlin SPARQL Benchmark. This is somewhat to be expected, as Tracker uses decomposed tables whereas Virtuoso, I believe, does not.

The downside to this is that Virtuoso is much more flexible and much better suited to general-purpose RDF storage. The data type limits, non-schema-free operation, and limited ontology support of Tracker made it very difficult to use for the data set provided.
This is obviously not an issue for ontologies and data that have been created specifically for Tracker.

The results for Virtuoso are far more variable than the ones for Tracker. This may be intrinsic to Virtuoso, or could be an indication that there is some significant query overhead that is preventing Tracker from performing well on the very simple queries. I have not attempted to look at how much overhead the HTTP SPARQL end point is adding. This is one possible way to improve these performance tests.

There are also some queries that report slightly different numbers of results between Virtuoso and Tracker. There is unlikely to be an error in Tracker here. The translation of the ontology and data set for use with Tracker has been extensive, and has likely introduced some errors.

Thanks

Mark
