Dear Apache Jena users,

my experience with the http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData trials and the unanswered question https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits led me to assume that it would be possible to run the Wikidata import for Jena on a costly 4 TB SSD, then use the resulting database on a much cheaper rotating disk, and see little difference in query performance compared to running from the SSD.
My assumption was based on https://jena.apache.org/documentation/tdb/architecture.html and the use of https://en.wikipedia.org/wiki/B+_tree mentioned there. I thought the B+ tree approach was optimized to keep the number of time-costly seeks low when fetching data during a query.

My experiment at http://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15#log_for_query shows a different result for the query:

    SELECT (COUNT(*) as ?Triples) WHERE { ?s ?p ?o}

It takes 31.501 secs on a rotating disk, which is only a bit slower than the SSD in raw I/O but has the seek time of a rotating disk. The SSD does not have this seek penalty, and there the query takes 5.516 secs.

Would other queries see the same factor of roughly 6, or does the speed difference depend on the query? Please suggest some queries that I might test, and I will report the results here.

Cheers
  Wolfgang

-- 
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
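
For anyone who wants to repeat the comparison, here is a minimal sketch of how such timings could be taken and compared. It is not how the numbers above were measured. It assumes the TDB database is exposed through a SPARQL HTTP endpoint (for example via Fuseki); the endpoint URL in the usage note and the helper names run_sparql, time_call and slowdown are hypothetical. Only the two timings come from the experiment above.

```python
import time
import urllib.parse
import urllib.request

# The query from the experiment above.
QUERY = "SELECT (COUNT(*) AS ?Triples) WHERE { ?s ?p ?o }"


def run_sparql(endpoint, query):
    """POST a SPARQL query (form-encoded) and return the raw response body.

    `endpoint` is an assumption: any SPARQL 1.1 Protocol endpoint URL.
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def time_call(func, repeats=3):
    """Return the wall-clock seconds of each of `repeats` calls to func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def slowdown(slow_secs, fast_secs):
    """Ratio of the slower timing to the faster one."""
    return slow_secs / fast_secs


if __name__ == "__main__":
    # Timings reported above: rotating disk vs. SSD.
    print(round(slowdown(31.501, 5.516), 2))  # prints 5.71
```

To time a real endpoint, something like time_call(lambda: run_sparql("http://localhost:3030/wikidata/sparql", QUERY)) could be used, with the URL replaced by the actual one. Several repeats are taken so that the first run and later runs can be looked at separately.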
