Dear Apache Jena users,

My experience with the trials described at
http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData
and the unanswered question at
https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits
led me to assume that it would be possible to run the Wikidata import
for Jena on a costly 4 TB SSD, then use the resulting database on a much
cheaper rotating disk and see little difference in query performance
compared to running from the SSD.

My assumption was based on the TDB architecture description at
https://jena.apache.org/documentation/tdb/architecture.html and its
mention of B+ trees (https://en.wikipedia.org/wiki/B+_tree).
I thought the B+ tree approach was optimized to keep the number of
time-costly seeks low when fetching data during a query.
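
To make that expectation concrete, here is a rough back-of-the-envelope
sketch in Python. The key count and fan-out in the usage note are made-up
numbers for illustration, not TDB's actual parameters, and both helper
functions are hypothetical. It only contrasts the two access patterns: a
single lookup touches about one block per tree level, while a query that
visits every entry has to read every leaf block.

```python
def btree_height(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_keys."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    levels = 1
    capacity = fanout
    # Each extra level multiplies the number of reachable keys by the fan-out.
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


def leaf_blocks(num_keys: int, keys_per_leaf: int) -> int:
    """Number of leaf blocks a full scan has to read (ceiling division)."""
    if num_keys < 0 or keys_per_leaf < 1:
        raise ValueError("need num_keys >= 0 and keys_per_leaf >= 1")
    return -(-num_keys // keys_per_leaf)
```

With an assumed 1,000,000,000 keys and an assumed fan-out of 100,
btree_height gives the block reads for one lookup and leaf_blocks gives
the block reads for a full scan; the second number is several orders of
magnitude larger.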

My experiment at
http://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15#log_for_query
shows a different result for the query:

SELECT (COUNT(*) as ?Triples) WHERE { ?s ?p ?o}
 

On the rotating disk the query takes 31.501 secs; on the SSD it takes
5.516 secs. The rotating disk is only a bit slower than the SSD in raw
I/O, but it pays a seek-time penalty that the SSD does not have.
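
For reference, the ratio of the two timings can be checked with a few
lines of Python. The variable names are only illustrative; the two
numbers are the measurements quoted above.

```python
hdd_secs = 31.501  # measured on the rotating disk
ssd_secs = 5.516   # measured on the SSD

# How many times longer the query took on the rotating disk.
slowdown = hdd_secs / ssd_secs
print(f"rotating disk / SSD = {slowdown:.2f}")
```

The ratio comes out a little under 6.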

Would other queries show the same difference of roughly factor 6, or
does the speed difference depend on the query? Please suggest some
queries that I might test, and I will report the results here.
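
To compare suggested queries in a repeatable way, something like the
following Python sketch could be used. It assumes the database is served
over the standard SPARQL protocol (for example by a Fuseki server); the
endpoint URL and the function names are assumptions for illustration and
not part of my current setup. Repeated runs are timed separately because
the first run and later runs may differ once data sits in the operating
system's file cache.

```python
import time
import urllib.parse
import urllib.request

# Assumption: a local SPARQL endpoint; adjust host, port and dataset name.
ENDPOINT = "http://localhost:3030/wikidata/sparql"

COUNT_QUERY = "SELECT (COUNT(*) as ?Triples) WHERE { ?s ?p ?o}"


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query):
    """Send the query and return the raw response body."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return response.read()


def time_call(fn, repeats=3):
    """Call fn repeatedly and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def main():
    # Needs a running endpoint at ENDPOINT; not called automatically.
    for secs in time_call(lambda: run_query(ENDPOINT, COUNT_QUERY)):
        print(f"{secs:.3f} secs")
```

Calling main() once with the database on the SSD and once on the rotating
disk would give comparable numbers for each suggested query.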

Cheers

  Wolfgang


-- 
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
