Hi,
>> The disk where I was loading the data was a local rotating disk of
>> 7200 rpm. The machine has also an SSD but is too small to do the
>> experiment.
>
> tdbloader2 may be the right choice for that setup - it was written
> with disks in mind. It uses Unix sort(1). What it needs is to tune the
> parameters to the runs of "sort"

Thanks, this information is very useful.

> Wolfgang Fahl has loaded large (several billion triples)
>
> https://issues.apache.org/jira/browse/JENA-1909
>
> and his notes are at:
>
> http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData

I have also loaded Wikidata, on a very small virtual machine with a
single core and a non-local rotating disk. I remember it took more than
a week. I did not save the log, because the machine was running other
jobs at the same time. The next time I load a big dataset I will share
the machine specification and the loading log.

>> I wonder if it is better to load the data using a fast disk, a lot of
>> RAM, or a lot of cores.
>
> A few years ago, I ran load tests of two machines, one 32G+SATA SSD,
> one 16G+ 1TB M2 SSD. The 16G but faster SSD was quicker overall.

That is interesting. I am considering a machine with an NVMe SSD for
the next load.

> Database directories can be copied across machines after they have
> been built.

tdbloader2 generates some files with the .tmp extension. The file
data-triples.tmp can be very big. The name suggests that it is a
temporary file. Can I delete that file after the loading ends?

Best,

Daniel
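
P.S. On tuning the runs of "sort": below is what I plan to try on the
next load. It is only a sketch, assuming GNU sort; the paths and sizes
are made up for illustration. Note that sort(1) only honours TMPDIR
when no -T option is passed to it, and I do not know whether the
tdbloader2 script sets -T itself, so putting the database location on
the fast disk may be needed as well.

    # Keep sort's temporary runs off the slow rotating disk.
    export TMPDIR=/mnt/nvme/tmp

    # The GNU sort knobs that matter for an external sort:
    #   -S 16G                   size of the in-memory sort buffer
    #   --parallel=4             number of concurrent sort threads
    #   -T /mnt/nvme/tmp         directory for the temporary runs
    #   --compress-program=gzip  compress temporary runs to save I/O

    tdbloader2 --loc /mnt/nvme/DB dataset.nt.gz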