Hi,

>> The disk where I was loading the data was a local rotating disk of
>> 7200 rpm. The machine has also an SSD but is too small to do the
>> experiment.
>
> tdbloader2 may be the right choice for that setup - it was written
> with disks in mind. It uses Unix sort(1). What it needs is to tune the
> parameters to the runs of "sort"

Thanks, this information is very useful.
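For the next run I will try tuning those parameters. As far as I
understand, with GNU sort(1) the relevant knobs would be something
like the following (illustrative values only; I have not checked how
tdbloader2 passes them through to its sort invocations):

    # Put sort's temporary run files on the fastest available disk,
    # give it a larger in-memory buffer, and let it merge in parallel.
    export TMPDIR=/mnt/fast-disk/tmp
    sort --buffer-size=4G --parallel=4 -T "$TMPDIR" ...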

> Wolfgang Fahl has loaded large (several billion triples)
>
> https://issues.apache.org/jira/browse/JENA-1909
>
> and his notes are at:
>
>  http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData

I also have loaded Wikidata in a very small virtual machine with a
single core, and a rotating non local disk. I remember it lasted more
than a week. I do not saved the log, because the machine was running
other jobs at the same time. Next time I loaded a big dataset I will
share the machine specification and loading log.
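I will probably capture it with something along these lines (the
dataset file name and location are made up):

    tdbloader2 --loc /data/tdb wikidata-latest.nt.gz 2>&1 | tee load.log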

>> I wonder if it is better to load the data using a fast disk, a lot of
>> RAM, or a lot of cores.
>
> A few years ago, I ran load tests of two machines, one 32G+SATA SSD,
> one 16G+ 1TB M2 SSD.  The 16G but faster SSD was quicker overall.

That is interesting.  I am considering to have a machine with an NVMe
SSD disk for the next loading.

> Database directories can be copied across machines after they have
> been built.
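
Good to know.  I suppose a plain recursive copy of the database
directory would be enough, something like this (hypothetical paths,
and assuming nothing is reading or writing the database on either
machine):

    rsync -a /data/tdb2-wikidata/ user@new-machine:/data/tdb2-wikidata/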

The tdbloader2 generates some files with the tmp extension. The file
data-triples.tmp can be very big. The name suggest that it is a temporal
file. Can I delete that file after the loading ends?

Best,
Daniel

Reply via email to