Re: Understanding the output of Jena TDB Loader

Daniel Hernandez Tue, 16 Feb 2021 01:21:04 -0800

Hi,


>>> tdbloader2 may not be the right choice. It is a bit niche but if you
>>> have much less RAM than total data it can be better than tdbloader and
>>> it is better if there is rotating disk, not SSD. It has been reported
>>> to be the right choice for several billion for SSD.
>> I have an SSD, a machine with 256 GB of RAM, and 32 cores. Do you
>> recommend using tdbloader in this setting?
>
> The rate you were getting seems low even for tdbloader2 - is it all SSD
> or could /tmp be on a disk? And is the SSD local or remote (e.g. EBS)?
>
> As a general point, because the hardware matters, it is a case of try
> a few cases and see.

Sorry, I was confused. The disk I was loading the data onto was a local
7200 rpm rotating disk. The machine also has an SSD, but it is too small
for the experiment.

> Does it have to be TDB1? "tdb2.tdbloader --loader=parallel" is the
> most aggressive loader. For TDB1, I'm not sure if "tdbloader2" or
> "tdbloader" will be faster end-to-end.

I have already run some queries using TDB1, so I want to compare the
performance under similar conditions. Otherwise, I would have to run the
queries again on TDB2. So I have to evaluate which option is better.

> I'd be interested in what you found out. It's been a while since I had
> access to a large machine (which was on AWS ~240G RAM, local SSD). I
> used tdb2.tdbloader (i.e. TDB2).

I am sorry that my machine was not a good test case, since it has a
rotating disk. I have another machine with a 1 TB local SSD, but only
64 GB of RAM. I am going to test the loading speed on that machine (once
it finishes the jobs it is running). I wonder whether it is better to
load the data with a fast disk, a lot of RAM, or a lot of cores.
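
A minimal sketch of how the loaders mentioned in this thread could be
timed on each machine. It assumes the Jena command-line scripts are on
the PATH and accept "--loc" followed by the data file; the helper names,
the data file name, and the database directory prefix are hypothetical.

```python
import subprocess
import time


def loader_commands(data_file, db_prefix):
    """Build one command line per loader, each writing to its own directory.

    The directory naming scheme is an assumption made for this sketch.
    """
    return {
        "tdbloader": ["tdbloader", "--loc", db_prefix + "-tdbloader", data_file],
        "tdbloader2": ["tdbloader2", "--loc", db_prefix + "-tdbloader2", data_file],
        "tdb2.tdbloader": [
            "tdb2.tdbloader", "--loader=parallel",
            "--loc", db_prefix + "-tdb2", data_file,
        ],
    }


def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the loader exits with an error
    return time.perf_counter() - start


def triples_per_second(triples, seconds):
    """Average end-to-end load rate."""
    return triples / seconds
```

For example, calling time_command(loader_commands("data.nt",
"/data/db")["tdbloader2"]) on both machines and dividing the triple
count by the result would give end-to-end rates that can be compared.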

Best,
Daniel
