Hi,
>>> tdbloader2 may not be the right choice. It is a bit niche but if you
>>> have much less RAM than total data it can be better than tdbloader and
>>> it is better if there is rotating disk, not SSD. It has been reported
>>> to be the right choice for several billion for SSD.
>>
>> I have an SSD disk, a machine with 256 GB of RAM, and 32 cores. Do
>> you recommend using tdbloader in this setting?
>
> The rate you were getting seems low even for tdbloader2 - is it all SSD
> or could /tmp be on a disk? And is the SSD local or remote (e.g. EBS)?
>
> As a general point, because the hardware matters, it is a case of try
> a few cases and see.

Sorry, I was mistaken. The disk I was loading the data onto is a local
7200 rpm rotating disk. The machine also has an SSD, but it is too small
for the experiment.

> Does it have to be TDB1? "tdb2.tdbloader --loader=parallel" is the
> most aggressive loader. For TDB1, I'm not sure if "tdbloader2" or
> "tdbloader" will be faster end-to-end.

I have already run some queries using TDB1, so I want to compare
performance under similar conditions. Otherwise, I would have to run
those queries again on TDB2. So I have to evaluate which option is
better.

> I'd be interested in what you found out. It's been a while since I had
> access to a large machine (which was on AWS ~240G RAM, local SSD). I
> used tdb2.tdbloader (i.e. TDB2).

I am sorry that my machine turned out to be a poor test case, since it
has a rotating disk. I have another machine with a 1 TB local SSD, but
with only 64 GB of RAM. I am going to test the loading speed on that
machine once it finishes the jobs it is currently running.

I wonder which matters most for loading the data: a fast disk, a lot of
RAM, or a lot of cores.

Best,
Daniel