Re: tdb2.tdbloader performance

Dick Murray Sat, 02 Dec 2017 13:35:27 -0800

Hello.

On 2 Dec 2017 8:55 pm, "Andy Seaborne" <[email protected]> wrote:



> Short story: I used the following "reasonable" device
>
>      Dell M3800
>      Fedora 27
>      16GB SODIMM DDR3 Synchronous 1600 MHz
>      CPU cache L1/256KB,L2/1MB,L3/6MB
>      Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads
>
> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
> disk and;
>
> @800%    60K/Sec
> @100%    40K/Sec
> @50%    20K/Sec
>
> The full source file contains 2.2G of triples in 10GB bz2 which
> decompresses to 250GB nt, which I split into 10M triple chunks and used the
> first one to test.
>
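For anyone wanting to repeat the chunking step, a minimal sketch follows. It is not the script used above: the function name, output file naming, and the assumption that every line of the .nt file is one triple (no comments or blank lines) are all illustrative.

```python
import os


def split_ntriples(src_path, out_dir, triples_per_chunk=10_000_000):
    """Split an N-Triples file into chunk files of at most triples_per_chunk lines.

    Assumes one triple per line; comment or blank lines would be counted too.
    Returns the list of chunk file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    count = 0
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            # Start a new chunk file on the first line or when the current one is full.
            if out is None or count == triples_per_chunk:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"chunk-{len(chunk_paths):05d}.nt")
                chunk_paths.append(path)
                out = open(path, "w", encoding="utf-8")
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return chunk_paths
```

The first returned path would then be the 10M-triple test file; decompressing the bz2 first (or streaming it through `bz2.open`) is left out here.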

Which tdb loader?


TDB2


For TDB1, the two loaders behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
hours (76K triples/s) using TDB1 tdbloader2.

I'll write it up soon.
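As a sanity check on the figures in this thread, the arithmetic can be sketched as below. The helper names are made up for illustration, and the extrapolation assumes a constant load rate, which a real load at this scale will not hold.

```python
def triples_per_second(triples, hours):
    """Average load rate in triples per second."""
    return triples / (hours * 3600)


def hours_to_load(triples, rate_per_second):
    """Hours needed to load `triples` at a constant rate (an idealisation)."""
    return triples / rate_per_second / 3600


# 2.199 billion triples in 8 hours, as reported for TDB1 tdbloader2.
xps_rate = triples_per_second(2_199_000_000, 8)

# Extrapolating the 60K/Sec laptop figure to the full 2.2G triples.
laptop_hours = hours_to_load(2_200_000_000, 60_000)

print(f"{xps_rate:,.0f} triples/s, {laptop_hours:.1f} hours")
```

The first figure comes out at roughly 76K triples/s, matching the number quoted; the second is only a lower bound on the laptop load time if the rate degrades as the database grows.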


Loaded truthy on the server in 9 hours using RAID 5 across 10 x 10k 1TB SAS
drives. Loaded 4 copies of truthy concurrently in 9.5 hours. I think that's
the biggest concurrent source the server has handled. Fans work!



> Check with Andy, but I think it's limited by CPU, which is why my 24 core (4
> x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
> performance hit.
>

The limit at scale is the I/O handling and disk cache. 128G RAM gives a
better disk cache and that server machine probably has better I/O.  It's
big enough to fit one whole index (if all RAM is available - and that
depends on the swappiness setting which should be set to zero ideally).

CPU is a limit for a while, but you'll see the load speed slow down, so CPU
is not the only limit. (As the indexes are 200-way trees, they don't get
very deep.)
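To put a number on "not very deep", here is a rough sketch. It assumes an idealised tree where every node is completely full with 200 entries; real B+tree nodes are only partly full, so the actual depth can be a level or so more. The function name is illustrative.

```python
def min_tree_levels(num_entries, branching_factor=200):
    """Smallest number of levels whose total capacity covers num_entries,
    assuming every node holds exactly branching_factor entries."""
    levels = 1
    capacity = branching_factor
    while capacity < num_entries:
        capacity *= branching_factor
        levels += 1
    return levels


# Truthy: about 2.199 billion triples per index.
print(min_tree_levels(2_199_000_000))
```

Since 200^4 is 1.6 billion and 200^5 is 320 billion, an index over truthy needs about five levels under this assumption.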

tdbloader (loader1) does one index at a time so that the I/O is
constrained, unlike simply adding triples to all 3 indexes together (which
is what TDB2 loader does currently).

loader1 degrades at large scale due to random I/O write patterns on
secondary indexes.  Hence an SSD makes a big difference.

loader2 (which has high overhead) avoids the problems and only writes
indexes from sorted input, so there is no random access to the indexes.  An
SSD makes less difference.
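The sorted versus random write pattern can be illustrated with a toy model. This is not TDB code: it simply maps integer keys to fixed-size "pages", keeps a small LRU cache of pages, and counts how often an insert touches a page that is not cached. The page size, cache size, and key count are arbitrary assumptions.

```python
from collections import OrderedDict
import random


def count_cache_misses(keys, keys_per_page=200, cache_pages=8):
    """Count page-cache misses when inserting keys in the given order.

    Each key lives on page key // keys_per_page; the cache holds the
    cache_pages most recently used pages (LRU eviction).
    """
    cache = OrderedDict()
    misses = 0
    for key in keys:
        page = key // keys_per_page
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                    # page must be fetched from "disk"
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return misses


keys = list(range(100_000))
shuffled = keys[:]
random.Random(42).shuffle(shuffled)        # fixed seed for repeatability

sorted_misses = count_cache_misses(keys)       # like writing from sorted input
random_misses = count_cache_misses(shuffled)   # like a secondary index in loader1

print(sorted_misses, random_misses)
```

With sorted input each page is fetched exactly once; with shuffled input nearly every insert misses once the index outgrows the cache, which is the case where seek time (and hence SSD versus spinning disk) dominates.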


> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
> next few days and I will try and test against it.
>
> I haven't run the full import because a: I'm guessing the resulting TDB2
> will be "large" b: my servers are currently importing other "large"
> TDB2's!!!
>

The TDB2 database for a single graph will be the same size as TDB1 using
tdbloader (not tdbloader2).


> Long story follows...
>

<lots of interesting numbers>
