Hello.

On 2 Dec 2017 8:55 pm, "Andy Seaborne" <[email protected]> wrote:

> Short story: I used the following "reasonable" device
>
> Dell M3800
> Fedora 27
> 16GB SODIMM DDR3 Synchronous 1600 MHz
> CPU cache L1/256KB,L2/1MB,L3/6MB
> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads
>
> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
> disk and;
>
> @800% 60K/Sec
> @100% 40K/Sec
> @50% 20K/Sec
>
> The full source file contains 2.2G of triples in 10GB bz2 which
> decompresses to 250GB nt, which I split into 10M triple chunks and used the
> first one to test.
>

Which tdb loader?

TDB2

For TDB1, the two loaders behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8 hours (76K triples/s) using TDB1 tdbloader2.

I'll write it up soon.

Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS. Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest concurrent source the server has handled. Fans work!

> Check with Andy but I think it's limited by CPU, which is why my 24 core (4
> x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
> performance hit.
>

The limit at scale is the I/O handling and disk cache. 128G RAM gives a better disk cache and that server machine probably has better I/O. It's big enough to fit one whole index (if all RAM is available - and that depends on the swappiness setting, which should ideally be set to zero).

CPU is a limit for a while, but you'll see the load speed slow down, so CPU is not the only limit. (As the indexes are 200-way trees, they don't get very deep.)

tdbloader (loader1) does one index at a time so that the I/O is constrained, unlike simply adding triples to all 3 indexes together (which is what the TDB2 loader does currently).

loader1 degrades at large scale due to random I/O write patterns on secondary indexes. Hence an SSD makes a big difference.

loader2 (which has high overhead) avoids the problems and only writes indexes from sorted input, so there is no random access to the indexes.
An SSD makes less difference.

> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
> next few days and I will try and test against it.
>
> I haven't run the full import because a: I'm guessing the resulting TDB2
> will be "large" b: my servers are currently importing other "large"
> TDB2's!!!
>

The TDB2 database for a single graph will be the same size as TDB1 using tdbloader (not tdbloader2).

> Long story follows...
> <lots of interesting numbers>
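
For anyone wanting to compare the figures quoted in this thread, here is a rough back-of-envelope sketch in Python. It only redoes the arithmetic on numbers already given above (2.199 billion triples; 8, 9 and 9.5 hour loads; 200-way index trees). The function names are made up for illustration, and the tree-depth estimate assumes completely full nodes with a fanout of exactly 200, which a real index will not achieve, so treat it as a lower bound.

```python
# Back-of-envelope arithmetic on the load figures quoted in the thread.
# Function names are illustrative only; they are not part of any Jena API.

TRUTHY_TRIPLES = 2_199_000_000  # "2.199 billion triples"


def triples_per_second(triples: int, hours: float) -> float:
    """Average load rate over the whole run."""
    return triples / (hours * 3600)


def min_tree_levels(entries: int, fanout: int = 200) -> int:
    """Smallest number of levels such that fanout**levels >= entries.

    Assumes every node is completely full, so this is a lower bound.
    """
    levels = 1
    capacity = fanout
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels


# 16G laptop with SSD, TDB1 tdbloader2, 8 hours
laptop_rate = triples_per_second(TRUTHY_TRIPLES, 8)

# Server, single load, 9 hours
server_single_rate = triples_per_second(TRUTHY_TRIPLES, 9)

# Server, 4 concurrent loads, 9.5 hours (aggregate rate across all four)
server_concurrent_rate = triples_per_second(4 * TRUTHY_TRIPLES, 9.5)

print(f"laptop:            {laptop_rate:,.0f} triples/s")
print(f"server (1 load):   {server_single_rate:,.0f} triples/s")
print(f"server (4 loads):  {server_concurrent_rate:,.0f} triples/s aggregate")
print(f"index levels needed at fanout 200: {min_tree_levels(TRUTHY_TRIPLES)}")
```

The laptop figure comes out at about 76K triples/s, matching the number quoted above. The level count shows why the trees "don't get very deep": 200^4 is about 1.6 billion, just short of the triple count, so one more level covers it.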
