Re: tdb2.tdbloader performance
> Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS.
> Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest
> concurrent source the server has handled. Fans work!

Cool, this is another interesting statistic. It looks like there is quite some room for speed-ups on a single machine (much simpler to deal with than distributing the work across several nodes), if TDB2 can be parallelized more...
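Until the loader itself is parallelized, the parallelism available today is at the process level, as the concurrent-truthy result above shows. A minimal sketch of running several independent loads at once (the Jena path, database locations, and chunk file names are assumptions; each load must target its own --loc):

```shell
# Run one tdb2.tdbloader per dataset into SEPARATE database directories.
# Paths are illustrative; adjust to your install and data layout.
run_concurrent_loads() {
    loader=./apache-jena-3.5.0/bin/tdb2.tdbloader
    for n in 1 2 3 4; do
        "$loader" --loc "/data/tdb2-$n" "/data/truthy-$n.nt" > "load-$n.log" 2>&1 &
    done
    wait   # block until all background loads have finished
}
```

As noted in the thread, this only scales while the file system cache and I/O bandwidth hold out.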
Re: tdb2.tdbloader performance
@Andy

> Which tdb loader?

I'd guess tdb2.tdbloader, since he was replying to my previous email.

> I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8 hours
> (76K triples/s) using TDB1 tdbloader2.
>
> I'll write it up soon.

Could you please also share the model names of the hardware components, so we can check the various frequencies, bandwidths, and latencies?

> The limit at scale is the I/O handling and disk cache. 128G RAM gives a
> better disk cache and that server machine probably has better I/O. It's
> big enough to fit one whole index (if all RAM is available - and that
> depends on the swappiness setting which should be set to zero ideally).

Do you have any idea, then, why executing everything from a ramdisk doesn't seem to bring any significant improvement over reading/writing from a SATA3 disk (at least in my tests)?
Re: tdb2.tdbloader performance
On 02/12/17 21:34, Dick Murray wrote:
> Which tdb loader?
> TDB2

tdb2.tdbloader? It does fine, until RAM (file system cache) gets stressed ... and for 2.2B triples, it gets stressed. (TDB2 has a fast node table.)

Andy
Re: tdb2.tdbloader performance
On 02/12/17 21:59, ajs6f wrote:
> > Threads will not help a single load except for tdbloader2 (which is for
> > TDB1) if tuned - see the command help and notes. It uses sort(1) which can
> > utilize multiple threads.
>
> This was worth tuning for me. sort generally picks good parameters for a
> system, but I was able to get noticeably better performance by adjusting (up)
> the parallelism manually. But of course, that's a limited amount of
> improvement.
>
> (It's also worth making sure your locale is set appropriately. Avoid using
> Unicode collation and it will speed things up impressively.)

Shouldn't be necessary - tdbloader2index sets export LC_ALL="C" (see sort(1)):

[[ *** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values. ]]

If that didn't work, it needs a fix.

Andy

> ajs6f
>
> On Dec 2, 2017, at 4:16 PM, Andy Seaborne <a...@apache.org> wrote:
>
> On 01/12/17 22:28, Laura Morales wrote:
>> Thank you very much, this is great feedback! Your setup was very similar to mine, except:
>> - I have 8GB RAM single bank, you have 16GB probably on two banks
>> - my CPU is "half" of yours, 2 cores 4 threads
>> Despite this, the results are very similar; maybe yours are slightly better. I don't understand why this "60K" seems so hard to beat. What's so special about it?? It's so difficult to understand what to do to improve the conversion speed... do I buy more ram? Faster ram? A faster CPU? More cores? Or a CPU with more cache? Or more memory channels? I still can't find an answer... Why would more cores help if tdb2.tdbloader
>
> As already said - tdb2.tdbloader in its current form is not suitable for loading billion triple datasets (unless there is a lot of RAM ... I'd guess upward of 256G for truthy and a tuned server (swappiness=0 for example), not that I've tried).
>
>> runs in a single thread? Maybe the reason is that with more cores, your xeon can handle more RAM concurrently? I don't understand...
With your xeon, you said you were able to get to 120K? Right? "concurrent 120K" I understood that to mean more than one load running at once. Dick's system has multiple TDB databases and a large disk cache. (I got 76K, single load, on somewhat less hardware so that suggests 120K may be affected by I/O contention.) What xeon, mobo, and RAM did you use? If anybody has any xeon or opteron, it would be nice if they could offer more feedback too. Even with slower RAM such as DDR3-1333. I certainly can't wait to read your feedback with the Threadripper :) Threads will not help a single load except for tdbloader2 (which is for TDB1) if tuned - see the command help and notes. It uses sort(1) which can utilize multiple threads. Andy keep us posted! Sent: Friday, December 01, 2017 at 9:11 PM From: "Dick Murray" <dandh...@gmail.com> To: users@jena.apache.org Subject: Re: tdb2.tdbloader performance Hi. Sorry for the delay :-) Short story I used the following "reasonable" device Dell M3800 Fedora 27 16GB SODIMM DDR3 Synchronous 1600 MHz CPU cache L1/256KB,L2/1MB,L3/6MB Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM disk and; @800% 60K/Sec @100% 40K/Sec @50% 20K/Sec The full source file contains 2.2G of triples in 10GB bz2 which decompresses to 250GB nt, which I split into 10M triple chunks and used the first one to test. Check with Andy but I think it's limited by CPU, which is why my 24 core (4 x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no performance hit. I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the next few days and I will try and test against it. I haven't run the full import because a: i'm guessing the resulting TDB2 will be "large" b: my servers are currently importing other "large" TDB2's!!! Long story follows... 
decompress the file;

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward
# CPUs: 4
Maximum Memory: 1024 MB
Ignore Trailing Garbage: off
---
File #: 1 of 1
Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt
BWT Block Size: 900k
Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
---
Wall Clock: 5871.550948 seconds

count the lines;

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

split the file into 10M chunks;

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file
Re: tdb2.tdbloader performance
> Threads will not help a single load except for tdbloader2 (which is for TDB1) > if tuned - see the command help and notes. It uses sort(1) which can utilize > multiple threads. This was worth tuning for me. sort generally picks good parameters for a system, but I was able to get noticeably better performance by adjusting (up) the parallelism manually. But of course, that's a limited amount of improvement. (It's also worth making sure your locale is set appropriately. Avoid using Unicode collation and it will speed things up impressively.) ajs6f > On Dec 2, 2017, at 4:16 PM, Andy Seaborne <a...@apache.org> wrote: > > > > On 01/12/17 22:28, Laura Morales wrote: >> Thank you very much, this is great feedback! >> Your setup was very similar to mine, except: >> - I have 8GB RAM single bank, you have 16GB probably on two banks >> - my CPU is "half" of yours, 2 cores 4 threads >> despite this, the results are very similar; maybe yours are slightly >> better. I don't understand why this "60K" seems so hard to beat. What's so >> special about it?? It's so difficult to understand what to do to improve the >> conversion speed... do I buy more ram? Faster ram? A faster CPU? More cores? >> Or a CPU with more cache? Or more memory channels? I still can't find an >> answer... Why would more cores help if tdb2.tdbloader > > As already said - tdb2.tdbloader in its current form is not suitable for > loading billion triple datasets (unless there is a lot of RAM ... I'd guess > upward of 256G for truthy and a tuned server (swappy=0 for example), not that > I've tried). > >> runs in a single thread? Maybe the reason is that with more cores, your xeon >> can handle more RAM concurrently? I don't understand... >> With your xeon, you said you were able to get to 120K? Right? > > "concurrent 120K" > > I understood that to mean more than one load running at once. Dick's system > has multiple TDB databases and a large disk cache. 
> > (I got 76K, single load, on somewhat less hardware so that suggests 120K may > be affected by I/O contention.) > >> What xeon, mobo, and RAM did you use? >> If anybody has any xeon or opteron, it would be nice if they could offer >> more feedback too. Even with slower RAM such as DDR3-1333. I certainly can't >> wait to read your feedback with the Threadripper :) > > Threads will not help a single load except for tdbloader2 (which is for TDB1) > if tuned - see the command help and notes. It uses sort(1) which can utilize > multiple threads. > >Andy > >> keep us posted! >> Sent: Friday, December 01, 2017 at 9:11 PM >> From: "Dick Murray" <dandh...@gmail.com> >> To: users@jena.apache.org >> Subject: Re: tdb2.tdbloader performance >> Hi. >> Sorry for the delay :-) >> Short story I used the following "reasonable" device >> Dell M3800 >> Fedora 27 >> 16GB SODIMM DDR3 Synchronous 1600 MHz >> CPU cache L1/256KB,L2/1MB,L3/6MB >> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads >> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM >> disk and; >> @800% 60K/Sec >> @100% 40K/Sec >> @50% 20K/Sec >> The full source file contains 2.2G of triples in 10GB bz2 which >> decompresses to 250GB nt, which I split into 10M triple chunks and used the >> first one to test. >> Check with Andy but I think it's limited by CPU, which is why my 24 core (4 >> x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no >> performance hit. >> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the >> next few days and I will try and test against it. >> I haven't run the full import because a: i'm guessing the resulting TDB2 >> will be "large" b: my servers are currently importing other "large" >> TDB2's!!! >> Long story follows... 
>> decompress the file;
>> pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
>> Parallel BZIP2 v1.1.12 [Dec 21, 2014]
>> By: Jeff Gilchrist [http://compression.ca]
>> Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
>> Uses libbzip2 by Julian Seward
>> # CPUs: 4
>> Maximum Memory: 1024 MB
>> Ignore Trailing Garbage: off
>> ---
>> File #: 1 of 1
>> Input Name: latest-truthy.nt.bz2
>> Output Name: latest-truthy.nt
>> BWT Block Size: 900k
>> Input Size: 9965955258 bytes
>> Decompressing data...
>> Output Size: 277563574685 bytes
>> --
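The locale effect discussed above can be seen directly with GNU sort: LC_ALL=C makes it compare raw byte values instead of doing Unicode collation. The --parallel and -S (buffer size) values here are illustrative; they are the same knobs worth tuning for tdbloader2's sort phase:

```shell
# Byte-value ordering (LC_ALL=C): uppercase letters sort before lowercase,
# because comparison is on raw bytes, not locale collation rules.
# --parallel and -S are GNU sort options; the values are examples only.
printf 'b\nA\na\nB\n' | LC_ALL=C sort --parallel=2 -S 64M
# prints: A  B  a  b  (one per line)
```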
Re: tdb2.tdbloader performance
Hello.

On 2 Dec 2017 8:55 pm, "Andy Seaborne" wrote:

> > Short story I used the following "reasonable" device
> >
> > Dell M3800
> > Fedora 27
> > 16GB SODIMM DDR3 Synchronous 1600 MHz
> > CPU cache L1/256KB, L2/1MB, L3/6MB
> > Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads
> >
> > to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
> > disk and;
> >
> > @800% 60K/Sec
> > @100% 40K/Sec
> > @50% 20K/Sec
> >
> > The full source file contains 2.2G of triples in 10GB bz2 which
> > decompresses to 250GB nt, which I split into 10M triple chunks and used the
> > first one to test.
>
> Which tdb loader?

TDB2

> For TDB1, the two loaders behave very differently.
>
> I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8 hours
> (76K triples/s) using TDB1 tdbloader2.
>
> I'll write it up soon.

Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS. Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest concurrent source the server has handled. Fans work!

> > Check with Andy but I think it's limited by CPU, which is why my 24 core
> > (4 x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads
> > with no performance hit.
>
> The limit at scale is the I/O handling and disk cache. 128G RAM gives a
> better disk cache and that server machine probably has better I/O. It's big
> enough to fit one whole index (if all RAM is available - and that depends on
> the swappiness setting which should be set to zero ideally).
>
> CPU is a limit for a while but you'll see the load speed slows down so it is
> not purely CPU as the limit. (As the indexes are 200-way trees, they don't
> get very deep.)
>
> tdbloader (loader1) does one index at a time so that the I/O is constrained,
> unlike simply adding triples to all 3 indexes together (which is what the
> TDB2 loader does currently). loader1 degrades at large scale due to random
> I/O write patterns on secondary indexes. Hence an SSD makes a big difference.
loader2 (which has high overhead) avoids the problems and only writes indexes from sorted input, so there is no random access to the indexes. An SSD makes less difference.

> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
> next few days and I will try and test against it.
>
> I haven't run the full import because a: I'm guessing the resulting TDB2
> will be "large" b: my servers are currently importing other "large"
> TDB2's!!!

The TDB2 database for a single graph will be the same size as TDB1 using tdbloader (not tdbloader2).

> Long story follows...
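The swappiness tuning mentioned above refers to the Linux vm.swappiness sysctl. A minimal way to apply the "set it to zero" suggestion (root required; this biases the kernel toward keeping the page cache, and with it the memory-mapped TDB files, in RAM rather than swapping):

```shell
# Apply immediately (lost on reboot)
sudo sysctl vm.swappiness=0

# Persist across reboots
echo 'vm.swappiness=0' | sudo tee -a /etc/sysctl.conf
```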
Re: tdb2.tdbloader performance
On 01/12/17 22:28, Laura Morales wrote:
> Thank you very much, this is great feedback! Your setup was very similar to mine, except:
> - I have 8GB RAM single bank, you have 16GB probably on two banks
> - my CPU is "half" of yours, 2 cores 4 threads
> Despite this, the results are very similar; maybe yours are slightly better. I don't understand why this "60K" seems so hard to beat. What's so special about it?? It's so difficult to understand what to do to improve the conversion speed... do I buy more ram? Faster ram? A faster CPU? More cores? Or a CPU with more cache? Or more memory channels? I still can't find an answer... Why would more cores help if tdb2.tdbloader

As already said - tdb2.tdbloader in its current form is not suitable for loading billion triple datasets (unless there is a lot of RAM ... I'd guess upward of 256G for truthy and a tuned server (swappiness=0 for example), not that I've tried).

> runs in a single thread? Maybe the reason is that with more cores, your xeon can handle more RAM concurrently? I don't understand...
> With your xeon, you said you were able to get to 120K? Right?

"concurrent 120K"

I understood that to mean more than one load running at once. Dick's system has multiple TDB databases and a large disk cache.

(I got 76K, single load, on somewhat less hardware, so that suggests 120K may be affected by I/O contention.)

> What xeon, mobo, and RAM did you use? If anybody has any xeon or opteron, it would be nice if they could offer more feedback too. Even with slower RAM such as DDR3-1333. I certainly can't wait to read your feedback with the Threadripper :)

Threads will not help a single load except for tdbloader2 (which is for TDB1) if tuned - see the command help and notes. It uses sort(1) which can utilize multiple threads.

Andy

> keep us posted!
> Sent: Friday, December 01, 2017 at 9:11 PM
> From: "Dick Murray" <dandh...@gmail.com>
> To: users@jena.apache.org
> Subject: Re: tdb2.tdbloader performance
> Hi.
Sorry for the delay :-)

Short story I used the following "reasonable" device

Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB, L2/1MB, L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads

to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM disk and;

@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec

The full source file contains 2.2G of triples in 10GB bz2 which decompresses to 250GB nt, which I split into 10M triple chunks and used the first one to test.

Check with Andy but I think it's limited by CPU, which is why my 24 core (4 x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no performance hit.

I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the next few days and I will try and test against it.

I haven't run the full import because a: I'm guessing the resulting TDB2 will be "large" b: my servers are currently importing other "large" TDB2's!!!

Long story follows...

decompress the file;

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward
# CPUs: 4
Maximum Memory: 1024 MB
Ignore Trailing Garbage: off
---
File #: 1 of 1
Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt
BWT Block Size: 900k
Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
---
Wall Clock: 5871.550948 seconds

count the lines;

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

split the file into 10M chunks;

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...

Restart!
sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt

ps aux | grep tdb2
root 3358 0.0 0.0 222844 5756 pts/0 S+ 19:22 0:00 sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3359 0.0 0.0 4500 776 pts/0 S+ 19:22 0:00 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3360 0.0 0.0 120304 3288 pts/0 S+ 19:22 0:00 sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3361 4.9 0.0 4500 92 pts/0 S<+ 19:22 0:05 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3366 95.7 14.8 7866116 2418768 pts/0 Sl+ 19:22 1:42 java -Dlog4j.configuration=file:/run/
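As a sanity check on the split step above: 2,199,382,887 lines at 10,485,760 lines per chunk should produce 210 chunk files (latest-truthy.nt.000 through .209):

```shell
# ceil(total lines / lines per chunk) = number of files split -l creates
awk 'BEGIN { total = 2199382887; per = 10485760
             print int((total + per - 1) / per) }'
# prints: 210
```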
Re: tdb2.tdbloader performance
> Short story I used the following "reasonable" device
>
> Dell M3800
> Fedora 27
> 16GB SODIMM DDR3 Synchronous 1600 MHz
> CPU cache L1/256KB, L2/1MB, L3/6MB
> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads
>
> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
> disk and;
>
> @800% 60K/Sec
> @100% 40K/Sec
> @50% 20K/Sec
>
> The full source file contains 2.2G of triples in 10GB bz2 which decompresses
> to 250GB nt, which I split into 10M triple chunks and used the first one to
> test.

Which tdb loader? For TDB1, the two loaders behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8 hours (76K triples/s) using TDB1 tdbloader2. I'll write it up soon.

> Check with Andy but I think it's limited by CPU, which is why my 24 core
> (4 x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with
> no performance hit.

The limit at scale is the I/O handling and disk cache. 128G RAM gives a better disk cache and that server machine probably has better I/O. It's big enough to fit one whole index (if all RAM is available - and that depends on the swappiness setting, which should be set to zero ideally).

CPU is a limit for a while but you'll see the load speed slow down, so it is not purely CPU as the limit. (As the indexes are 200-way trees, they don't get very deep.)

tdbloader (loader1) does one index at a time so that the I/O is constrained, unlike simply adding triples to all 3 indexes together (which is what the TDB2 loader does currently). loader1 degrades at large scale due to random I/O write patterns on secondary indexes. Hence an SSD makes a big difference.

loader2 (which has high overhead) avoids the problems and only writes indexes from sorted input, so there is no random access to the indexes. An SSD makes less difference.

> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
> next few days and I will try and test against it.
> I haven't run the full import because a: I'm guessing the resulting TDB2
> will be "large" b: my servers are currently importing other "large"
> TDB2's!!!

The TDB2 database for a single graph will be the same size as TDB1 using tdbloader (not tdbloader2).

> Long story follows...
Re: tdb2.tdbloader performance
Hi.

Sorry for the delay :-)

Short story I used the following "reasonable" device

Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB, L2/1MB, L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads

to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM disk and;

@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec

The full source file contains 2.2G of triples in 10GB bz2 which decompresses to 250GB nt, which I split into 10M triple chunks and used the first one to test.

Check with Andy but I think it's limited by CPU, which is why my 24 core (4 x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no performance hit.

I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the next few days and I will try and test against it.

I haven't run the full import because a: I'm guessing the resulting TDB2 will be "large" b: my servers are currently importing other "large" TDB2's!!!

Long story follows...

decompress the file;

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward
# CPUs: 4
Maximum Memory: 1024 MB
Ignore Trailing Garbage: off
---
File #: 1 of 1
Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt
BWT Block Size: 900k
Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
---
Wall Clock: 5871.550948 seconds

count the lines;

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

split the file into 10M chunks;

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...

Restart!
sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt

ps aux | grep tdb2
root 3358 0.0 0.0 222844 5756 pts/0 S+ 19:22 0:00 sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3359 0.0 0.0 4500 776 pts/0 S+ 19:22 0:00 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3360 0.0 0.0 120304 3288 pts/0 S+ 19:22 0:00 sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3361 4.9 0.0 4500 92 pts/0 S<+ 19:22 0:05 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 3366 95.7 14.8 7866116 2418768 pts/0 Sl+ 19:22 1:42 java -Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties -cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
dick 3477 0.0 0.0 119728 972 pts/1 S+ 19:24 0:00 grep --color=auto tdb2

Notice PID 3366 is -Xmx2G default.

19:26:49 INFO TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 247.28s (Avg: 42,404)

After the first pass there is no read from the 1TB source as the OS has cached the 1.2G source.

19:33:50 INFO TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 245.70s (Avg: 42,677)

export JVM_ARGS="-Xmx4G" i.e.
increase the max heap and help the GC

sudo ps aux | grep tdb2
root 4317 0.0 0.0 222848 6236 pts/0 S+ 19:35 0:00 sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 4321 0.0 0.0 4500 924 pts/0 S+ 19:35 0:00 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 4322 0.0 0.0 120304 3356 pts/0 S+ 19:35 0:00 sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 4323 4.8 0.0 4500 88 pts/0 S<+ 19:35 0:09 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root 4328 94.8 18.5 8406788 3036188 pts/0 Sl+ 19:35 3:01 java -Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties -cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
dick 4594 0.0 0.0 119728 1024 pts/1 S+ 19:38 0:00 grep --color=auto tdb2

At 800K PID was 3GB and peaked at 3.4GB just prior to completion.

19:39:23 INFO TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 247.65s (Avg: 42,340)

Throw all CPU resources at it i.e. 800 sudo
Re: tdb2.tdbloader performance
> I've had loads take over 24 hours and produce 350GB TDB1 instances...

Yeah, 24h is still acceptable, but it's very borderline. Running a conversion that takes days becomes frustrating very quickly. Of course I'm not trying to be mean here, but I think it's good to push the limits, because we are already at a point where graphs have several billion triples. If my computer, which is an average consumer PC at best, can do 60-70K, two "average grade" nodes could already outperform your beefy server if only I could share the load across multiple PCs.

> Ok with the data, I have that somewhere and will run it through, hopefully
> tonight if paid work doesn't get in the way ;-)

Thank you very much for trying this and for offering feedback. I'd be interested to know:
- what components you have (cpu/ram/disks/...)
- the AVG number of triples/second
- the final size of the TDB2 store

Also, since you're already running this test, would you mind sharing the final TDB2 store instead of deleting it? :) If the output is not too large...
Re: tdb2.tdbloader performance
I've had loads take over 24 hours and produce 350GB TDB1 instances... You can run multiple loaders into separate instances, and on sufficient kit they don't slow down. As background, I convert CAD files to triples or quads, typically 100M but some can be 500M. That's triples output, not file input size.

Ok with the data, I have that somewhere and will run it through, hopefully tonight if paid work doesn't get in the way ;-)

Dick

Original message
From: Laura Morales <laure...@mail.com>
Date: 28/11/2017 18:34 (GMT+00:00)
To: users@jena.apache.org
Cc: users@jena.apache.org
Subject: Re: tdb2.tdbloader performance

> I've achieved concurrent 120K on the server hardware but it depends on the input.

Good to see that it can go faster. I do understand that this metric is dependent on the input, but it still looks rather slow considering that datasets keep growing. At this (constant) rate, Wikidata would still take at least 12-13 hours.

> What the server hardware does do is allow me to run multiple processes and
> average 60K.

tdb2.tdbloader is single threaded though; I don't know how multiple cores are going to help.

> We tend towards running multiple TDB's and present them as one, a legacy of overcoming the one writer in TDB1.

One graph per TDB store?

> On the minefield subject of hardware, do you have DDR3 or DDR4?

DDR3 1600MHz

> What chipset is driving it because Haswell's dual-channel memory controller
> is going to have a hard time keeping up with the quad-channel memory
> controllers on Ivy Bridge-E and Haswell-E

Haswell, dual-channel I think.

> What files are you trying to import and i'll run them through?

The 1.1GB that I mentioned contains data that I can't make public on the Internet, but you can try with the Wikidata dump https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz You probably don't have to convert all of it. Just starting the conversion you should already see how many triples it's handling.
I ran this command: `./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`. If it goes any faster than 70K AVG triples/second, I'd be interested to know what hardware components you've got.
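The rate/time relationship being discussed is easy to sanity-check with the thread's own numbers: the truthy dump's 2,199,382,887 triples at a constant 70K triples/s works out to:

```shell
# hours = triples / (triples per second) / 3600, assuming a constant rate
awk 'BEGIN { printf "%.1f\n", 2199382887 / 70000 / 3600 }'
# prints: 8.7
```

That lines up with the 8-hour load reported at 76K triples/s elsewhere in the thread; the 12-13 hour estimate for full Wikidata reflects its correspondingly larger dump.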
Re: tdb2.tdbloader performance
> I've achieved concurrent 120K on the server hardware but it depends on the input.

Good to see that it can go faster. I do understand that this metric is dependent on the input, but it still looks rather slow considering that datasets keep growing. At this (constant) rate, Wikidata would still take at least 12-13 hours.

> What the server hardware does do is allow me to run multiple processes and
> average 60K.

tdb2.tdbloader is single threaded though; I don't know how multiple cores are going to help.

> We tend towards running multiple TDB's and present them as one, a legacy of overcoming the one writer in TDB1.

One graph per TDB store?

> On the minefield subject of hardware, do you have DDR3 or DDR4?

DDR3 1600MHz

> What chipset is driving it because Haswell's dual-channel memory controller
> is going to have a hard time keeping up with the quad-channel memory
> controllers on Ivy Bridge-E and Haswell-E

Haswell, dual-channel I think.

> What files are you trying to import and i'll run them through?

The 1.1GB that I mentioned contains data that I can't make public on the Internet, but you can try with the Wikidata dump https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz You probably don't have to convert all of it. Just starting the conversion you should already see how many triples it's handling. I ran this command: `./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`. If it goes any faster than 70K AVG triples/second, I'd be interested to know what hardware components you've got.
Re: tdb2.tdbloader performance
LOL, there's lots of things where I'd like to "move the problem elsewhere".

I've achieved concurrent 120K on the server hardware but it depends on the input. There's another recent Jena thread regarding sizing, and that's tied up with what's in the input. I see the same thing with loading data: some files fly, others seem to drag, and it's not just the size.

What the server hardware does do is allow me to run multiple processes and average 60K. Also, up to a certain size I have an overclocked AMD (4.5GHz) and it will outperform everything until it hits its cache limit.

We tend towards running multiple TDB's and present them as one, a legacy of overcoming the one writer in TDB1. This brings its own issues, such as distinct being high cost, which we mitigate with a few tricks.

On the minefield subject of hardware, do you have DDR3 or DDR4? What chipset is driving it, because Haswell's dual-channel memory controller is going to have a hard time keeping up with the quad-channel memory controllers on Ivy Bridge-E and Haswell-E. And yes, Corsair quote 47GB/s for DDR4, but you still need to write that somewhere, and an M.2 on PCI-E 2.0 x4 at 1.6GB/s is almost 3x the throughput of SATA III at 600MB/s; PCI-E 3.0 x4 is 3.9GB/s, plus you now have Optane or 3D XPoint, depending on what sounds better.

What files are you trying to import and I'll run them through?

Regards Dick

On 28 November 2017 at 15:30, Laura Morales wrote:

> > Eventually something will give and you'll get a wait as something is
> > spilled to something, ie cache to physical drive.
> >
> > Also different settings suit different workloads. I have a number of
> > +128GB units configured differently depending on what they need to do. The
> > ETL setting only gives Java 8GB but the OS will consume close to 90GB
> > virtual for the process as it basically dumps into file cache. At some
> > point though that cache is written out to non-volatile storage.
> > As the units have 24 cores I can actually run close to 12 processes before
> > things start to affect each other. If you consider server class hardware
> > there's a lot of thought to cache levels and how they cascade.
> >
> > Switch the SATA for M.2 and you'll move the issue somewhere else...
>
> Well yeah, but having a problem at 10K triples/second is not the same
> problem as 1M triples/second. I'll gladly "move the problem elsewhere" if
> I knew how to get to 1M triples/second.
>
> Moving from SATA to M.2, I don't know if it's worth the trouble (and money)
> given that on my computer running from SATA3 disks or RAMdisk doesn't seem
> to make any difference. And RAM is much faster than M.2 too.
>
> Just out of curiosity, how many "AVG triples/second" can you get with
> your server-class hardware when converting a .nt to TDB2 using
> tdb2.tdbloader?
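The "almost 3x" claim in the hardware aside checks out from the quoted figures (M.2 over PCI-E 2.0 x4 at 1.6GB/s versus SATA III at 600MB/s):

```shell
# Sequential-throughput ratio: PCI-E 2.0 x4 M.2 (1600 MB/s) vs SATA III (600 MB/s)
awk 'BEGIN { printf "%.1f\n", 1600 / 600 }'
# prints: 2.7
```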
Re: tdb2.tdbloader performance
> Eventually something will give and you'll get a wait as something is
> spilled to something, i.e. cache to physical drive.
> Also different settings suit different workloads. I have a number of +128GB
> units configured differently depending on what they need to do. The ETL
> setting only gives Java 8GB but the OS will consume close to 90GB virtual for
> the process as it basically dumps into file cache. At some point though that
> cache is written out to non-volatile storage. As the units have 24 cores I
> can actually run close to 12 processes before things start to affect each
> other. If you consider server class hardware there's a lot of thought to
> cache levels and how they cascade.
> Switch the SATA for M.2 and you'll move the issue somewhere else...

Well yeah, but having a problem at 10K triples/second is not the same problem as 1M triples/second. I'll gladly "move the problem elsewhere" if I knew how to get to 1M triples/second.

Moving from SATA to M.2, I don't know if it's worth the trouble (and money) given that on my computer running from SATA3 disks or a RAMdisk doesn't seem to make any difference. And RAM is much faster than M.2 too.

Just out of curiosity, how many "AVG triples/second" can you get with your server-class hardware when converting a .nt to TDB2 using tdb2.tdbloader?
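For scale: taking the truthy figure of 2.2 billion triples quoted earlier in the thread, a sustained rate translates into wall-clock time like this (simple arithmetic, no Jena specifics assumed):

```shell
# Hours to load 2.2e9 triples at a given sustained triples/second rate.
for rate in 10000 60000 120000 1000000; do
  awk -v r="$rate" 'BEGIN { printf "%8d triples/s -> %6.1f hours\n", r, 2.2e9 / (r * 3600) }'
done
```

So the jump from 60K to 1M triples/second is the difference between an overnight load and a coffee break, which is why the distinction matters.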
Re: tdb2.tdbloader performance
Eventually something will give and you'll get a wait as something is spilled to something, i.e. cache to physical drive.

Also different settings suit different workloads. I have a number of +128GB units configured differently depending on what they need to do. The ETL setting only gives Java 8GB but the OS will consume close to 90GB virtual for the process as it basically dumps into file cache. At some point though that cache is written out to non-volatile storage. As the units have 24 cores I can actually run close to 12 processes before things start to affect each other. If you consider server class hardware there's a lot of thought to cache levels and how they cascade.

Switch the SATA for M.2 and you'll move the issue somewhere else...

Dick

Original message
From: Laura Morales <laure...@mail.com>
Date: 28/11/2017 14:06 (GMT+00:00)
To: jena-users-ml <users@jena.apache.org>
Subject: tdb2.tdbloader performance

So I had a laptop at hand with a 3GHz i7 CPU, 8GB DDR3 1600MHz RAM, and a SATA3 HDD available. I decided to try the conversion again on a 1.1GB .nt file. I used `./tdb2.tdbloader --loc xxx --verbose file.nt`. Reading the .nt file from the HDD, and writing to the HDD, gave me about 60K triples/second on average.

I don't have an SSD, but this PC seems to have enough RAM, so I started a livecd to be sure that I was running everything from RAM, with all disks unmounted. I ran the same command, and the average number of triples/second is pretty much the same, perhaps only slightly better, with 2K or 3K more per second. Conversion from the livecd seemed to use one full thread at 100%, 25% RAM, 0 swap.

This is... very surprising, I wasn't expecting this. I was expecting a significant improvement since I was running everything from RAM. What I take from this is that SATA3 disks are OK? That an SSD won't really make any difference? Are faster RAM, a faster CPU, or maybe more RAM/CPU cache the only ways to get more performance out of tdb2.tdbloader (since more RAM capacity doesn't seem to make any difference)? Or does tdb2.tdbloader (or maybe Java) have any mechanism in place that slows down conversion? Like, for example, using less RAM than is available, or whatever?
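On the last question: the Jena command-line tools are wrapper scripts, and in the releases I've seen the heap can be raised through the JVM_ARGS environment variable (check your own copy of the script; the variable name and default are assumptions that may differ between versions). Note that TDB2 leans heavily on memory-mapped files, so RAM left free for the OS file-system cache usually matters more than a large Java heap:

```shell
# Hypothetical invocation: give the loader a 4G heap instead of the
# script's default, leaving the rest of RAM to the file-system cache.
JVM_ARGS="-Xmx4G" ./tdb2.tdbloader --loc xxx --verbose file.nt
```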
tdb2.tdbloader performance
So I had a laptop at hand with a 3GHz i7 CPU, 8GB DDR3 1600MHz RAM, and a SATA3 HDD available. I decided to try the conversion again on a 1.1GB .nt file. I used `./tdb2.tdbloader --loc xxx --verbose file.nt`. Reading the .nt file from the HDD, and writing to the HDD, gave me about 60K triples/second on average.

I don't have an SSD, but this PC seems to have enough RAM, so I started a livecd to be sure that I was running everything from RAM, with all disks unmounted. I ran the same command, and the average number of triples/second is pretty much the same, perhaps only slightly better, with 2K or 3K more per second. Conversion from the livecd seemed to use one full thread at 100%, 25% RAM, 0 swap.

This is... very surprising, I wasn't expecting this. I was expecting a significant improvement since I was running everything from RAM. What I take from this is that SATA3 disks are OK? That an SSD won't really make any difference? Are faster RAM, a faster CPU, or maybe more RAM/CPU cache the only ways to get more performance out of tdb2.tdbloader (since more RAM capacity doesn't seem to make any difference)? Or does tdb2.tdbloader (or maybe Java) have any mechanism in place that slows down conversion? Like, for example, using less RAM than is available, or whatever?
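One way to answer the SSD question empirically is to watch whether the loader is CPU-bound or I/O-bound while it runs; a single thread pegged at 100% with the disk mostly idle means CPU-bound, in which case faster storage won't help. A sketch, assuming the sysstat tools (iostat, pidstat) are installed and the loader process is already running:

```shell
# Disk side: if the %util column stays well below 100, the disk is not
# the bottleneck (-x extended stats, -m in MB, sample every 5s, 3 samples).
iostat -xm 5 3

# CPU side: if the loader sits at ~100% of one core, it is CPU-bound.
pidstat -u -p "$(pgrep -f tdbloader)" 5 3
```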