Thank you, Rob - I had confused the terminology. Indeed, each batch covers 0.5M quads. What is the unit of measure of the batch loading speed - quads per second?

Looking at the output of iotop, the three main threads - which account for the lion's share of the activity - show pretty steady reads of about 900-950 M/s, with little variation beyond that. Writes vary a little more, at 2-6 M/s.

What is puzzling to me is that in a configuration with twice the CPU and twice the RAM, the load average remained the same in absolute terms. RAM usage was steady at about 8GB in the 32GB configuration; now it has been steady at 4.8GB in the 16GB configuration. In both cases RAM is severely underutilized.

I'm not conversant in Java - I've been trying jstat and jmap, and the latter hasn't been working. How best to verify the heap size? What else can affect things?
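(A sketch of what I plan to try next, based on my reading of the JDK docs - I'm guessing here, so corrections welcome: on JDK 9+ the old "jmap -heap" subcommand was removed, which would explain jmap failing for me, and jcmd/jhsdb are apparently the replacements. 12155 is the loader PID from the output below.)

$ jcmd 12155 VM.flags              # look for -XX:MaxHeapSize=... (bytes)
$ jcmd 12155 GC.heap_info          # current heap capacity and occupancy
$ jhsdb jmap --heap --pid 12155    # JDK 9+ replacement for "jmap -heap"
$ java -XX:+PrintFlagsFinal -version | grep -i maxheapsize   # default max heap on this machine

If I read the jstat -gc output below correctly, S1C + EC + OC comes to roughly 4.0GB, which would be consistent with the JVM picking its default max heap of 1/4 of physical RAM on the 16GB instance (and would similarly explain the steady 8GB I saw on the 32GB machine) - assuming the Jena launch script doesn't set -Xmx itself.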
On 2020/06/22 09:01:36, Rob Vesse <rve...@dotnetrdf.org> wrote:
> Isroel
>
> I think there might be a fundamental misunderstanding happening about batch sizes here. The batch size is fixed for a run and never changes; the "batch size" you refer to is a speed calculation, e.g.:
>
> 19:03:24 INFO  loader :: Add: 248,000,000 github_1_fixed.nq (Batch: 3,562 / Avg: 38,030)
> 19:05:19 INFO  loader :: Add: 248,500,000 github_1_fixed.nq (Batch: 4,322 / Avg: 37,443)
> 19:07:36 INFO  loader :: Add: 249,000,000 github_1_fixed.nq (Batch: 3,649 / Avg: 36,759)
> 19:09:19 INFO  loader :: Add: 249,500,000 github_1_fixed.nq (Batch: 4,868 / Avg: 36,283)
> 19:11:41 INFO  loader :: Add: 250,000,000 github_1_fixed.nq (Batch: 3,532 / Avg: 35,622)
> 19:11:41 INFO  loader :: Elapsed: 7,017.95 seconds [2020/06/21 19:11:41 UTC]
> 19:13:58 INFO  loader :: Add: 250,500,000 github_1_fixed.nq (Batch: 3,643 / Avg: 35,009)
>
> Note that each batch is 500,000 quads, as the number after "Add:" increases by 500,000 each time. The "Batch" in brackets refers to the calculated loading speed for the current batch, while "Avg" is the calculated loading speed over the entire load.
>
> So yes, the speed is decreasing over time. This is a commonly reported issue on these lists, but there is no simple fix for it because it depends both on hardware and data. The output you have provided suggests that you are stuck on IO. Andy is the primary developer on TDB2, so he may be able to shed more light on what might be going on.
>
> Rob
>
> On 22/06/2020, 06:32, "Isroel Kogan" <isroelko...@gmail.com> wrote:
>
> Hi Andy - thanks for your comments.
>
> Instead of responding point by point, it's best if I present a clearer picture, as I also have a better understanding of the factors so far.
>
> GCP instance stats:
>
> $ lscpu
>
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> Address sizes:       46 bits physical, 48 bits virtual
> CPU(s):              10
> On-line CPU(s) list: 0-9
> Thread(s) per core:  2
> Core(s) per socket:  5
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               63
> Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
> Stepping:            0
> CPU MHz:             2300.000
> BogoMIPS:            4600.00
> Hypervisor vendor:   KVM
> Virtualization type: full
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            46080K
> NUMA node0 CPU(s):   0-9
>
> 16GB RAM
> (I configured less RAM because on a prior iteration - out of 32GB - only 8 was being used.)
>
> 3TB local SSD
>
> According to Google Cloud, max performance for this size is as follows:
>
> Storage space   Partitions   IOPS (Read / Write)   Throughput MB/s (Read / Write)
> 3 TB            8            680,000 / 360,000     2,650 / 1,400
>
> https://cloud.google.com/compute/docs/disks
>
> I'm not getting that, but performance is an order of magnitude or more better than on my machine, as you'll see below.
>
> My data is as follows: 15 files of varying size (based on the original bz2 files provided by the authors). Files range from 2GB (a few million triples) to 129GB; 4 files have close to half a billion triples each. The files are .nq - as you noted, quads (unzipped, as already mentioned).
>
> Currently I have a process running, initiated by the tdb2.tdbloader command in parallel mode, with the DB empty initially.
>
> I will include some information here that may help you get a sense of the performance issues I alluded to.
>
> Beginning of log output:
>
> 17:14:42 INFO  loader :: Loader = LoaderParallel
> 17:14:42 INFO  loader :: Start: 15 files
> 17:14:49 INFO  loader :: Add: 500,000 ai_stackexchange_triples.nq (Batch: 77,339 / Avg: 77,339)
> 17:14:55 INFO  loader :: Add: 1,000,000 ai_stackexchange_triples.nq (Batch: 89,285 / Avg: 82,884)
> 17:15:00 INFO  loader :: Add: 1,500,000 ai_stackexchange_triples.nq (Batch: 90,090 / Avg: 85,154)
> 17:15:06 INFO  loader :: Add: 2,000,000 ai_stackexchange_triples.nq (Batch: 87,382 / Avg: 85,700)
> 17:15:12 INFO  loader :: Add: 2,500,000 ai_stackexchange_triples.nq (Batch: 82,304 / Avg: 84,999)
> 17:15:18 INFO  loader :: Add: 3,000,000 ai_stackexchange_triples.nq (Batch: 87,032 / Avg: 85,331)
> 17:15:22 INFO  loader :: End file: ai_stackexchange_triples.nq (triples/quads = 3,402,753)
> 17:15:23 INFO  loader :: Add: 3,500,000 datascience_stackexchange_triples.nq (Batch: 89,493 / Avg: 85,902)
> 17:15:29 INFO  loader :: Add: 4,000,000 datascience_stackexchange_triples.nq (Batch: 86,475 / Avg: 85,973)
> 17:15:35 INFO  loader :: Add: 4,500,000 datascience_stackexchange_triples.nq (Batch: 88,136 / Avg: 86,208)
> 17:15:40 INFO  loader :: Add: 5,000,000 datascience_stackexchange_triples.nq (Batch: 93,562 / Avg: 86,891)
>
> The batch size increases for the next file - but the third file is one of those with nearly half a billion triples.
>
> The batch size keeps steadily decreasing.
>
> Here is the output at the 250m triple mark:
>
> 19:03:24 INFO  loader :: Add: 248,000,000 github_1_fixed.nq (Batch: 3,562 / Avg: 38,030)
> 19:05:19 INFO  loader :: Add: 248,500,000 github_1_fixed.nq (Batch: 4,322 / Avg: 37,443)
> 19:07:36 INFO  loader :: Add: 249,000,000 github_1_fixed.nq (Batch: 3,649 / Avg: 36,759)
> 19:09:19 INFO  loader :: Add: 249,500,000 github_1_fixed.nq (Batch: 4,868 / Avg: 36,283)
> 19:11:41 INFO  loader :: Add: 250,000,000 github_1_fixed.nq (Batch: 3,532 / Avg: 35,622)
> 19:11:41 INFO  loader :: Elapsed: 7,017.95 seconds [2020/06/21 19:11:41 UTC]
> 19:13:58 INFO  loader :: Add: 250,500,000 github_1_fixed.nq (Batch: 3,643 / Avg: 35,009)
> 19:16:04 INFO  loader :: Add: 251,000,000 github_1_fixed.nq (Batch: 3,976 / Avg: 34,473)
> 19:18:41 INFO  loader :: Add: 251,500,000 github_1_fixed.nq (Batch: 3,171 / Avg: 33,810)
> 19:21:03 INFO  loader :: Add: 252,000,000 github_1_fixed.nq (Batch: 3,512 / Avg: 33,241)
>
> The batch size has decreased (I don't understand exactly what the batch size refers to), and conversely the time for half a million triples to be processed has gone from 5s to 2.5min.
>
> This is the tail of the log currently:
>
> 03:01:16 INFO  loader :: Add: 305,500,000 github_1_fixed.nq (Batch: 1,468 / Avg: 8,680)
> 03:06:47 INFO  loader :: Add: 306,000,000 github_1_fixed.nq (Batch: 1,512 / Avg: 8,613)
> 03:11:10 INFO  loader :: Add: 306,500,000 github_1_fixed.nq (Batch: 1,897 / Avg: 8,564)
> 03:16:00 INFO  loader :: Add: 307,000,000 github_1_fixed.nq (Batch: 1,723 / Avg: 8,509)
> 03:21:36 INFO  loader :: Add: 307,500,000 github_1_fixed.nq (Batch: 1,489 / Avg: 8,444)
> 03:26:07 INFO  loader :: Add: 308,000,000 github_1_fixed.nq (Batch: 1,842 / Avg: 8,395)
> 03:31:43 INFO  loader :: Add: 308,500,000 github_1_fixed.nq (Batch: 1,491 / Avg: 8,333)
> 03:36:08 INFO  loader :: Add: 309,000,000 github_1_fixed.nq (Batch: 1,882 / Avg: 8,287)
> 03:39:37 INFO  loader :: Add: 309,500,000 github_1_fixed.nq (Batch: 2,391 / Avg: 8,254)
> 03:44:02 INFO  loader :: Add: 310,000,000 github_1_fixed.nq (Batch: 1,887 / Avg: 8,209)
> 03:44:02 INFO  loader :: Elapsed: 37,759.63 seconds [2020/06/22 03:44:02 UTC]
> 03:47:07 INFO  loader :: Add: 310,500,000 github_1_fixed.nq (Batch: 2,711 / Avg: 8,183)
> 03:51:58 INFO  loader :: Add: 311,000,000 github_1_fixed.nq (Batch: 1,717 / Avg: 8,133)
>
> Batch size is down below 2k, and processing time per 0.5m triples is up to around 5 min.
>
> What has puzzled me is that in the beginning, CPU utilization was high. At some point the load average was over 7 and even 8 (so 70-80% for 10 cores), whereas now the load average sits near 3.5, not crossing 4 for several hours. This is based on htop; according to the GCP console dashboard, CPU utilization did indeed peak near 80%, but has been steady around 13-14% for the last hours. I don't know how to reconcile these two metrics, but the CPU is heavily underutilized.
>
> Here's a flavor of the output (as I write now, when the batch size has decreased) of
>
> $ iotop -ob
>
> Current DISK READ: 2.76 G/s | Current DISK WRITE: 0.00 B/s
>   TID  PRIO  USER      DISK READ    DISK WRITE  SWAPIN      IO     COMMAND
> 12184  be/4  isroelko  936.79 M/s   5.72 M/s    0.00 %   80.55 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12186  be/4  isroelko  961.04 M/s   7.28 M/s    0.00 %   78.52 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12185  be/4  isroelko  932.44 M/s   9.45 M/s    0.00 %   77.55 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
>
> The relevant output of running (for 124s) - this is the last accumulation:
>
> $ iotop -ab
>
> Total DISK READ:   2.76 G/s | Total DISK WRITE:   16.87 M/s
> Current DISK READ: 2.74 G/s | Current DISK WRITE:  0.00 B/s
>   TID  PRIO  USER      DISK READ    DISK WRITE  SWAPIN      IO     COMMAND
> 12184  be/4  isroelko    88.11 G    414.80 M    0.00 %   79.31 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12185  be/4  isroelko    87.82 G    498.31 M    0.00 %   78.74 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12186  be/4  isroelko    88.17 G    435.88 M    0.00 %   78.66 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12187  be/4  isroelko     3.07 G    206.02 M    0.00 %    4.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12156  be/4  isroelko    68.29 M      8.00 K    0.00 %    0.63 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12171  be/4  isroelko  1048.00 K      0.00 B    0.00 %    0.13 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12160  be/4  isroelko  1016.00 K      0.00 B    0.00 %    0.10 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12157  be/4  isroelko   248.00 K      0.00 B    0.00 %    0.09 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12192  be/4  isroelko   388.00 K      0.00 B    0.00 %    0.09 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12162  be/4  isroelko  1704.00 K      8.00 K    0.00 %    0.06 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12181  be/4  isroelko    33.52 M      9.52 M    0.00 %    0.05 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12182  be/4  isroelko    33.29 M      9.69 M    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12177  be/4  isroelko   404.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12176  be/4  isroelko   140.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12191  be/4  isroelko   176.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12173  be/4  isroelko    48.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12175  be/4  isroelko   112.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12190  be/4  isroelko    44.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena
> 12174  be/4  isroelko   248.00 K      0.00 B    0.00 %    0.04 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12183  be/4  isroelko    33.54 M      9.70 M    0.00 %    0.02 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12159  be/4  isroelko    84.00 K      0.00 B    0.00 %    0.00 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
> 12170  be/4  isroelko     0.00 B    100.00 K    0.00 %    0.00 %   java -Dlog4j.configuration=file:/usr/local/apache-jena-
>
> And now the output for PID 12155 - the process running all the threads:
>
> $ pidstat -p 12155 -ut
>
> Linux 4.19.0-9-cloud-amd64 (graph4codev1)   06/22/20   _x86_64_   (10 CPU)
>
> 04:38:32   UID   TGID    TID   %usr  %system  %guest  %wait   %CPU  CPU  Command
> 04:38:32  1001  12155      -  44.41    47.13    0.00   0.00  91.54    4  java
> 04:38:32  1001      -  12155   0.00     0.00    0.00   0.00   0.00    4  |__java
> 04:38:32  1001      -  12156   4.88     0.11    0.00   0.00   4.99    7  |__java
> 04:38:32  1001      -  12157   0.52     0.00    0.00   0.00   0.52    7  |__GC Thread#0
> 04:38:32  1001      -  12158   0.00     0.00    0.00   0.00   0.00    1  |__G1 Main Marker
> 04:38:32  1001      -  12159   0.27     0.00    0.00   0.00   0.27    2  |__G1 Conc#0
> 04:38:32  1001      -  12160   0.34     0.00    0.00   0.01   0.34    8  |__G1 Refine#0
> 04:38:32  1001      -  12161   0.28     0.00    0.00   0.00   0.28    6  |__G1 Young RemSet
> 04:38:32  1001      -  12162   0.02     0.01    0.00   0.00   0.03    1  |__VM Thread
> 04:38:32  1001      -  12163   0.00     0.00    0.00   0.00   0.00    4  |__Reference Handl
> 04:38:32  1001      -  12164   0.00     0.00    0.00   0.00   0.00    7  |__Finalizer
> 04:38:32  1001      -  12165   0.00     0.00    0.00   0.00   0.00    6  |__Signal Dispatch
> 04:38:32  1001      -  12166   0.04     0.00    0.00   0.00   0.04    1  |__C2 CompilerThre
> 04:38:32  1001      -  12167   0.00     0.00    0.00   0.00   0.00    1  |__C1 CompilerThre
> 04:38:32  1001      -  12168   0.00     0.00    0.00   0.00   0.00    1  |__Sweeper thread
> 04:38:32  1001      -  12169   0.00     0.00    0.00   0.00   0.00    1  |__Service Thread
> 04:38:32  1001      -  12170   0.03     0.02    0.00   0.01   0.05    1  |__VM Periodic Tas
> 04:38:32  1001      -  12171   0.00     0.00    0.00   0.00   0.00    6  |__Common-Cleaner
> 04:38:32  1001      -  12173   0.51     0.00    0.00   0.00   0.51    8  |__GC Thread#1
> 04:38:32  1001      -  12174   0.51     0.00    0.00   0.00   0.51    3  |__GC Thread#2
> 04:38:32  1001      -  12175   0.50     0.00    0.00   0.00   0.50    1  |__GC Thread#3
> 04:38:32  1001      -  12176   0.51     0.00    0.00   0.00   0.52    5  |__GC Thread#4
> 04:38:32  1001      -  12177   0.51     0.00    0.00   0.00   0.51    2  |__GC Thread#5
> 04:38:32  1001      -  12178   0.00     0.00    0.00   0.00   0.00    5  |__Thread-0
> 04:38:32  1001      -  12179   0.00     0.00    0.00   0.00   0.00    6  |__Thread-1
> 04:38:32  1001      -  12180   0.00     0.00    0.00   0.00   0.00    7  |__Thread-2
> 04:38:32  1001      -  12181   4.33     0.09    0.00   0.00   4.42    2  |__Thread-3
> 04:38:32  1001      -  12182   5.37     0.17    0.00   0.00   5.54    3  |__Thread-4
> 04:38:32  1001      -  12183   5.12     0.17    0.00   0.00   5.29    4  |__Thread-5
> 04:38:32  1001      -  12184   4.44    14.90    0.00   0.13  19.34    3  |__Thread-6
> 04:38:32  1001      -  12185   5.53    15.30    0.00   0.14  20.83    2  |__Thread-7
> 04:38:32  1001      -  12186   5.08    15.00    0.00   0.14  20.07    9  |__Thread-8
> 04:38:32  1001      -  12187   3.57     1.51    0.00   0.02   5.08    0  |__Thread-9
> 04:38:32  1001      -  12190   0.52     0.00    0.00   0.00   0.52    6  |__GC Thread#6
> 04:38:32  1001      -  12191   0.51     0.00    0.00   0.00   0.51    0  |__GC Thread#7
> 04:38:32  1001      -  12192   0.50     0.00    0.00   0.00   0.51    4  |__GC Thread#8
> 04:38:32  1001      -  12193   0.02     0.00    0.00   0.00   0.02    5  |__G1 Refine#1
> 04:38:32  1001      -  12194   0.01     0.00    0.00   0.00   0.01    9  |__G1 Refine#2
> 04:38:32  1001      -  12195   0.01     0.00    0.00   0.00   0.01    0  |__G1 Refine#3
> 04:38:32  1001      -  12196   0.01     0.00    0.00   0.00   0.01    7  |__G1 Refine#4
> 04:38:32  1001      -  12197   0.00     0.00    0.00   0.00   0.00    1  |__G1 Refine#5
> 04:38:32  1001      -  12198   0.00     0.00    0.00   0.00   0.00    1  |__G1 Refine#6
> 04:38:32  1001      -  12199   0.00     0.00    0.00   0.00   0.00    6  |__G1 Refine#7
> 04:38:32  1001      -  12200   0.00     0.00    0.00   0.00   0.00    6  |__G1 Refine#8
> 04:38:32  1001      -  12205   0.27     0.00    0.00   0.00   0.27    9  |__G1 Conc#1
>
> You can see that threads 12184, 12185 and 12186 have the highest CPU utilization, albeit still only around 20%. They are the same threads as in the iotop output - mostly doing a lot of reading. As you can see in the iotop output, the read rate is over 900 M/s, which is excellent.
> All the other threads are idle the overwhelming majority of the time.
>
> Finally, htop gives a similar picture: the 3 threads have the highest CPU utilization (around 20%) now, and so 7/10 cores are practically idle, with short spikes when the other threads pick up - which isn't much, as the above shows. Also, those threads are either in R or D status (rarely in S), while the rest spend most of the time in S.
>
> I didn't understand what you wrote about the heap size, RAM and read threads - I don't use Java - so I'm flying by the seat of my pants. Here is the output of
>
> $ jstat -gc 12155
>
>  S0C   S1C       S0U   S1U       EC         EU        OC         OU        MC       MU       CCSC    CCSU    YGC   YGCT      FGC  FGCT   CGC  CGCT   GCT
>  0.0   184320.0  0.0   184320.0  2404352.0  102400.0  1519616.0  642119.0  20224.0  19386.6  2560.0  2339.3  2974  1039.542  0    0.000  340  2.989  1042.531
>
> which may be relevant.
>
> My very crudely formulated question is: how come, with smaller batch sizes, the processing time increased so much, while at the same time there is actually much less processing going on (much lower CPU utilization)? It seems like everything is waiting for whatever those 3 threads are doing. And is there any way to address it?
>
> Thanks for your help!
>
> best,
>
> Isroel
>
> On 2020/06/21 21:38:50, Andy Seaborne <a...@apache.org> wrote:
> > > And when it says "nq" is it really quads or all data for the default graph? (there is more indexing work for named graphs).
> > >
> > >> : Quads = 127,206,280
> >
> > OK - it's quads. There are 6 quad indexes, and in full parallel mode it will use 2 more threads to parse and to build the node table.
> >
> > Full parallel loading is going to use up all the cores, and HT threads aren't full threads for this purpose.
> >
> > The phased loader (default) uses fewer threads.
> >
> > Phase 1:
> > one thread to decompress and parse
> > one thread to build the node table
> > one thread for the GSPO
> > (and one for SPO, but you seem to have no triples)
> > = 3
> >
> > Phase 2:
> > two threads
> > = 2
> >
> > Phase 3:
> > three threads
> > = 3
> >
> > Andy
> >
> > On 21/06/2020 22:11, Andy Seaborne wrote:
> > > Hi there,
> > >
> > > Thanks for reporting the findings.
> > >
> > > On 20/06/2020 16:10, Isroel Kogan wrote:
> > >> Hi,
> > >>
> > >> I am also a newcomer to the RDF world - and particularly Jena, which I started using this week.
> > >>
> > >> A couple of observations I have made over the last few days exploring different options.
> > >>
> > >> Local Machine (specs):
> > >>
> > >> Ubuntu 18.04
> > >> Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 CPU)
> > >
> > > which is 4 cores and hyperthreading. For this workload that is more like 4 threads. HT is not a complete x2 for this sort of continuous processing threading.
> > >
> > > And pre-emptive timeslicing is not nice!
> > >
> > >> 16GB RAM
> > >> 512 SSD (NVMe).
> > >>
> > >> the following compares loading a file in compressed vs decompressed format - both with the parallel loader.
> > >>
> > >> file:
> > >> docstrings_triples.nq
> > >> size: 28GB
> > >>
> > >> cmd:
> > >> time tdb2.tdbloader --loader=parallel --loc=test1graphdb docstrings_triples.nq > tdb2.log1 2>&1
> > >>
> > >> :: Time = 1,364.310 seconds : Quads = 127,206,280 : Rate = 93,239 /s
> > >>
> > >> real  22m46.346s
> > >> user  120m46.591s
> > >> sys   3m22.698s
> > >>
> > >> file:
> > >> docstrings_triples.nq.bz2
> > >> size: 542M
> > >>
> > >> cmd:
> > >> time tdb2.tdbloader --loader=parallel --loc=test2graphdb docstrings_triples.nq.bz2 > tdb2.log2 2>&1
> > >>
> > >> :: Time = 2,225.871 seconds : Quads = 127,206,280 : Rate = 57,149 /s
> > >>
> > >> real  37m8.182s
> > >> user  109m42.970s
> > >> sys   6m27.426s
> > >>
> > >> resulting DB size: 30GB
> > >>
> > >> confirmed equal via diff.
> > >>
> > >> pbzip2 ran in 84s
> > >>
> > >> Less rigorously, I noticed a similar gain in speed for other files.
> > >
> > > For gz files, the difference in loading speed between compressed and uncompressed is usually not very much. It does look like bz2 is slower.
> > >
> > > Using a separate process and a faster decompressor may help:
> > >
> > > bzip2 -d < docstrings_triples.nq.bz2 | \
> > >   time tdb2.tdbloader --loader=parallel --loc=test2graphdb \
> > >     -- - > tdb2.log2 2>&1
> > >
> > > When Jena decompresses a bz2 file, it uses Apache Commons Compress, so it is a Java decompressor, which will take time to get optimized by the JIT and is likely slower than a specialized tool like bzip2.
> > >
> > > But with 4 cores, it may have the opposite effect - using more processes causes preemption timeslicing.
> > >
> > > It may be that one of the other loaders is faster because it is a better match to the hardware.
> > >
> > >> Is this expected behaviour? What factors influence this?
> > >>
> > >> SSD - local vs cloud.
> > >>
> > >> on my local machine, when running the parallel loader, cores were working at over 70% capacity and there was little IO-induced downtime.
> > >
> > > How many cores were active?
> > > And when it says "nq" is it really quads or all data for the default graph? (there is more indexing work for named graphs).
> > >
> > > Some of that will be the bz2 decompression, but it looks to me like it's "more threads than cores" causing timeslicing.
> > >
> > >> GCP instance specs:
> > >>
> > >> 20 CPU
> > >> 32GB RAM
> > >
> > > And same heap size?
> > >
> > > While the parallel loader is using multiple threads, it is a fixed number, so more CPU will help only up to that point.
> > >
> > > More RAM is going to help because the OS will use it for file system cache, delaying writes.
> > >
> > > But with more read threads, it could be there is less preemptive scheduling, and that could be a big gain.
> > >
> > >> 6TB "local SSD" storage
> > >> the local SSD storage offers the best performance to reduce IO latency - it has physical proximity to the instance - as per GCP.
> > >>
> > >> a few cores were working at near capacity, while the vast majority were idle (near 0%) with occasional spikes. The average load translates to 20% utilization. As I've seen others write here, this is a difference others have noted.
> > >> How can this be addressed? buffer size? (I don't have a deep enough understanding).
> > >
> > > My guess is that on the GCP instance it is one thread-one core.
> > >
> > >> Another recurring pattern is the reduction in batch size.
> > >> I've been running a load job on my GCP instance for almost a day (23+ h).
> > >>
> > >> file size: 93GB
> > >> triples: 472m
> > >>
> > >> batch size decreased from the 160k range to under 1k, while processing time per batch increased from a few seconds to over 10 min. All this time, average CPU usage has remained steady, as has RAM usage.
> > >
> > > Not sure I quite understand - this is adding more data to an existing database? And 10 mins for 1k? While it will be slower, that does sound rather extreme.
> > >
> > >> I don't understand how all of this works with indexing. Is this expected behaviour? Besides a locally proximate SSD, I've thrown an overkill of hardware at it.
> > >>
> > >> thanks
> > >
> > > Andy
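P.S. One more thing I intend to experiment with: if I understand the Jena launch scripts correctly (an assumption on my part - I haven't read them closely), they pick up JVM options from the JVM_ARGS environment variable, so something like the following should give the loader a bigger heap:

$ JVM_ARGS="-Xmx12G" tdb2.tdbloader --loader=parallel --loc=<db-location> <files.nq>

(<db-location> and <files.nq> are placeholders for my actual paths.) Though if those 3 threads are really IO-bound rather than heap-bound, I understand this may not help much.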