Isroel,
I think there might be a fundamental misunderstanding happening about batch sizes here.
The batch size is fixed for a run and never changes; the "batch size" you
refer to is a speed calculation, e.g.:
19:03:24 INFO loader :: Add: 248,000,000 github_1_fixed.nq (Batch: 3,562 / Avg: 38,030)
19:05:19 INFO loader :: Add: 248,500,000 github_1_fixed.nq (Batch: 4,322 / Avg: 37,443)
19:07:36 INFO loader :: Add: 249,000,000 github_1_fixed.nq (Batch: 3,649 / Avg: 36,759)
19:09:19 INFO loader :: Add: 249,500,000 github_1_fixed.nq (Batch: 4,868 / Avg: 36,283)
19:11:41 INFO loader :: Add: 250,000,000 github_1_fixed.nq (Batch: 3,532 / Avg: 35,622)
19:11:41 INFO loader :: Elapsed: 7,017.95 seconds [2020/06/21 19:11:41 UTC]
19:13:58 INFO loader :: Add: 250,500,000 github_1_fixed.nq (Batch: 3,643 / Avg: 35,009)
Note that each batch is 500,000 quads: the number after "Add:" increases by 500,000 each time.
The "Batch" figure in brackets is the calculated loading speed (quads per second) for the
current batch, while "Avg" is the calculated loading speed over the entire load so far.
So yes, the speed is decreasing over time. This is a commonly reported issue on
these lists, but there is no simple fix because it depends on both hardware and
data. The output you have provided suggests that you are bottlenecked on IO.
Andy is the primary developer of TDB2, so he may be able to shed more light
on what might be going on.
Rob
On 22/06/2020, 06:32, "Isroel Kogan" <isroelko...@gmail.com> wrote:
Hi Andy - thanks for your comments.
Instead of responding point by point, it's best if I present a clearer picture, as I also have a better understanding of the factors so far.
GCP instance stats:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 10
On-line CPU(s) list: 0-9
Thread(s) per core: 2
Core(s) per socket: 5
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2300.000
BogoMIPS: 4600.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-9
16GB RAM
(I configured less RAM because on a prior iteration, only 8 GB out of 32 GB
was being used.)
3TB local SSD
according to google cloud - max performance for this size is as follows:
Storage space  Partitions  IOPS (Read / Write)  Throughput MB/s (Read / Write)
3 TB           8           680,000 / 360,000    2,650 / 1,400
https://cloud.google.com/compute/docs/disks
I'm not getting that, but performance is an order of magnitude or more better than on my machine, as you'll see below.
My data is as follows: 15 files of varying size (based on the original bz2 files provided by the authors). Files range from 2GB (a few million triples) to 129GB. 4 files have close to half a billion triples each.
The files are .nq - as you noted, quads (unzipped, as already mentioned).
Currently I have a process running, initiated by the tdb2.tdbloader command in parallel mode; the DB was empty initially.
I will include some information here that may help you get a sense of the performance issues I alluded to.
beginning of log output:
17:14:42 INFO loader :: Loader = LoaderParallel
17:14:42 INFO loader :: Start: 15 files
17:14:49 INFO loader :: Add: 500,000 ai_stackexchange_triples.nq (Batch: 77,339 / Avg: 77,339)
17:14:55 INFO loader :: Add: 1,000,000 ai_stackexchange_triples.nq (Batch: 89,285 / Avg: 82,884)
17:15:00 INFO loader :: Add: 1,500,000 ai_stackexchange_triples.nq (Batch: 90,090 / Avg: 85,154)
17:15:06 INFO loader :: Add: 2,000,000 ai_stackexchange_triples.nq (Batch: 87,382 / Avg: 85,700)
17:15:12 INFO loader :: Add: 2,500,000 ai_stackexchange_triples.nq (Batch: 82,304 / Avg: 84,999)
17:15:18 INFO loader :: Add: 3,000,000 ai_stackexchange_triples.nq (Batch: 87,032 / Avg: 85,331)
17:15:22 INFO loader :: End file: ai_stackexchange_triples.nq (triples/quads = 3,402,753)
17:15:23 INFO loader :: Add: 3,500,000 datascience_stackexchange_triples.nq (Batch: 89,493 / Avg: 85,902)
17:15:29 INFO loader :: Add: 4,000,000 datascience_stackexchange_triples.nq (Batch: 86,475 / Avg: 85,973)
17:15:35 INFO loader :: Add: 4,500,000 datascience_stackexchange_triples.nq (Batch: 88,136 / Avg: 86,208)
17:15:40 INFO loader :: Add: 5,000,000 datascience_stackexchange_triples.nq (Batch: 93,562 / Avg: 86,891)
The batch size increases for the next file - but the third file is one of those with nearly half a billion triples, and from there the batch size keeps steadily decreasing.
Here is the output at the 250m triple mark:
19:03:24 INFO loader :: Add: 248,000,000 github_1_fixed.nq (Batch: 3,562 / Avg: 38,030)
19:05:19 INFO loader :: Add: 248,500,000 github_1_fixed.nq (Batch: 4,322 / Avg: 37,443)
19:07:36 INFO loader :: Add: 249,000,000 github_1_fixed.nq (Batch: 3,649 / Avg: 36,759)
19:09:19 INFO loader :: Add: 249,500,000 github_1_fixed.nq (Batch: 4,868 / Avg: 36,283)
19:11:41 INFO loader :: Add: 250,000,000 github_1_fixed.nq (Batch: 3,532 / Avg: 35,622)
19:11:41 INFO loader :: Elapsed: 7,017.95 seconds [2020/06/21 19:11:41 UTC]
19:13:58 INFO loader :: Add: 250,500,000 github_1_fixed.nq (Batch: 3,643 / Avg: 35,009)
19:16:04 INFO loader :: Add: 251,000,000 github_1_fixed.nq (Batch: 3,976 / Avg: 34,473)
19:18:41 INFO loader :: Add: 251,500,000 github_1_fixed.nq (Batch: 3,171 / Avg: 33,810)
19:21:03 INFO loader :: Add: 252,000,000 github_1_fixed.nq (Batch: 3,512 / Avg: 33,241)
The batch size has decreased (I don't understand exactly what the batch size refers to) and, correspondingly, the time for half a million triples to be processed has gone from 5s to 2.5m.
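To double-check my reading of the numbers (my own arithmetic, nothing from the loader itself), the per-batch rate directly implies the time per 500,000-quad batch:

```python
# Sanity check: convert a "Batch:" quads/second figure into minutes
# taken per 500,000-quad batch.
def minutes_per_batch(batch_rate, batch_size=500_000):
    """Minutes to load one batch at the given quads/second rate."""
    return batch_size / batch_rate / 60

early = minutes_per_batch(87_000)  # roughly 0.1 min, i.e. about 6 seconds
late = minutes_per_batch(3_512)    # roughly 2.4 minutes
```

So the ~5s and ~2.5m figures above fall straight out of the reported rates.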
This is the tail of the log currently:
03:01:16 INFO loader :: Add: 305,500,000 github_1_fixed.nq (Batch: 1,468 / Avg: 8,680)
03:06:47 INFO loader :: Add: 306,000,000 github_1_fixed.nq (Batch: 1,512 / Avg: 8,613)
03:11:10 INFO loader :: Add: 306,500,000 github_1_fixed.nq (Batch: 1,897 / Avg: 8,564)
03:16:00 INFO loader :: Add: 307,000,000 github_1_fixed.nq (Batch: 1,723 / Avg: 8,509)
03:21:36 INFO loader :: Add: 307,500,000 github_1_fixed.nq (Batch: 1,489 / Avg: 8,444)
03:26:07 INFO loader :: Add: 308,000,000 github_1_fixed.nq (Batch: 1,842 / Avg: 8,395)
03:31:43 INFO loader :: Add: 308,500,000 github_1_fixed.nq (Batch: 1,491 / Avg: 8,333)
03:36:08 INFO loader :: Add: 309,000,000 github_1_fixed.nq (Batch: 1,882 / Avg: 8,287)
03:39:37 INFO loader :: Add: 309,500,000 github_1_fixed.nq (Batch: 2,391 / Avg: 8,254)
03:44:02 INFO loader :: Add: 310,000,000 github_1_fixed.nq (Batch: 1,887 / Avg: 8,209)
03:44:02 INFO loader :: Elapsed: 37,759.63 seconds [2020/06/22 03:44:02 UTC]
03:47:07 INFO loader :: Add: 310,500,000 github_1_fixed.nq (Batch: 2,711 / Avg: 8,183)
03:51:58 INFO loader :: Add: 311,000,000 github_1_fixed.nq (Batch: 1,717 / Avg: 8,133)
The batch size is down below 2k, and the processing time per 0.5m triples is up to around 5 min.
What has puzzled me is that in the beginning, CPU utilization was high. At some point the load average was over 7 and even 8 (so 70-80% for 10 cores), whereas now the load average sits near 3.5, not crossing 4 for several hours.
This is based on htop; according to the GCP console dashboard, CPU utilization did indeed peak near 80%, but has been steady around 13-14% for the last few hours. I don't know how to reconcile these two metrics, but either way the CPU is heavily underutilized.
Here's a flavor of the output (as I write now, when the batch size has decreased) of
$ iotop -ob
Current DISK READ: 2.76 G/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
12184 be/4 isroelko 936.79 M/s 5.72 M/s 0.00 % 80.55 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12186 be/4 isroelko 961.04 M/s 7.28 M/s 0.00 % 78.52 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12185 be/4 isroelko 932.44 M/s 9.45 M/s 0.00 % 77.55 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
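Comparing those per-thread read rates against the rated GCP limits quoted above (my own back-of-the-envelope calculation, assuming iotop's M/s figures can be read against GCP's MB/s rating):

```python
# Aggregate read rate of the three busy loader threads vs. the rated
# read throughput of a 3 TB GCP local SSD (2,650 MB/s, from the table above).
per_thread_reads = [936.79, 961.04, 932.44]  # M/s, from iotop
aggregate = sum(per_thread_reads)            # about 2,830 M/s
rated_read_throughput = 2_650                # MB/s, GCP's quoted figure
saturation = aggregate / rated_read_throughput  # slightly above 1.0
```

If this reading is right, the reads are at the disk's rated ceiling, which would fit an IO-bound load.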
The relevant output of running the following for 124s - this is the last accumulated snapshot:
$ iotop -ab
Total DISK READ: 2.76 G/s | Total DISK WRITE: 16.87 M/s
Current DISK READ: 2.74 G/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
12184 be/4 isroelko 88.11 G 414.80 M 0.00 % 79.31 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12185 be/4 isroelko 87.82 G 498.31 M 0.00 % 78.74 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12186 be/4 isroelko 88.17 G 435.88 M 0.00 % 78.66 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12187 be/4 isroelko 3.07 G 206.02 M 0.00 % 4.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12156 be/4 isroelko 68.29 M 8.00 K 0.00 % 0.63 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12171 be/4 isroelko 1048.00 K 0.00 B 0.00 % 0.13 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12160 be/4 isroelko 1016.00 K 0.00 B 0.00 % 0.10 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12157 be/4 isroelko 248.00 K 0.00 B 0.00 % 0.09 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12192 be/4 isroelko 388.00 K 0.00 B 0.00 % 0.09 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12162 be/4 isroelko 1704.00 K 8.00 K 0.00 % 0.06 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12181 be/4 isroelko 33.52 M 9.52 M 0.00 % 0.05 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12182 be/4 isroelko 33.29 M 9.69 M 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12177 be/4 isroelko 404.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12176 be/4 isroelko 140.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12191 be/4 isroelko 176.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12173 be/4 isroelko 48.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12175 be/4 isroelko 112.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12190 be/4 isroelko 44.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena
12174 be/4 isroelko 248.00 K 0.00 B 0.00 % 0.04 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12183 be/4 isroelko 33.54 M 9.70 M 0.00 % 0.02 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12159 be/4 isroelko 84.00 K 0.00 B 0.00 % 0.00 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
12170 be/4 isroelko 0.00 B 100.00 K 0.00 % 0.00 % java -Dlog4j.configuration=file:/usr/local/apache-jena-
And now the output, for PID 12155 (the parent process running all the threads), of
$ pidstat -p 12155 -ut
Linux 4.19.0-9-cloud-amd64 (graph4codev1) 06/22/20 _x86_64_ (10 CPU)
04:38:32 UID TGID TID %usr %system %guest %wait %CPU CPU Command
04:38:32 1001 12155 - 44.41 47.13 0.00 0.00 91.54 4 java
04:38:32 1001 - 12155 0.00 0.00 0.00 0.00 0.00 4 |__java
04:38:32 1001 - 12156 4.88 0.11 0.00 0.00 4.99 7 |__java
04:38:32 1001 - 12157 0.52 0.00 0.00 0.00 0.52 7 |__GC Thread#0
04:38:32 1001 - 12158 0.00 0.00 0.00 0.00 0.00 1 |__G1 Main Marker
04:38:32 1001 - 12159 0.27 0.00 0.00 0.00 0.27 2 |__G1 Conc#0
04:38:32 1001 - 12160 0.34 0.00 0.00 0.01 0.34 8 |__G1 Refine#0
04:38:32 1001 - 12161 0.28 0.00 0.00 0.00 0.28 6 |__G1 Young RemSet
04:38:32 1001 - 12162 0.02 0.01 0.00 0.00 0.03 1 |__VM Thread
04:38:32 1001 - 12163 0.00 0.00 0.00 0.00 0.00 4 |__Reference Handl
04:38:32 1001 - 12164 0.00 0.00 0.00 0.00 0.00 7 |__Finalizer
04:38:32 1001 - 12165 0.00 0.00 0.00 0.00 0.00 6 |__Signal Dispatch
04:38:32 1001 - 12166 0.04 0.00 0.00 0.00 0.04 1 |__C2 CompilerThre
04:38:32 1001 - 12167 0.00 0.00 0.00 0.00 0.00 1 |__C1 CompilerThre
04:38:32 1001 - 12168 0.00 0.00 0.00 0.00 0.00 1 |__Sweeper thread
04:38:32 1001 - 12169 0.00 0.00 0.00 0.00 0.00 1 |__Service Thread
04:38:32 1001 - 12170 0.03 0.02 0.00 0.01 0.05 1 |__VM Periodic Tas
04:38:32 1001 - 12171 0.00 0.00 0.00 0.00 0.00 6 |__Common-Cleaner
04:38:32 1001 - 12173 0.51 0.00 0.00 0.00 0.51 8 |__GC Thread#1
04:38:32 1001 - 12174 0.51 0.00 0.00 0.00 0.51 3 |__GC Thread#2
04:38:32 1001 - 12175 0.50 0.00 0.00 0.00 0.50 1 |__GC Thread#3
04:38:32 1001 - 12176 0.51 0.00 0.00 0.00 0.52 5 |__GC Thread#4
04:38:32 1001 - 12177 0.51 0.00 0.00 0.00 0.51 2 |__GC Thread#5
04:38:32 1001 - 12178 0.00 0.00 0.00 0.00 0.00 5 |__Thread-0
04:38:32 1001 - 12179 0.00 0.00 0.00 0.00 0.00 6 |__Thread-1
04:38:32 1001 - 12180 0.00 0.00 0.00 0.00 0.00 7 |__Thread-2
04:38:32 1001 - 12181 4.33 0.09 0.00 0.00 4.42 2 |__Thread-3
04:38:32 1001 - 12182 5.37 0.17 0.00 0.00 5.54 3 |__Thread-4
04:38:32 1001 - 12183 5.12 0.17 0.00 0.00 5.29 4 |__Thread-5
04:38:32 1001 - 12184 4.44 14.90 0.00 0.13 19.34 3 |__Thread-6
04:38:32 1001 - 12185 5.53 15.30 0.00 0.14 20.83 2 |__Thread-7
04:38:32 1001 - 12186 5.08 15.00 0.00 0.14 20.07 9 |__Thread-8
04:38:32 1001 - 12187 3.57 1.51 0.00 0.02 5.08 0 |__Thread-9
04:38:32 1001 - 12190 0.52 0.00 0.00 0.00 0.52 6 |__GC Thread#6
04:38:32 1001 - 12191 0.51 0.00 0.00 0.00 0.51 0 |__GC Thread#7
04:38:32 1001 - 12192 0.50 0.00 0.00 0.00 0.51 4 |__GC Thread#8
04:38:32 1001 - 12193 0.02 0.00 0.00 0.00 0.02 5 |__G1 Refine#1
04:38:32 1001 - 12194 0.01 0.00 0.00 0.00 0.01 9 |__G1 Refine#2
04:38:32 1001 - 12195 0.01 0.00 0.00 0.00 0.01 0 |__G1 Refine#3
04:38:32 1001 - 12196 0.01 0.00 0.00 0.00 0.01 7 |__G1 Refine#4
04:38:32 1001 - 12197 0.00 0.00 0.00 0.00 0.00 1 |__G1 Refine#5
04:38:32 1001 - 12198 0.00 0.00 0.00 0.00 0.00 1 |__G1 Refine#6
04:38:32 1001 - 12199 0.00 0.00 0.00 0.00 0.00 6 |__G1 Refine#7
04:38:32 1001 - 12200 0.00 0.00 0.00 0.00 0.00 6 |__G1 Refine#8
04:38:32 1001 - 12205 0.27 0.00 0.00 0.00 0.27 9 |__G1 Conc#1
You can see threads 12184, 12185 and 12186 have the highest CPU utilization, albeit still only around 20%. They are the same threads as in the iotop output, doing mostly a lot of reading. As the iotop output shows, each is reading at over 900 M/s, which is excellent. All the other threads are idle the overwhelming majority of the time.
Finally, htop gives a similar picture: those 3 threads have the highest CPU utilization (around 20%), so 7 of 10 cores are practically idle, with short spikes when the other threads pick up, which isn't much, as the above shows. Also, those 3 threads are usually in R or D status (rarely in S), while the rest spend most of their time in S.
I didn't understand what you wrote about the heap size, RAM and read threads - I don't use Java, so I'm flying by the seat of my pants.
Here is the output of
$ jstat -gc 12155
S0C  S1C       S0U  S1U       EC         EU        OC         OU        MC       MU       CCSC    CCSU    YGC   YGCT      FGC  FGCT   CGC  CGCT   GCT
0.0  184320.0  0.0  184320.0  2404352.0  102400.0  1519616.0  642119.0  20224.0  19386.6  2560.0  2339.3  2974  1039.542  0    0.000  340  2.989  1042.531
which may be relevant.
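Trying to interpret it myself (my own rough reading of the jstat -gc columns - sizes in KB, GC times in seconds - worth double-checking against the jstat docs, since I don't use Java):

```python
# Rough interpretation of the jstat -gc snapshot above (assumption: the
# usage columns EU/OU/S1U are KB, GCT is cumulative GC seconds).
eu, ou, s1u = 102_400.0, 642_119.0, 184_320.0  # Eden, Old, Survivor-1 used (KB)
heap_used_mb = (eu + ou + s1u) / 1024          # roughly 907 MB, well under 16 GB RAM
gct = 1042.531                                 # total GC time so far, seconds
elapsed = 41_000                               # approx. wall clock since the load started
gc_fraction = gct / elapsed                    # roughly 2.5% of the run spent in GC
```

If that reading is right, the heap isn't full and GC isn't the bottleneck, which again points at IO.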
My very crudely formulated question is: how come, as the batch sizes got smaller, the processing time increased so much, while at the same time there is actually much less processing going on (much lower CPU utilization)? It seems like everything is waiting for whatever those 3 threads are doing.
And is there any way to address it?
thanks for your help!
best,
Isroel
On 2020/06/21 21:38:50, Andy Seaborne <a...@apache.org> wrote:
> > And when it says "nq" is really quads or all data for the default
> > graph? (there is more indexing work for named graphs).
>
> >> : Quads = 127,206,280
>
>
> OK - it's quads. There are 6 quad indexes and in full parallel mode it
> will use 2 more threads to parse and to build the node table.
>
> Full parallel loading is going to use up all the cores and HT threads
> aren't full threads for this purpose.
>
> The phased loader (default) uses fewer threads.
>
> Phase 1:
> one thread to decompress and parse
> one thread to build the node table.
> one thread for the GSPO index
> (and one for SPO but you seem to have no triples)
> =3
>
> Phase 2:
> two threads
> =2
>
> Phase 3:
> three threads
> =3
>
> Andy
>
>
> On 21/06/2020 22:11, Andy Seaborne wrote:
> > Hi there,
> >
> > Thanks for reporting the findings.
> >
> > On 20/06/2020 16:10, Isroel Kogan wrote:
> >> Hi,
> >>
> >> I am also a newcomer to the RDF world - and particularly Jena, which I
> >> started using this week.
> >>
> >> A couple of observations I have made over the last few days exploring
> >> different options.
> >>
> >> Local Machine (specs):
> >>
> >> Ubuntu 18.04
> >> Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 CPU)
> >
> > which is 4 cores with hyperthreading - for this workload that is more
> > like 4 threads. HT is not a complete 2x for this sort of continuous
> > processing.
> >
> > And pre-emptive timeslicing is not nice!
> >
> >> 16GB RAM
> >> 512 SSD (NVMe).
> >>
> >>
> >> the following compares loading a file in compressed vs decompressed
> >> format -both w parallel loader.
> >>
> >> file:
> >> docstrings_triples.nq
> >> size: 28GB
> >>
> >> cmd:
> >> time tdb2.tdbloader --loader=parallel --loc=test1graphdb
> >> docstrings_triples.nq > tdb2.log1 2>&1
> >>
> >> :: Time = 1,364.310 seconds : Quads = 127,206,280 : Rate = 93,239 /s
> >>
> >> real 22m46.346s
> >> user 120m46.591s
> >> sys 3m22.698s
> >>
> >>
> >> file:
> >> docstrings_triples.nq.bz2
> >> size: 542M
> >>
> >> cmd:
> >>
> >> time tdb2.tdbloader --loader=parallel --loc=test2graphdb
> >> docstrings_triples.nq.bz2 > tdb2.log2 2>&1
> >>
> >> :: Time = 2,225.871 seconds : Quads = 127,206,280 : Rate = 57,149 /s
> >>
> >>
> >> real 37m8.182s
> >> user 109m42.970s
> >> sys 6m27.426s
> >>
> >> resulting DB size
> >> 30GB
> >>
> >> confirmed equal via diff.
> >>
> >> pbzip2 ran in 84s
> >>
> >> Less rigorously I noticed a similar gain in speed for other files.
> >
> > For gz files, the difference in loading speed between compressed and
> > uncompressed is usually not very much. It does look like bz2 behaves
> > differently.
> >
> > Using a separate process and faster decompressor may help:
> >
> > bzip2 -d < docstrings_triples.nq.bz2 | \
> > time tdb2.tdbloader --loader=parallel --loc=test2graphdb \
> > -- - > tdb2.log2 2>&1
> >
> > When Jena decompresses a bz2 file, it uses Apache Commons Compress, so
> > it is a Java decompressor which will take time to get optimized by the
> > JIT and is likely slower than a specialized tool like bzip2.
> >
> > But with 4 core, it may have the opposite effect - using more processes
> > causes preemption timeslicing.
> >
> > It may be that one of the other loaders is faster because it is a
> > better match to the hardware.
> >
> >> Is this expected behaviour? What factors influence this?
> >>
> >> SSD - local vs cloud.
> >>
> >> on my local machine, when running parallel loader, cores were working
> >> at over 70% capacity and there was little IO induced down time.
> >
> > How many core were active?
> > And when it says "nq", is it really quads, or all data for the default
> > graph? (there is more indexing work for named graphs).
> >
> > Some of that will be the bz2 decompression, but it looks to me like
> > it's "more threads than cores" causing timeslicing.
> >
> >>
> >> GCP instance specs:
> >>
> >> 20 CPU
> >> 32GB RAM
> >
> > And same heap size?
> >
> > While the parallel loader is using multiple threads, it is a fixed
> > number, so more CPUs will help only up to that number of threads.
> > More RAM is going to help because the OS will use it for file system
> > cache, delaying writes.
> >
> > But with more read threads, it could be there is less preemptive
> > scheduling and that could be a big gain.
> >
> >> 6TB "local SSD" storage
> >> the local SSD storage offers the best performance to reduce IO latency
> >> - it has physical proximity to instance - as per GCP.
> >>
> >> a few cores were working at near capacity, while the vast majority sat
> >> idle (near 0%) with occasional spikes; the average load translates to
> >> 20% utilization. As I've seen others write here, this is a difference
> >> others have noted.
> >> How can this be addressed? Buffer size? (I don't have a deep enough
> >> understanding.)
> >
> > My guess is that on the GCP instance it is one thread-one core.
> >
> >>
> >>
> >> Another recurring pattern is the reduction in batch size.
> >> I've been running a load job on my gcp instance for almost a day (23+h).
> >>
> >> file size: 93GB
> >> triples: 472m
> >>
> >> batch size decreased from 160k range to under 1k, while processing
> >> time per batch increased from a few seconds to over 10 min. All this
> >> time average CPU usage has remained steady, as has RAM usage.
> >
> > Not sure I quite understand - this is adding more data to an existing
> > database? And 10mins for 1k? While it will be slower, that does sound
> > rather extreme.
> >
> >>
> >> I don't understand how all of this works with indexing. Is this
> >> expected behaviour? besides a locally proximate SSD, I've thrown an
> >> overkill of hardware at it.
> >>
> >> thanks
> >>
> >
> > Andy
> >
>