Re: TDB2 parallel load on cloud SSD and other observations/questions

Andy Seaborne Sun, 21 Jun 2020 14:11:48 -0700

Hi there,

Thanks for reporting the findings.


On 20/06/2020 16:10, Isroel Kogan wrote:

Hi,

I am also a newcomer to the RDF world - and particularly Jena, which I started 
using this week.

A couple of observations I have made over the last few days exploring different 
options.

Local Machine (specs):

Ubuntu 18.04
Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 CPU)

That is 4 cores with hyper-threading. For this workload that is more like 4 threads; HT does not give a full x2 for this sort of continuous-processing threading.


And pre-emptive timeslicing is not nice!

16GB RAM
512 SSD (NVMe).


the following compares loading a file in compressed vs decompressed format 
-both w parallel loader.

file:
docstrings_triples.nq
size: 28GB

cmd:
time tdb2.tdbloader --loader=parallel --loc=test1graphdb docstrings_triples.nq > 
tdb2.log1 2>&1

:: Time = 1,364.310 seconds : Quads = 127,206,280 : Rate = 93,239 /s

real    22m46.346s
user    120m46.591s
sys    3m22.698s


file:
docstrings_triples.nq.bz2
size: 542M

cmd:

time tdb2.tdbloader --loader=parallel --loc=test2graphdb docstrings_triples.nq.bz2 > 
tdb2.log2 2>&1

:: Time = 2,225.871 seconds : Quads = 127,206,280 : Rate = 57,149 /s


real    37m8.182s
user    109m42.970s
sys    6m27.426s

resulting DB size
30GB

confirmed equal via diff.

pbzip2 ran in 84s

Less rigorously I noticed a similar gain in speed for other files.

For gz files, the difference in loading speed between compressed and uncompressed is usually not very much. It does look like bz2


Using a separate process and faster decompressor may help:

bzip2 -d < docstrings_triples.nq.bz2 | \
time tdb2.tdbloader --loader=parallel --loc=test2graphdb \
    -- - > tdb2.log2 2>&1

When Jena decompresses a bz2 file, it uses Apache Commons Compress, so it is a Java decompressor, which will take time to get optimized by the JIT and is likely slower than a specialized tool like bzip2.

But with 4 cores, it may have the opposite effect: using more processes causes preemptive timeslicing.

It may be that one of the other loaders is faster because it is a better match to the hardware.

Is this expected behaviour? What factors influence this?

SSD - local vs cloud.

on my local machine, when running parallel loader, cores were working at over 
70% capacity and there was little IO induced down time.


How many cores were active?

And when it says "nq", is it really quads, or is all the data for the default graph? (There is more indexing work for named graphs.)
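A rough way to answer the quads question is to sample the file and count statements with and without a graph term. The sketch below is only a heuristic, not a real N-Quads parser. The function name `count_quads` is made up for illustration. It assumes one statement per line and a final "." set off by whitespace.

```shell
# Count default-graph triples vs named-graph quads in N-Quads read from stdin.
# Heuristic: the (single) string literal on each line is collapsed to a
# placeholder so that spaces inside it do not split fields.
count_quads() {
    sed -E 's/"([^"\\]|\\.)*"/LIT/' |
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comment lines and blank lines
        NF == 4 { triples++ }      # s p o .
        NF == 5 { quads++ }        # s p o g .
        END { printf "default-graph=%d named-graph=%d\n", triples, quads }
    '
}

# Tiny self-contained demo: one triple and one quad.
printf '%s\n' \
    '<http://example/s> <http://example/p> "a b c" .' \
    '<http://example/s> <http://example/p> "x \"y\" z"@en <http://example/g> .' |
    count_quads
```

On the real data, sampling is cheaper than scanning all 28GB, for example `head -n 100000 docstrings_triples.nq | count_quads`.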

Some of that will be the bz2 decompression, but it looks to me like it's "more threads than cores" causing timeslicing.
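One rough way to estimate how many cores were busy on average is to divide CPU time (user + sys) by wall-clock time (real), using the `time` figures reported for the two local runs. This is a minimal sketch. The helper names `to_seconds` and `busy_cores` are made up for illustration, and the result is an average over the whole run, so it says nothing about peaks or about which threads were waiting.

```shell
# Convert a `time`-style duration such as "22m46.346s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Average number of busy hardware threads = (user + sys) / real.
# Arguments: real user sys, each in the "XmY.ZZZs" form printed by `time`.
busy_cores() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
}

# Figures from the two local runs (real, user, sys).
busy_cores 22m46.346s 120m46.591s 3m22.698s   # uncompressed .nq
busy_cores 37m8.182s 109m42.970s 6m27.426s    # .nq.bz2
```

With the numbers from the report this gives roughly 5.5 for the uncompressed load and 3.1 for the bz2 load, out of the 8 hardware threads (4 physical cores) on the local machine.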


GCP instance specs:

20 CPU
32GB RAM


And same heap size?

While the parallel loader is using multiple threads, it is a fixed number, so more CPU will help only if

More RAM is going to help because the OS will use it for file system cache, delaying writes.

But with more read threads, it could be there is less preemptive scheduling and that could be a big gain.

6TB "local SSD" storage
the local SSD storage offers the best performance to reduce IO latency - it has 
physical proximity to instance - as per GCP.

a few cores were working at near capacity, while the vast majority idle (near 
0%) w occasional spikes. average load translates to 20% utilization. As I've 
seen others write here, this is a difference others have noted.
How can this be addressed? buffer size? (I don't have a deep enough 
understanding).


My guess is that on the GCP instance it is one thread-one core.



Another recurring pattern is the reduction in batch size.
I've been running a load job on my gcp instance for almost a day (23+h).

file size: 93GB
triples: 472m

batch size decreased from 160k range to under 1k, while processing time per 
batch increased from a few seconds to over 10 min. All this time average CPU 
usage has remained steady, as has RAM usage.

Not sure I quite understand - this is adding more data to an existing database? And 10 mins for 1k? While it will be slower, that does sound rather extreme.


I don't understand how all of this works with indexing. Is this expected 
behaviour? besides a locally proximate SSD, I've thrown an overkill of hardware 
at it.

thanks


    Andy
