Re: tdb2.tdbloader performance

2017-12-02 Thread Laura Morales
> Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS.
> Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest
> concurrent source the server has handled. Fans work!

Cool, this is another interesting statistic. Looks like there is quite some 
room for speed-ups on a single machine (much simpler to deal with than 
distributing the work across several nodes), if TDB2 can be parallelized more...


Re: tdb2.tdbloader performance

2017-12-02 Thread Laura Morales
@Andy

> Which tdb loader?

I'd guess tdb2.tdbloader, since he was replying to my previous email.

> I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
> hours (76K triples/s) using TDB1 tdbloader2.
> 
> I'll write it up soon.

Could you please also share the model names of the hardware components, so we 
can check the various frequencies, bandwidths, and latencies?

> The limit at scale is the I/O handling and disk cache. 128G RAM gives a
> better disk cache and that server machine probably has better I/O. It's
> big enough to fit one whole index (if all RAM is available - and that
> depends on the swappiness setting which should be set to zero ideally).

Do you have any idea, then, why executing everything from a ramdisk doesn't seem 
to bring any significant improvement over reading/writing from a SATA3 disk (at 
least in my tests)?
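
For anyone reproducing this comparison: the first disk run warms the OS page
cache, so later runs read at RAM speed either way. A minimal sketch of
levelling the field between runs by dropping the cache - standard Linux
knobs, nothing Jena-specific:

sync                                          # flush dirty pages first
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries, inodes

After this, a "disk" run really starts from disk again.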


Re: tdb2.tdbloader performance

2017-12-02 Thread Andy Seaborne



On 02/12/17 21:34, Dick Murray wrote:

>> Which tdb loader?


> TDB2


tdb2.tdbloader?

It does fine, until RAM (file system cache) gets stressed ... and for 
2.2B triples, it gets stressed.


(TDB2 has a fast node table).

Andy


Re: tdb2.tdbloader performance

2017-12-02 Thread Andy Seaborne



On 02/12/17 21:59, ajs6f wrote:

>> Threads will not help a single load except for tdbloader2 (which is for TDB1)
>> if tuned - see the command help and notes.  It uses sort(1) which can utilize
>> multiple threads.


> This was worth tuning for me. sort generally picks good parameters for a
> system, but I was able to get noticeably better performance by adjusting (up)
> the parallelism manually. But of course, that's a limited amount of
> improvement. (It's also worth making sure your locale is set appropriately.
> Avoid using Unicode collation and it will speed things up impressively.)



Shouldn't be necessary - tdbloader2index sets

   export LC_ALL="C"

(see sort(1))
[[
  *** WARNING *** The locale specified by the environment affects sort
  order.  Set LC_ALL=C to get the traditional sort order that uses
  native byte values.
]]

If that didn't work, it needs a fix.
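
For anyone trying ajs6f's tuning by hand, GNU sort exposes those knobs
directly. A sketch; the thread count, buffer size, temp dir and file names
are made-up examples, not recommendations:

LC_ALL=C sort --parallel=8 -S 8G -T /fast/tmp input.nt > input.sorted.nt
# LC_ALL=C   : byte-order collation, as tdbloader2index already sets
# --parallel : number of concurrent sort threads
# -S         : main-memory buffer size before spilling to temp files
# -T         : directory for temporary spill files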

Andy



Re: tdb2.tdbloader performance

2017-12-02 Thread ajs6f
> Threads will not help a single load except for tdbloader2 (which is for TDB1) 
> if tuned - see the command help and notes.  It uses sort(1) which can utilize 
> multiple threads.

This was worth tuning for me. sort generally picks good parameters for a 
system, but I was able to get noticeably better performance by adjusting (up) 
the parallelism manually. But of course, that's a limited amount of 
improvement. (It's also worth making sure your locale is set appropriately. 
Avoid using Unicode collation and it will speed things up impressively.)

ajs6f


Re: tdb2.tdbloader performance

2017-12-02 Thread Dick Murray
Hello.

On 2 Dec 2017 8:55 pm, "Andy Seaborne"  wrote:


>> Short story: I used the following "reasonable" device
>>
>> Dell M3800
>> Fedora 27
>> 16GB SODIMM DDR3 Synchronous 1600 MHz
>> CPU cache L1/256KB, L2/1MB, L3/6MB
>> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz, 4 cores 8 threads
>>
>> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
>> disk, and:
>>
>> @800%  60K/Sec
>> @100%  40K/Sec
>> @50%   20K/Sec
>>
>> The full source file contains 2.2G of triples in a 10GB bz2 which
>> decompresses to a 250GB .nt; I split it into 10M-triple chunks and used the
>> first one to test.

> Which tdb loader?


TDB2


> For TDB1, the two loaders behave very differently.
>
> I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
> hours (76K triples/s) using TDB1 tdbloader2.
>
> I'll write it up soon.


Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS.
Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest
concurrent source the server has handled. Fans work!
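
A sketch of what four concurrent loads might look like - hypothetical paths,
and one --loc per load so the databases stay separate (the thread doesn't say
exactly how these runs were launched):

for i in 1 2 3 4; do
  tdb2.tdbloader --loc /data/truthy-$i latest-truthy.nt > load-$i.log 2>&1 &
done
wait   # four independent loaders, each writing its own TDB2 database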



>> Check with Andy but I think it's limited by CPU, which is why my 24 core (4
>> x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
>> performance hit.

> The limit at scale is the I/O handling and disk cache. 128G RAM gives a
> better disk cache and that server machine probably has better I/O.  It's
> big enough to fit one whole index (if all RAM is available - and that
> depends on the swappiness setting which should be set to zero ideally).
>
> CPU is a limit for a while but you'll see the load speed slows down so it
> is not purely CPU as the limit. (As the indexes are 200-way trees, they
> don't get very deep.)
>
> tdbloader (loader1) does one index at a time so that the I/O is
> constrained, unlike simply adding triples to all 3 indexes together (which
> is what TDB2 loader does currently).
>
> loader1 degrades at large scale due to random I/O write patterns on
> secondary indexes.  Hence an SSD makes a big difference.
>
> loader2 (which has high overhead) avoids the problems and only writes
> indexes from sorted input, so there is no random access to the indexes.  An
> SSD makes less difference.


>> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
>> next few days and I will try and test against it.
>>
>> I haven't run the full import because a) I'm guessing the resulting TDB2
>> will be "large" and b) my servers are currently importing other "large"
>> TDB2's!!!

> The TDB2 database for a single graph will be the same size as TDB1 using
> tdbloader (not tdbloader2).

>> Long story follows...




Re: tdb2.tdbloader performance

2017-12-02 Thread Andy Seaborne



On 01/12/17 22:28, Laura Morales wrote:

> Thank you very much, this is great feedback!
> Your setup was very similar to mine, except:
>
> - I have 8GB RAM single bank, you have 16GB probably on two banks
> - my CPU is "half" of yours, 2 cores 4 threads
>
> despite this, the results are very similar; maybe yours are slightly
> better. I don't understand why this "60K" seems so hard to beat. What's so
> special about it?? It's so difficult to understand what to do to improve
> the conversion speed... do I buy more ram? Faster ram? A faster CPU? More
> cores? Or a CPU with more cache? Or more memory channels? I still can't
> find an answer... Why would more cores help if tdb2.tdbloader


As already said - tdb2.tdbloader in its current form is not suitable for 
loading billion triple datasets (unless there is a lot of RAM ... I'd 
guess upward of 256G for truthy and a tuned server (swappy=0 for 
example), not that I've tried).



> runs in a single thread? Maybe the reason is that with more cores, your xeon
> can handle more RAM concurrently? I don't understand...
> With your xeon, you said you were able to get to 120K? Right?


"concurrent 120K"

I understood that to mean more than one load running at once.  Dick's 
system has multiple TDB databases and a large disk cache.


(I got 76K, single load, on somewhat less hardware so that suggests 120K 
may be affected by I/O contention.)



> What xeon, mobo, and RAM did you use?
> If anybody has any xeon or opteron, it would be nice if they could offer more
> feedback too. Even with slower RAM such as DDR3-1333. I certainly can't wait
> to read your feedback with the Threadripper :)


Threads will not help a single load except for tdbloader2 (which is for 
TDB1) if tuned - see the command help and notes.  It uses sort(1) which 
can utilize multiple threads.


Andy



> keep us posted!





Re: tdb2.tdbloader performance

2017-12-02 Thread Andy Seaborne



> Short story: I used the following "reasonable" device
>
> Dell M3800
> Fedora 27
> 16GB SODIMM DDR3 Synchronous 1600 MHz
> CPU cache L1/256KB, L2/1MB, L3/6MB
> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz, 4 cores 8 threads
>
> to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
> disk, and:
>
> @800%  60K/Sec
> @100%  40K/Sec
> @50%   20K/Sec
>
> The full source file contains 2.2G of triples in a 10GB bz2 which
> decompresses to a 250GB .nt; I split it into 10M-triple chunks and used the
> first one to test.


Which tdb loader?

For TDB1, the two loaders behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8 
hours (76K triples/s) using TDB1 tdbloader2.


I'll write it up soon.


> Check with Andy but I think it's limited by CPU, which is why my 24 core (4
> x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
> performance hit.


The limit at scale is the I/O handling and disk cache. 128G RAM gives a 
better disk cache and that server machine probably has better I/O.  It's 
big enough to fit one whole index (if all RAM is available - and that 
depends on the swappiness setting which should be set to zero ideally).
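
For reference, that knob on Linux is the vm.swappiness sysctl; a minimal
sketch of setting it (assumes root):

sudo sysctl vm.swappiness=0                                # immediate, until reboot
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf    # persist across reboots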


CPU is a limit for a while but you'll see the load speed slows down so 
it is not purely CPU as the limit. (As the indexes are 200-way trees, 
they don't get very deep.)


tdbloader (loader1) does one index at a time so that the I/O is 
constrained, unlike simply adding triples to all 3 indexes together 
(which is what TDB2 loader does currently).


loader1 degrades at large scale due to random I/O write patterns on 
secondary indexes.  Hence an SSD makes a big difference.


loader2 (which has high overhead) avoids the problems and only writes 
indexes from sorted input, so there is no random access to the indexes.  An 
SSD makes less difference.
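
For anyone comparing the two TDB1 loaders, they are invoked the same way;
the database directories here are just examples:

tdbloader  --loc /data/tdb1-a latest-truthy.nt   # loader1: B+Tree inserts, one index at a time
tdbloader2 --loc /data/tdb1-b latest-truthy.nt   # loader2: external sort(1), then index build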



> I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
> next few days and I will try and test against it.
>
> I haven't run the full import because a) I'm guessing the resulting TDB2
> will be "large" and b) my servers are currently importing other "large"
> TDB2's!!!


The TDB2 database for a single graph will be the same size as TDB1 using 
tdbloader (not tdbloader2).




> Long story follows...





Re: tdb2.tdbloader performance

2017-12-01 Thread Dick Murray
Hi.

Sorry for the delay :-)

Short story: I used the following "reasonable" device

Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB, L2/1MB, L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz, 4 cores 8 threads

to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
disk, and:

@800%  60K/Sec
@100%  40K/Sec
@50%   20K/Sec
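
A 6GB RAM disk like that can be a tmpfs mount - a sketch, assuming tmpfs,
with the mount point matching the --loc used below:

sudo mkdir -p /media/ramdisk
sudo mount -t tmpfs -o size=6g tmpfs /media/ramdisk   # RAM-backed, vanishes on unmount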

The full source file contains 2.2G of triples in a 10GB bz2 which
decompresses to a 250GB .nt; I split it into 10M-triple chunks and used the
first one to test.

Check with Andy but I think it's limited by CPU, which is why my 24 core (4
x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
performance hit.

I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
next few days and I will try and test against it.

I haven't run the full import because a) I'm guessing the resulting TDB2
will be "large" and b) my servers are currently importing other "large"
TDB2's!!!

Long story follows...

decompress the file;

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

 # CPUs: 4
 Maximum Memory: 1024 MB
 Ignore Trailing Garbage: off
---
 File #: 1 of 1
 Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt

 BWT Block Size: 900k
 Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
---

 Wall Clock: 5871.550948 seconds

count the lines;

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

split the file into 10M chunks;

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...
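
To push every chunk through in sequence, a hypothetical driver loop - this
assumes repeated tdb2.tdbloader runs append to the same --loc, which is worth
verifying on a throwaway location first:

for f in latest-truthy.nt.*; do            # matches split's output names
  ./apache-jena-3.5.0/bin/tdb2.tdbloader --loc /media/ramdisk/ "$f"
done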

Restart!

sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt

ps aux | grep tdb2
root      3358  0.0  0.0 222844  5756 pts/0    S+   19:22   0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      3359  0.0  0.0   4500   776 pts/0    S+   19:22   0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      3360  0.0  0.0 120304  3288 pts/0    S+   19:22   0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root      3361  4.9  0.0   4500    92 pts/0    S<+  19:22   0:05 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      3366 95.7 14.8 7866116 2418768 pts/0 Sl+  19:22   1:42 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick      3477  0.0  0.0 119728   972 pts/1    S+   19:24   0:00 grep
--color=auto tdb2

Notice PID 3366 is running with the default -Xmx2G.

19:26:49 INFO  TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.28s (Avg: 42,404)

After the first pass there is no read from the 1TB source as the OS has
cached the 1.2G source.

19:33:50 INFO  TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 245.70s (Avg: 42,677)

export JVM_ARGS="-Xmx4G" i.e. increase the max heap and help the GC

sudo ps aux | grep tdb2
root      4317  0.0  0.0 222848  6236 pts/0    S+   19:35   0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      4321  0.0  0.0   4500   924 pts/0    S+   19:35   0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      4322  0.0  0.0 120304  3356 pts/0    S+   19:35   0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root      4323  4.8  0.0   4500    88 pts/0    S<+  19:35   0:09 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root      4328 94.8 18.5 8406788 3036188 pts/0 Sl+  19:35   3:01 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick      4594  0.0  0.0 119728  1024 pts/1    S+   19:38   0:00 grep
--color=auto tdb2

At 800K, the PID was at 3GB and peaked at 3.4GB just prior to completion.

19:39:23 INFO  TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.65s (Avg: 42,340)

Throw all CPU resources at it, i.e. 800%

sudo 

Re: tdb2.tdbloader performance

2017-11-28 Thread Laura Morales
> I've had loads take over 24 hours and produce 350GB TDB1 instances...

Yeah, 24h is still acceptable, but it's very borderline. Running a conversion 
that takes days becomes frustrating very soon. Of course I'm not trying to be 
mean here, but I think it's good to push the limits because we are already at a 
point where graphs have several billion triples. If my computer, which is an 
average consumer PC at best, can do 60-70K, two "average grade" nodes could 
already outperform your beefy server, if only I could spread the load across 
multiple PCs.

> Ok with the data, I have that somewhere and will run it through, hopefully 
> tonight if paid work doesn't get in the way ;-)

Thank you very much for trying this and for offering feedback. I'd be 
interested to know:

- what components do you have (cpu/ram/disks/...)
- the AVG number of triples/second
- the final size of the TDB2 store

Also since you're already running this test, would you mind sharing the final 
TDB2 store instead of deleting it? :) If the output is not too large...


Re: tdb2.tdbloader performance

2017-11-28 Thread dandh988
I've had loads take over 24 hours and produce 350GB TDB1 instances...
You can run multiple loaders into separate instances, and on sufficient kit they 
don't slow down. As background, I convert CAD files to triples or quads, 
typically 100M but some can be 500M. That's triples output, not file input size.
OK with the data, I have that somewhere and will run it through, hopefully 
tonight if paid work doesn't get in the way ;-)

Dick


Re: tdb2.tdbloader performance

2017-11-28 Thread Laura Morales
> I've achieved concurrent 120K on the server hardware but it depends on the
> input.

Good to see that it can go faster. I do understand that this metric is 
dependent on input, but it still looks rather slow considering that datasets 
keep growing. At this (constant) rate, Wikidata would still take at least 12-13 
hours.

> What the server hardware does do is allow me to run multiple processes and 
> average 60K.

tdb2.tdbloader is single-threaded though; I don't know how multiple cores are 
going to help.

> We tend towards running multiple TDB's and present them as one, a legacy of
> overcoming the one writer in TDB1.

One graph per TDB store?

> On the minefield subject of hardware, do you have DDR3 or DDR4?

DDR3 1600MHz

> What
> chipset is driving it because Haswell’s dual-channel memory controller is
> going to have a hard time keeping up with the quad-channel memory
> controllers on Ivy Bridge-E and Haswell-E

Haswell, dual-channel I think.

> What files are you trying to import and i'll run them through?

The 1.1GB file that I mentioned contains data that I can't make public on the 
Internet, but you can try with the Wikidata dump 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
You probably don't have to convert all of it. Just by starting the conversion 
you should already see how many triples it's handling. I ran this command: 
`./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`.
If it goes any faster than 70K AVG triples/second, I'd be interested to know 
what hardware components you've got.
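
If you only want the rate rather than a full load, one way to sample -
assuming an N-Triples copy, which is line-oriented so truncation is safe:

head -n 10000000 wikidata.nt > sample.nt        # first 10M triples
./tdb2.tdbloader --loc wikidata-sample --verbose sample.nt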


Re: tdb2.tdbloader performance

2017-11-28 Thread Dick Murray
LOL, there's lots of things where I'd like to "move the problem elsewhere".

I've achieved concurrent 120K on the server hardware but it depends on the
input. There's another recent Jena thread regarding sizing, and that's tied
up with what's in the input. I see the same thing with loading data: some
files fly, others seem to drag, and it's not just the size. What the server
hardware does do is allow me to run multiple processes and average 60K.
Also, up to a certain size I have an overclocked AMD (4.5GHz) that will
outperform everything until it hits its cache limit.

We tend towards running multiple TDB's and presenting them as one, a legacy of
overcoming the one-writer limit in TDB1. This brings its own issues, such as
DISTINCT being high-cost, which we mitigate with a few tricks.

On the minefield subject of hardware, do you have DDR3 or DDR4? What
chipset is driving it? Because Haswell’s dual-channel memory controller is
going to have a hard time keeping up with the quad-channel memory
controllers on Ivy Bridge-E and Haswell-E. And yes, Corsair quote 47GB/s for
DDR4, but you still need to write that somewhere, and an M.2 on PCI-E 2.0 x4
at 1.6GB/s is almost 3x the throughput of SATAIII at 600MB/s; PCI-E 3.0 x4
is 3.9GB/s, plus you now have Optane or 3D XPoint depending on what sounds
better.

What files are you trying to import? I'll run them through.

Regards Dick



Re: tdb2.tdbloader performance

2017-11-28 Thread Laura Morales
> Eventually something will give and you'll get a wait as something is spilled 
> to something, i.e. cache to physical drive.
> Also different settings suit different workloads. I have a number of +128GB 
> units configured differently depending on what they need to do. The ETL 
> setting only gives Java 8GB but the OS will consume close to 90GB virtual for 
> the process as it basically dumps into file cache. At some point though that 
> cache is written out to non-volatile storage. As the units have 24 cores I 
> can actually run close to 12 processes before things start to affect each 
> other. If you consider server-class hardware there's a lot of thought to 
> cache levels and how they cascade.
> Switch the SATA for M.2 and you'll move the issue somewhere else...

Well yeah, but having a problem at 10K triples/second is not the same problem 
as at 1M triples/second. I'll gladly "move the problem elsewhere" if I knew how 
to get to 1M triples/second.
Moving from SATA to M.2, I don't know if it's worth the trouble (and money), 
given that on my computer running from SATA3 disks or a RAM disk doesn't seem 
to make any difference. And RAM is much faster than M.2 too.
Just out of curiosity, how many "AVG triples/second" can you get with your 
server-class hardware when converting a .nt to TDB2 using tdb2.tdbloader?


Re: tdb2.tdbloader performance

2017-11-28 Thread dandh988
Eventually something will give and you'll get a wait as something is spilled to 
something, i.e. cache to physical drive.
Also different settings suit different workloads. I have a number of +128GB 
units configured differently depending on what they need to do. The ETL setting 
only gives Java 8GB but the OS will consume close to 90GB virtual for the 
process as it basically dumps into file cache. At some point though that cache 
is written out to non-volatile storage. As the units have 24 cores I can 
actually run close to 12 processes before things start to affect each other. If 
you consider server-class hardware there's a lot of thought to cache levels and 
how they cascade.
Switch the SATA for M.2 and you'll move the issue somewhere else...


Dick

tdb2.tdbloader performance

2017-11-28 Thread Laura Morales
So I had a laptop at hand with a 3GHz i7 CPU, 8GB DDR3 1600MHz RAM, and a SATA3 
HDD available. I decided to try the conversion again on a 1.1GB .nt file.
I used `./tdb2.tdbloader --loc xxx --verbose file.nt`.
Reading the .nt file from the HDD and writing to the HDD gave me about 60K 
triples/second on AVG. I don't have an SSD, but this PC seems to have enough 
RAM. So I started a livecd to be sure that I was running everything from RAM, 
all disks unmounted. I ran the same command, and the AVG number of 
triples/second is pretty much the same, perhaps only slightly better with 2K 
or 3K more per second. Conversion from the livecd seemed to use a full thread 
at 100%, 25% RAM, 0 SWAP.
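
To see where the time goes during a run, the sysstat tools are one option; a
sketch, with the loader's Java PID filled in by hand:

iostat -x 5                    # per-device utilization and wait times
pidstat -u -p <java-pid> 5     # per-process CPU for the loader JVM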

This is... very surprising; I wasn't expecting it. I was expecting a 
significant improvement since I was running everything from RAM. What I get 
from this is that SATA3 disks are OK? That an SSD won't really make any 
difference? Are faster RAM, a faster CPU, or maybe more RAM/CPU cache the only 
ways to get more performance out of tdb2.tdbloader (since more RAM capacity 
doesn't seem to make any difference)?

Or does tdb2.tdbloader (or maybe Java) have any mechanism in place that is 
slowing down the conversion? Like, for example, using less RAM than is 
available, or whatever?
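
One mechanism worth ruling out is the JVM heap: the bin scripts honour
JVM_ARGS, so the default -Xmx can be raised before loading, e.g.:

export JVM_ARGS="-Xmx4G"           # the default seen elsewhere in this thread is -Xmx2G
./tdb2.tdbloader --loc xxx --verbose file.nt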