I'm loading a file of 111 million triples (GND, the German authority files). 
For the first roughly 70 million triples it's really fast (more than 60,000 
triples per second on average), but then throughput declines continuously to 
about a thousand or even just a few hundred triples per second, which brings 
the overall average down to less than 7,000. During the last part of the 
triples data phase, Java is down to 1-2% CPU usage, while disk usage goes up 
to 100%.

As TDB writes to disk, I'd expect roughly linear loading times. The CentOS 6 
64-bit machine (11.5 GB memory) runs on a VMware vSphere cluster with 
underlying SAN hardware. As I observed the same behavior at different times of 
day, certainly under different load situations, there is no indication that it 
depends on parallel activity on the cluster.

Perhaps something is wrong in my config, but I could not figure out what it 
might be. I've added an extract of the log below. It would be great if 
somebody could help me with hints.

Cheers, Joachim

---------------

2013-10-25 13:33:33 start run

Configuration:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
JAVA_OPTS: -d64 -Xms6g -Xmx10g
Jena:       VERSION: 2.11.0
Jena:       BUILD_DATE: 2013-09-12T10:49:49+0100
ARQ:        VERSION: 2.11.0
ARQ:        BUILD_DATE: 2013-09-12T10:49:49+0100
RIOT:       VERSION: 2.11.0
RIOT:       BUILD_DATE: 2013-09-12T10:49:49+0100
TDB:        VERSION: 1.0.0
TDB:        BUILD_DATE: 2013-09-12T10:49:49+0100

Use fuseki tdb.tdbloader on file /opt/thes/var/gnd/latest/src/GND.ttl.gz
INFO  -- Start triples data phase
INFO  ** Load empty triples table
INFO  Load: /opt/thes/var/gnd/latest/src/GND.ttl.gz -- 2013/10/25 13:33:35 MESZ
INFO  Add: 10.000.000 triples (Batch: 64.766 / Avg: 59.984)
INFO    Elapsed: 166,71 seconds [2013/10/25 13:36:21 MESZ]
INFO  Add: 20.000.000 triples (Batch: 71.839 / Avg: 58.653)
INFO    Elapsed: 340,99 seconds [2013/10/25 13:39:16 MESZ]
INFO  Add: 30.000.000 triples (Batch: 67.750 / Avg: 60.271)
INFO    Elapsed: 497,75 seconds [2013/10/25 13:41:52 MESZ]
INFO  Add: 40.000.000 triples (Batch: 68.212 / Avg: 60.422)
INFO    Elapsed: 662,01 seconds [2013/10/25 13:44:37 MESZ]
INFO  Add: 50.000.000 triples (Batch: 54.171 / Avg: 60.645)
INFO    Elapsed: 824,47 seconds [2013/10/25 13:47:19 MESZ]
INFO  Add: 60.000.000 triples (Batch: 58.823 / Avg: 60.569)
INFO    Elapsed: 990,60 seconds [2013/10/25 13:50:05 MESZ]
INFO  Add: 70.000.000 triples (Batch: 45.495 / Avg: 60.468)
INFO    Elapsed: 1.157,63 seconds [2013/10/25 13:52:52 MESZ]
INFO  Add: 80.000.000 triples (Batch: 50.050 / Avg: 57.998)
INFO    Elapsed: 1.379,36 seconds [2013/10/25 13:56:34 MESZ]
INFO  Add: 90.000.000 triples (Batch: 13.954 / Avg: 52.447)
INFO    Elapsed: 1.716,02 seconds [2013/10/25 14:02:11 MESZ]
INFO  Add: 100.000.000 triples (Batch: 1.134 / Avg: 19.024)
INFO    Elapsed: 5.256,29 seconds [2013/10/25 15:01:11 MESZ]
INFO  Add: 110.000.000 triples (Batch: 944 / Avg: 7.643)
INFO    Elapsed: 15.942,27 seconds [2013/10/25 17:59:17 MESZ]
INFO  -- Finish triples data phase
INFO  111.813.447 triples loaded in 16.288,16 seconds [Rate: 6.864,71 per second]
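
To make the drop easier to see, the throughput for each span between two 
logged checkpoints can be recomputed from the Elapsed values. The following is 
a minimal sketch in Python, assuming the log format shown above (German 
locale: "." groups thousands, "," is the decimal mark). The helper names 
parse_number and interval_rates are hypothetical and not part of Jena.

```python
import re

# Last checkpoints copied from the tdbloader log above.
LOG = """\
INFO  Add: 80.000.000 triples (Batch: 50.050 / Avg: 57.998)
INFO    Elapsed: 1.379,36 seconds [2013/10/25 13:56:34 MESZ]
INFO  Add: 90.000.000 triples (Batch: 13.954 / Avg: 52.447)
INFO    Elapsed: 1.716,02 seconds [2013/10/25 14:02:11 MESZ]
INFO  Add: 100.000.000 triples (Batch: 1.134 / Avg: 19.024)
INFO    Elapsed: 5.256,29 seconds [2013/10/25 15:01:11 MESZ]
INFO  Add: 110.000.000 triples (Batch: 944 / Avg: 7.643)
INFO    Elapsed: 15.942,27 seconds [2013/10/25 17:59:17 MESZ]
"""


def parse_number(text):
    """Convert a German-formatted number such as '1.379,36' to a float."""
    return float(text.replace(".", "").replace(",", "."))


def interval_rates(log):
    """Return (triples_so_far, triples_per_second) for each span between checkpoints."""
    counts = [parse_number(m) for m in re.findall(r"Add: ([\d.]+) triples", log)]
    elapsed = [parse_number(m) for m in re.findall(r"Elapsed: ([\d.,]+) seconds", log)]
    points = list(zip(counts, elapsed))
    return [
        (c1, (c1 - c0) / (t1 - t0))
        for (c0, t0), (c1, t1) in zip(points, points[1:])
    ]


for count, rate in interval_rates(LOG):
    print(f"up to {count:>13,.0f} triples: {rate:>9,.0f} triples/s")
```

On this excerpt the three spans come out at roughly 29,700, 2,800 and 940 
triples per second. These are averages over each 10-million-triple span, so 
they need not match the Batch column printed by the loader.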

The indexing phase also takes its time, and its performance decreases somewhat 
too, but it does not show a sharp drop.

INFO  -- Start triples index phase
INFO    Elapsed: 20.563,36 seconds [2013/10/25 19:16:18 MESZ]
INFO  ** Index SPO->POS: 111.786.233 slots indexed in 4.371,67 seconds [Rate: 25.570,57 per second]
...
INFO  -- Finish triples index phase
INFO  ** 111.786.233 triples indexed in 19.973,81 seconds [Rate: 5.596,64 per second]
INFO  -- Finish triples load
INFO  ** Completed: 111.813.447 triples loaded in 36.261,98 seconds [Rate: 3.083,49 per second]
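
For a quick overview of how the total run time splits between the two phases, 
the summary figures from the log can be put side by side. This is only a small 
arithmetic sketch in Python; the constant and function names are hypothetical, 
and the numbers are copied from the log lines above.

```python
# Summary figures copied from the log above (seconds and triple count).
DATA_PHASE_S = 16288.16    # "triples loaded in 16.288,16 seconds"
INDEX_PHASE_S = 19973.81   # "triples indexed in 19.973,81 seconds"
TOTAL_S = 36261.98         # "Completed: ... in 36.261,98 seconds"
TRIPLES_LOADED = 111_813_447


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"data phase : {share(DATA_PHASE_S, TOTAL_S):.1f}% of total time")
print(f"index phase: {share(INDEX_PHASE_S, TOTAL_S):.1f}% of total time")
print(f"overall    : {TRIPLES_LOADED / TOTAL_S:.2f} triples/s")
```

The data phase accounts for about 45% of the total time and the index phase 
for about 55%. The two phase durations add up to the reported total within 
rounding.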
