I'm loading a 111 million triples file (GND German Authority files). For the first roughly 70 million triples, it's really fast (more than 60,000 avg), but then throughput declines continuously to a thousand or just some hundred triples (which brings down the avg to less than 7000). During the last part of triples data phase, java is down to 1-2% CPU usage, while disk usage goes up to 100%.
As TDB writes to disk, I'd expect rather linear loading times. The Centos 6 64bit machine (11.5 GB memory) runs on a VMware vSphere cluster, with SAN hardware under-laying. As I observed the same behavior at different times a day, with for sure different load situations, there is no indication that it depended on parallel actions on the cluster. Perhaps there is something wrong in my config, but I could not figure out what it may be. I add an extract of the log below - it would be great if somebody could help me with hints. Cheers, Joachim --------------- 2013-10-25 13:33:33 start run Configuration: java version "1.6.0_24" Java(TM) SE Runtime Environment (build 1.6.0_24-b07) Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode) JAVA_OPTS: -d64 -Xms6g -Xmx10g Jena: VERSION: 2.11.0 Jena: BUILD_DATE: 2013-09-12T10:49:49+0100 ARQ: VERSION: 2.11.0 ARQ: BUILD_DATE: 2013-09-12T10:49:49+0100 RIOT: VERSION: 2.11.0 RIOT: BUILD_DATE: 2013-09-12T10:49:49+0100 TDB: VERSION: 1.0.0 TDB: BUILD_DATE: 2013-09-12T10:49:49+0100 Use fuseki tdb.tdbloader on file /opt/thes/var/gnd/latest/src/GND.ttl.gz INFO -- Start triples data phase INFO ** Load empty triples table INFO Load: /opt/thes/var/gnd/latest/src/GND.ttl.gz -- 2013/10/25 13:33:35 MESZ INFO Add: 10.000.000 triples (Batch: 64.766 / Avg: 59.984) INFO Elapsed: 166,71 seconds [2013/10/25 13:36:21 MESZ] INFO Add: 20.000.000 triples (Batch: 71.839 / Avg: 58.653) INFO Elapsed: 340,99 seconds [2013/10/25 13:39:16 MESZ] INFO Add: 30.000.000 triples (Batch: 67.750 / Avg: 60.271) INFO Elapsed: 497,75 seconds [2013/10/25 13:41:52 MESZ] INFO Add: 40.000.000 triples (Batch: 68.212 / Avg: 60.422) INFO Elapsed: 662,01 seconds [2013/10/25 13:44:37 MESZ] INFO Add: 50.000.000 triples (Batch: 54.171 / Avg: 60.645) INFO Elapsed: 824,47 seconds [2013/10/25 13:47:19 MESZ] INFO Add: 60.000.000 triples (Batch: 58.823 / Avg: 60.569) INFO Elapsed: 990,60 seconds [2013/10/25 13:50:05 MESZ] INFO Add: 70.000.000 triples (Batch: 45.495 / Avg: 60.468) INFO Elapsed: 1.157,63 seconds [2013/10/25 13:52:52 MESZ] INFO Add: 80.000.000 triples (Batch: 50.050 / Avg: 57.998) INFO Elapsed: 1.379,36 seconds [2013/10/25 13:56:34 MESZ] INFO Add: 90.000.000 triples (Batch: 13.954 / Avg: 52.447) INFO Elapsed: 1.716,02 seconds [2013/10/25 14:02:11 MESZ] INFO Add: 100.000.000 triples (Batch: 1.134 / Avg: 19.024) INFO Elapsed: 5.256,29 seconds [2013/10/25 15:01:11 MESZ] INFO Add: 110.000.000 triples (Batch: 944 / Avg: 7.643) INFO Elapsed: 15.942,27 seconds [2013/10/25 17:59:17 MESZ] INFO -- Finish triples data phase INFO 111.813.447 triples loaded in 16.288,16 seconds [Rate: 6.864,71 per second] Indexing phase also takes its time, and has some decrease in performance too, but does not show a sharp drop. INFO -- Start triples index phase INFO Elapsed: 20.563,36 seconds [2013/10/25 19:16:18 MESZ] INFO ** Index SPO->POS: 111.786.233 slots indexed in 4.371,67 seconds [Rate: 25.570,57 per second] ... INFO -- Finish triples index phase INFO ** 111.786.233 triples indexed in 19.973,81 seconds [Rate: 5.596,64 per second] INFO -- Finish triples load INFO ** Completed: 111.813.447 triples loaded in 36.261,98 seconds [Rate: 3.083,49 per second]