Yesterday's mail didn't get through, so here it is again. The data from the broken load is temporarily available at http://134.245.93.72/beta/tmp.
I've now invoked

/opt/jena/bin/tdb2.tdbloader --loader=parallel --loc=/zbw/var/lib/fuseki/databases/temp ../var/gnd/2021-11/src/GND.utf8.ttl.gz ../var/gnd/2021-11/src/gnd-sc.ttl ../var/gnd/2021-11/src/gnd-sc_notation.ttl ../var/gnd/2021-11/src/gndo.ttl

and got a steadily decreasing rate (see below). On the other hand, the total load time is nice. tdbstats ran correctly afterwards, and the query for gndo:DifferentiatedPerson works as expected.

Cheers - Joachim

Jena version: 4.5.0-SNAPSHOT
JAVA_OPTS: -d64 -Xmx12G
Loader: tdb2.tdbloader
LOADER_OPTS: --loader=parallel
16:26:29 INFO loader :: Loader = LoaderParallel
16:26:29 INFO loader :: Start: 4 files
16:26:34 INFO loader :: Add: 1,000,000 GND.utf8.ttl.gz (Batch: 202,634 / Avg: 202,634)
16:26:40 INFO loader :: Add: 2,000,000 GND.utf8.ttl.gz (Batch: 143,636 / Avg: 168,109)
16:26:52 INFO loader :: Add: 3,000,000 GND.utf8.ttl.gz (Batch: 84,588 / Avg: 126,480)
16:27:05 INFO loader :: Add: 4,000,000 GND.utf8.ttl.gz (Batch: 78,672 / Avg: 109,799)
16:27:17 INFO loader :: Add: 5,000,000 GND.utf8.ttl.gz (Batch: 82,413 / Avg: 102,956)
16:27:29 INFO loader :: Add: 6,000,000 GND.utf8.ttl.gz (Batch: 85,375 / Avg: 99,540)
16:27:41 INFO loader :: Add: 7,000,000 GND.utf8.ttl.gz (Batch: 80,716 / Avg: 96,331)
...
17:00:54 INFO loader :: Add: 164,000,000 GND.utf8.ttl.gz (Batch: 81,672 / Avg: 79,401)
17:01:04 WARN riot :: [line: 205782946, col: 17] Not advised IRI: <https://http://www.gordonkampe.de> Code: 12/PORT_SHOULD_NOT_BE_EMPTY in PORT: The colon introducing an empty port component should be omitted entirely, or a port number should be specified.
17:01:05 INFO loader :: Add: 165,000,000 GND.utf8.ttl.gz (Batch: 90,106 / Avg: 79,458)
17:01:12 INFO loader :: End file: GND.utf8.ttl.gz (triples/quads = 165,639,860)
17:01:12 INFO loader :: End file: gnd-sc.ttl (triples/quads = 1,963)
17:01:12 INFO loader :: End file: gnd-sc_notation.ttl (triples/quads = 463)
17:01:12 INFO loader :: End file: gndo.ttl (triples/quads = 4,436)
17:01:12 INFO loader :: Finished: 4 files: 165,646,722 tuples in 2083.74s (Avg: 79,495)
17:01:27 INFO loader :: Finish - index SPO
17:01:28 INFO loader :: Finish - index POS
17:01:32 INFO loader :: Finish - index OSP
17:01:32 INFO loader :: Time = 2,103.562 seconds : Triples = 165,646,722 : Rate = 78,746 /s
2022-02-21 17:01:35 finished loading

> -----Original Message-----
> From: Andy Seaborne <[email protected]>
> Sent: Monday, 21 February 2022 10:11
> To: [email protected]
> Subject: Re: Broken GND dataset after loading with tdb2.xloader+tdb2.tdbloader
>
> On 21/02/2022 08:27, Neubert, Joachim wrote:
> > I've reloaded the GND dataset at http://zbw.eu/beta/sparql/gnd/query
> > with 4.5.0-SNAPSHOT. The sources were a 133G .nt.gz file, plus
> > several small .ttl files with ontology etc. I loaded the large one
> > with tdb2.xloader, and immediately after that the smaller ones with
> > tdb2.tdbloader (see protocol at
> > https://zbw.eu/beta/tmp/fuseki/create_tdb_20220220.log).
>
> What's the URL for the data files?
>
> > Two things smelled fishy in this load:
> >
> > 1) The tdb2.tdbstats call after the loading looped at 100% CPU, and
> > I had to kill it after an hour or so (this is reproducible)
>
> Unclear. If I can get the data, I can see if it happens here.
>
> > 2) some files remained in the fuseki/databases/temp directory (1.3G
> > triples.tmp.gz, empty quads.tmp.gz, and a load.json with
>
> You can delete the files after the xloader has finished.
>
> > {
> >   "ingested" : "2022-02-20T13:15:45.528+00:00" ,
> >   "data" : [ "../var/gnd/2021-11/src/GND.utf8.ttl.gz" ] ,
> >   "triples" : 165639860 ,
> >   "quads" : 0
> > }
>
> Just give all the files to a single run of tdb2.tdbloader
> --loader=parallel. At 165e6, it should be significantly faster than
> xloader - there isn't a benefit to xloader.
>
> > Text indexing however worked, and also a few example queries.
> > However, a basic query like "?x gndo:DifferentiatedPerson ." does not work any more.
> >
> > Any idea what could have gone wrong?
> >
> > Cheers, Joachim
>
> Andy
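
To compare the rate drop across runs, the progress lines in the loader log can be pulled apart with a few lines of Python. This is only a minimal sketch: the regular expression is inferred from the line shape visible in the log above and is not an official description of the tdb2.tdbloader output format, and the names PROGRESS, to_int, parse_progress and SAMPLE are made up for illustration. It also re-checks the summary figures from the end of the log (per-file counts and elapsed time are copied from there).

```python
import re

# Pattern inferred from the progress lines in the log above (an assumption,
# not a documented format). Whitespace is matched loosely because the
# original log may pad the columns.
PROGRESS = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+INFO\s+loader\s+::\s+Add:\s+"
    r"(?P<count>[\d,]+)\s+(?P<file>\S+)\s+"
    r"\(Batch:\s+(?P<batch>[\d,]+)\s+/\s+Avg:\s+(?P<avg>[\d,]+)\)"
)


def to_int(text):
    """Convert a number with thousands separators, e.g. '1,000,000'."""
    return int(text.replace(",", ""))


def parse_progress(lines):
    """Return (time, triples so far, batch rate, average rate) per progress line."""
    rows = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:  # WARN lines and summary lines are skipped
            rows.append(
                (m["time"], to_int(m["count"]), to_int(m["batch"]), to_int(m["avg"]))
            )
    return rows


# A few lines copied from the log above.
SAMPLE = """\
16:26:34 INFO loader :: Add: 1,000,000 GND.utf8.ttl.gz (Batch: 202,634 / Avg: 202,634)
16:26:40 INFO loader :: Add: 2,000,000 GND.utf8.ttl.gz (Batch: 143,636 / Avg: 168,109)
16:26:52 INFO loader :: Add: 3,000,000 GND.utf8.ttl.gz (Batch: 84,588 / Avg: 126,480)
17:01:04 WARN riot :: [line: 205782946, col: 17] Not advised IRI: <https://http://www.gordonkampe.de>
17:01:05 INFO loader :: Add: 165,000,000 GND.utf8.ttl.gz (Batch: 90,106 / Avg: 79,458)
""".splitlines()

rows = parse_progress(SAMPLE)
for time, count, batch, avg in rows:
    print(f"{time}  {count:>11,}  batch={batch:>7,}  avg={avg:>7,}")

# Cross-check the summary: per-file counts and total elapsed seconds
# are taken from the "End file" and "Time =" lines of the log.
per_file = [165_639_860, 1_963, 463, 4_436]
total = sum(per_file)
overall_rate = round(total / 2103.562)
print(f"total={total:,}  overall rate={overall_rate:,} /s")
```

Fed the full log file instead of SAMPLE, the batch column should make it easy to see where the rate settles after the first few million triples.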
