On 21/02/2022 08:27, Neubert, Joachim wrote:
I've reloaded the GND dataset at http://zbw.eu/beta/sparql/gnd/query with 
4.5.0-SNAPSHOT. The sources were a 133G .nt.gz file, plus several small .ttl 
files with ontology etc. I loaded the large one with tdb2.xloader, and 
immediately after that the smaller ones with tdb2.tdbloader (see protocol at 
https://zbw.eu/beta/tmp/fuseki/create_tdb_20220220.log).

What's the URL for the data files?

Two things smelled fishy in this load:

1) The tdb2.tdbstats call after the loading looped at 100% CPU, and I had to 
kill it after an hour or so (this is reproducible)

Unclear. If I can get the data, I can see if it happens here.


2) some files remained in the fuseki/databases/temp directory (1.3G 
triples.tmp.gz, empty quads.tmp.gz, and a load.json with

You can delete the files after the xloader has finished.


{
   "ingested" : "2022-02-20T13:15:45.528+00:00" ,
   "data" : [ "../var/gnd/2021-11/src/GND.utf8.ttl.gz" ] ,
   "triples" : 165639860 ,
   "quads" : 0
}

Just give all the files to a single run of tdb2.tdbloader --loader=parallel. At 165e6 triples, it should be significantly faster than xloader - there isn't a benefit to xloader at that size.
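
A minimal sketch of that single-run load, written as a small Python helper that assembles the command line. It assumes the Jena command-line tools are on the PATH; the database directory and file names are placeholders for illustration, not the actual paths from this load, and the helper name build_loader_command is made up for this example.

```python
import shlex


def build_loader_command(db_dir, data_files):
    """Assemble one tdb2.tdbloader invocation covering all input files."""
    if not data_files:
        raise ValueError("at least one data file is required")
    # --loader=parallel selects the parallel loader; --loc names the TDB2 directory.
    return ["tdb2.tdbloader", "--loader=parallel", "--loc", db_dir, *data_files]


if __name__ == "__main__":
    # Placeholder paths (assumptions): the large dump plus the small ontology
    # files, all passed to the same run instead of two separate loads.
    cmd = build_loader_command("databases/gnd", ["GND.nt.gz", "ontology.ttl"])
    # Print the command for review; to execute it, pass cmd to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to try out; the point is only that every input file goes into one invocation against an empty database directory.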


Text indexing however worked, and also a few example queries. However, a basic query like 
"?x gndo:DifferentiatedPerson ." does not work any more.

Any idea what could have gone wrong?

Cheers, Joachim



    Andy
