Hello Andrew,

If I understand correctly, I ran into the same problem before. Concatenating RDF/XML files indeed does not work. My solution was to convert all the XML files to N-Triples, concatenate those triples into a single file, and then load only that file. What I ended up with is the loop in [1]. The idea is to call RIOT once with a list of files as input, instead of calling RIOT separately for every file.
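A minimal sketch of that approach follows. It is not the loop from [1]. The directory name, the *.rdf extension, the output file name, and the dataset URL in the usage comment are placeholders I am assuming for illustration. It relies on RIOT accepting several input files per invocation and on N-Triples being line-based, so the output of several batches can be written to the same file.

```shell
#!/bin/sh
# Sketch: convert many small RDF/XML files into one N-Triples file.
# Assumptions (not from the original message): input directory layout,
# the *.rdf extension, and the output file name are placeholders.

RIOT="${RIOT:-riot}"   # Apache Jena RIOT command; override if it is not on PATH

# convert_all INPUT_DIR OUTPUT_FILE
# Hands the files to RIOT in large batches instead of one process per file.
convert_all() {
    input_dir=$1
    output_file=$2
    # xargs splits the file list into as many RIOT invocations as the
    # argument-length limit requires; every batch writes N-Triples to
    # the same output stream.
    find "$input_dir" -type f -name '*.rdf' -print0 |
        xargs -0 "$RIOT" --output=ntriples > "$output_file"
}

# Example (hypothetical paths and dataset URL):
#   convert_all ./gutenberg-catalog all.nt
#   s-put http://localhost:3030/ds/data default all.nt
```

With this, the JVM starts once per batch of files rather than once per file, and Fuseki receives a single upload instead of about 50,000 small ones.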
I hope this helps.

----
[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54

----- Original Message -----
From: [email protected]
To: "[email protected]" <[email protected]>
Cc:
Sent: Sat, 7 Oct 2017 10:17:18 -0400
Subject: loading many small rdf/xml files

i have to load the Gutenberg projects catalog in rdf/xml format. this is a collection of about 50,000 files, each containing a single record as attached.

if i try to concatenate these files into a single one the result is not legal rdf/xml - there are xml doc headers:

<rdf:RDF xml:base="http://www.gutenberg.org/">

and similar, which can only occur once per file.

i found a way to load each file individually with s-put and a loop, but this runs extremely slowly - it is already running for more than 10 hours; each file takes half a second to load (fuseki running as localhost).

i am sure there is a better way?

thank you for the help!

andrew

--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
                                 +43 1 58801 12710 direct
Geoinformation, TU Wien          +43 1 58801 12700 office
Gusshausstr. 27-29               +43 1 55801 12799 fax
1040 Wien Austria                +43 676 419 25 72 mobil
