Hello Andrew,

If I understand this correctly, I think I ran into the same problem
before. Concatenating RDF/XML files indeed does not work. My solution was
to convert all the XML files to N-Triples, concatenate those triples
into a single file, and then load only that file.
What I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT once per
file.
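
For illustration, here is a minimal sketch of that idea in Python rather
than the shell loop in [1]. It assumes that `riot` is on the PATH and
accepts `--output=ntriples`, and that the catalog files end in `.rdf`;
the helper names (`batches`, `riot_command`, `convert_all`) and the
batch size are made up for this example. Files are passed to RIOT in
batches so that 50,000 paths do not exceed the command-line length
limit.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def batches(items: Sequence[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def riot_command(files: Sequence[Path]) -> List[str]:
    """Build one riot call that converts several RDF/XML files at once."""
    # Assumption: `riot` is on PATH and understands --output=ntriples.
    return ["riot", "--output=ntriples", *map(str, files)]


def convert_all(src_dir: Path, out_file: Path, batch_size: int = 500) -> None:
    """Convert every *.rdf file under src_dir into one N-Triples file."""
    files = sorted(src_dir.rglob("*.rdf"))
    with out_file.open("wb") as out:
        for batch in batches(files, batch_size):
            # riot writes the triples to stdout. Appending the output of
            # successive runs is fine because N-Triples is line-based and
            # has no per-document header, unlike RDF/XML.
            subprocess.run(riot_command(batch), stdout=out, check=True)
```

Calling something like `convert_all(Path("catalog"), Path("catalog.nt"))`
(both paths are placeholders) should leave a single N-Triples file that
can then be loaded in one step, for example with s-put, instead of one
request per record.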

I hope this helps.

----
[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54

----- Original Message -----
From: [email protected]
To:"[email protected]" <[email protected]>
Cc:
Sent:Sat, 7 Oct 2017 10:17:18 -0400
Subject:loading many small rdf/xml files

 i have to load the Gutenberg projects catalog in rdf/xml format. this is
 a collection of about 50,000 files, each containing a single record as
 attached.

 if i try to concatenate these files into a single one the result is not
 legal rdf/xml - there are xml doc headers:

 <rdf:RDF xml:base="http://www.gutenberg.org/">

 and similar, which can only occur once per file.

 i found a way to load each file individually with s-put and a loop, but
 this runs extremely slowly - it has already been running for more than 10
 hours; each file takes half a second to load (fuseki running on localhost).

 i am sure there is a better way?

 thank you for the help!

 andrew

 -- 
 em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
 +43 1 58801 12710 direct
 Geoinformation, TU Wien +43 1 58801 12700 office
 Gusshausstr. 27-29 +43 1 55801 12799 fax
 1040 Wien Austria +43 676 419 25 72 mobil

