thank you - your link explains why calling s-put for each individual file is so slow.

practically - i will just wait the 10 hours and then extract the triples from the store.

can you understand why somebody would select this format? what is the advantage?

andrew



On 10/07/2017 10:52 AM, zPlus wrote:
Hello Andrew,

if I understand this correctly, I think I stumbled on the same problem
before. Concatenating XML files will not work indeed. My solution was
to convert all XML files to N-Triples, then concatenate all those
triples into a single file, and finally load only this file.
Ultimately, what I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT on every
file.
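
The idea of passing many files to one RIOT call can be sketched roughly as follows. This is only an illustration, not the script from [1]: it assumes the `riot` command from Apache Jena is on the PATH, that it accepts `--output=NT` to write N-Triples to standard output, and that the catalog files end in `.rdf`. The function names, the directory layout and the batch size of 1000 are hypothetical choices; batching is there only to keep each command line below the operating system's argument-length limit.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def riot_command(files: Sequence[str]) -> List[str]:
    """Build one riot call that converts all given files to N-Triples.

    Assumption: `riot --output=NT file ...` prints N-Triples on stdout.
    """
    return ["riot", "--output=NT", *files]


def convert_all(source_dir: str, target: str, batch_size: int = 1000) -> None:
    """Convert every *.rdf file below `source_dir` into one N-Triples file."""
    files = sorted(str(p) for p in Path(source_dir).rglob("*.rdf"))
    with open(target, "wb") as out:
        for batch in batches(files, batch_size):
            # One JVM start per batch instead of one per file.
            subprocess.run(riot_command(batch), stdout=out, check=True)
```

Because N-Triples is line-based, the output of the batches can simply be appended to the same file, and that single file is then loaded into the store in one step.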

I hope this helps.

----
[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54

----- Original Message -----
From: [email protected]
To:"[email protected]" <[email protected]>
Cc:
Sent:Sat, 7 Oct 2017 10:17:18 -0400
Subject:loading many small rdf/xml files

  i have to load the Gutenberg project's catalog in rdf/xml format. this is
  a collection of about 50,000 files, each containing a single record, as
  attached.

  if i try to concatenate these files into a single one the result is not
  legal rdf/xml - there are xml doc headers:

  <rdf:RDF xml:base="http://www.gutenberg.org/">

  and similar, which can only occur once per file.

  i found a way to load each file individually with s-put and a loop, but
  this runs extremely slowly - it has already been running for more than 10
  hours; each file takes half a second to load (fuseki running on
  localhost).

  i am sure there is a better way?

  thank you for the help!

  andrew

  --
  em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
  +43 1 58801 12710 direct
  Geoinformation, TU Wien +43 1 58801 12700 office
  Gusshausstr. 27-29 +43 1 55801 12799 fax
  1040 Wien Austria +43 676 419 25 72 mobil




