The repeated HTTP round trips, one per file, take more time than Fuseki
needs to perform each update.
On 07/10/17 15:42, [email protected] wrote:
Couple of possibilities:
1) Get something other than RDF/XML from Gutenberg. I don't mean that to
sound flippant. They may very well maintain some other representation
(NTriples, Turtle, etc) for their own use and they might be willing to
share it. It's worth an email. Then use SOH.
2A) Convert your stuff to a single NTriples (streamable) file and load
it into a TDB database locally, then put it on the server. You can use
riot to do this (it can accept more than one filename) but with that
many files, you may need to do it in several stages or groups, or use
xargs or the like. This may or may not work for you, depending on
whether you have access to the server to install a TDB database directly
into Fuseki, or only via HTTP.
2B) Convert your stuff to a single NTriples (streamable) file using riot
and load it via SOH.
(or load it via the UI).
+1 to Adam's and Martynas's suggestion of preparing a single N-Triples
file. Parse each file to N-Triples with riot (slight bonus - call riot
with a number of files at the same time - for various OS reasons, you
can't give all 50,000 at one time on the command line).
The added benefit here is that the data is checked before loading - even
the best data does occasionally have errors in it and it is easier to
notice that before uploading.
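
A rough sketch of the batching described above, in case it helps. It
assumes riot is on the PATH, that the catalog files end in .rdf, and that
riot exits with a non-zero status when a file fails to parse; the
directory name, output name and batch size are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def convert(src_dir: str, out_file: str, batch_size: int = 500) -> None:
    """Run riot once per batch, appending its N-Triples output to out_file."""
    # Assumption: the RDF/XML files use the .rdf extension.
    files = sorted(str(p) for p in Path(src_dir).rglob("*.rdf"))
    with open(out_file, "wb") as out:
        for group in batches(files, batch_size):
            # check=True raises if riot returns a non-zero exit status,
            # so a bad batch is noticed before anything is uploaded.
            subprocess.run(
                ["riot", "--output=ntriples", *group],
                stdout=out,
                check=True,
            )
```

Something like convert("gutenberg-rdf", "catalog.nt") would then leave a
single streamable file to load via SOH or the UI.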
You can separately add prefixes by sending a Turtle file of prefixes
with no triples.
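
For the prefixes, a minimal sketch along these lines might do. The
dataset name "ds" and the endpoint URL are assumptions about the local
setup, and the prefix list is only an example.

```python
import urllib.request
from typing import Dict


def prefixes_turtle(prefixes: Dict[str, str]) -> str:
    """Build a Turtle document that declares prefixes and has no triples."""
    return "".join(f"@prefix {name}: <{iri}> .\n" for name, iri in prefixes.items())


def post_turtle(endpoint: str, body: str) -> int:
    """POST a Turtle document to a graph store endpoint; return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example prefixes only; replace with the ones the catalog actually uses.
EXAMPLE_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
}
# Hypothetical endpoint (dataset "ds", default graph):
# post_turtle("http://localhost:3030/ds/data?default", prefixes_turtle(EXAMPLE_PREFIXES))
```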
Andy
ajs6f
Andrew U. Frank wrote on 10/7/17 10:17 AM:
i have to load the Gutenberg projects catalog in rdf/xml format. this
is a collection of about 50,000 files, each
containing a single record as attached.
if i try to concatenate these files into a single one the result is
not legal rdf/xml - there are xml doc headers:
<rdf:RDF xml:base="http://www.gutenberg.org/">
and similar, which can only occur once per file.
i found a way to load each file individually with s-put and a loop,
but this runs extremely slowly - it is already
running for more than 10 hours; each file takes half a second to load
(fuseki running as localhost).
i am sure there is a better way?
thank you for the help!
andrew