The repeated round trips cost more time than Fuseki takes to perform each update.

On 07/10/17 15:42, [email protected] wrote:
Couple of possibilities:

1) Get something other than RDF/XML from Gutenberg. I don't mean that to sound flippant. They may very well maintain some other representation (NTriples, Turtle, etc) for their own use and they might be willing to share it. It's worth an email. Then use SOH.

2A) Convert your stuff to a single NTriples (streamable) file and load it into a TDB database locally, then put it on the server. You can use riot to do this (it can accept more than one filename) but with that many files, you may need to do it in several stages or groups, or use xargs or the like. This may or may not work for you, depending on whether you have access to the server to install a TDB database directly into Fuseki, or only via HTTP.

2B) Convert your stuff to a single NTriples (streamable) file using riot and load it via SOH.

(or load it via the UI).

+1 to Adam's and Martynas's suggestion of preparing a single N-Triples file. Parse each file to N-Triples with riot (slight bonus: call riot with a number of files at the same time; for various OS reasons, you can't give all 50,000 at one time from the command line).

The added benefit here is that the data is checked before loading - even the best data does occasionally have errors in it and it is easier to notice that before uploading.
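As a rough sketch of the batching idea (not a tested recipe), the conversion could be driven from a small Python script. It assumes riot is on the PATH, that the catalog files end in .rdf, and that riot reports a parse failure through a non-zero exit status. The function names, the batch size of 500 and the file layout are illustrative, not something from this thread.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def riot_command(files: Sequence[str]) -> List[str]:
    """Build one riot call that parses `files` and prints N-Triples on stdout."""
    return ["riot", "--output=ntriples", *files]


def convert_all(src_dir: str, out_file: str, batch_size: int = 500) -> int:
    """Convert every *.rdf file under src_dir into a single N-Triples file.

    Returns the number of input files handled. Assumption: riot exits
    non-zero on a parse error, so check=True raises CalledProcessError
    and the run stops before anything is uploaded.
    """
    files = sorted(str(p) for p in Path(src_dir).rglob("*.rdf"))
    with open(out_file, "wb") as out:
        for group in batches(files, batch_size):
            # Every batch writes to the same open file, so the output
            # ends up as one concatenated N-Triples stream.
            subprocess.run(riot_command(group), stdout=out, check=True)
    return len(files)
```

Calling convert_all("catalog", "catalog.nt") (both names hypothetical) would then leave one file to upload in a single request. N-Triples has no file-level header, which is why the per-batch outputs can simply be appended.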

You can separately add prefixes by sending a Turtle file of prefixes with no triples.
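For illustration, a prefixes-only Turtle file can be generated with a few lines of Python. The helper name and the two namespaces below are assumptions chosen for the example; use whichever prefixes your data really needs.

```python
from typing import Mapping


def prefixes_turtle(prefixes: Mapping[str, str]) -> str:
    """Render a Turtle document that declares prefixes and has no triples."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    return "\n".join(lines) + "\n"


# Illustrative choices only; substitute the namespaces your data uses.
EXAMPLE_PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
```

Writing prefixes_turtle(EXAMPLE_PREFIXES) to a file such as prefixes.ttl (hypothetical name) gives something you can send with s-post. s-post adds to the target graph, whereas s-put would replace its contents.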

    Andy

ajs6f

Andrew U. Frank wrote on 10/7/17 10:17 AM:
i have to load the Gutenberg projects catalog in rdf/xml format. this is a collection of about 50,000 files, each
containing a single record as attached.

if i try to concatenate these files into a single one the result is not legal rdf/xml - there are xml doc headers:

<rdf:RDF xml:base="http://www.gutenberg.org/">

and similar, which can only occur once per file.

i found a way to load each file individually with s-put and a loop, but this runs extremely slowly - it has already been running for more than 10 hours; each file takes half a second to load (fuseki running as localhost).

i am sure there is a better way?

thank you for the help!

andrew


