Simply because it is both XML and RDF.
There is an enormous installed base of expertise and tooling for XML. It is often worth taking advantage
of, even if it performs poorly in a particular case. If you have to process RDF and you already know a great deal about
XML and use languages like XSLT or XQuery, reusing them for RDF is very attractive.
Historically, there was an idea of a unified layered architecture to the semantic web activity. I think this Wikipedia
page: https://en.wikipedia.org/wiki/Semantic_Web_Stack is old enough to portray that idea. I'm not sure anyone now would
be willing to argue that XML sits under RDF as a syntax layer. (Think about the evolution of JSON and JSON-LD, not shown
at all on that picture.)
ajs6f
Andrew U. Frank wrote on 10/7/17 12:06 PM:
thank you - your link indicates why the solution with calling s-put for each
individual file is so slow.
practically - i will just wait the 10 hours and then extract the triples from
the store.
can you understand why somebody would select this format? what is the
advantage?
andrew
On 10/07/2017 10:52 AM, zPlus wrote:
Hello Andrew,
if I understand this correctly, I think I stumbled on the same problem
before. Concatenating XML files will indeed not work. My solution was
to convert all XML files to N-Triples, then concatenate all those
triples into a single file, and finally load only this file.
Ultimately, what I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT on every
file.
I hope this helps.
----
[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54
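That convert-then-concatenate approach can be sketched roughly as follows (a sketch only: the `rdf/` directory and file names are hypothetical, and the riot line assumes Apache Jena's command-line tools are on PATH):

```shell
# One riot invocation accepts many input files and streams the parsed
# triples to stdout (hypothetical path; assumes Jena's riot is installed):
#
#   riot --output=ntriples rdf/*.rdf > catalog.nt
#
# N-Triples is line-oriented, so outputs produced in batches concatenate
# safely. Demonstrated here with two hand-written triples:
printf '<http://example.org/a> <http://example.org/p> "x" .\n' > batch1.nt
printf '<http://example.org/b> <http://example.org/p> "y" .\n' > batch2.nt
cat batch1.nt batch2.nt > catalog.nt
wc -l < catalog.nt    # one line per triple
```

The single combined file can then be loaded in one request (e.g. one s-put call, or tdbloader for offline bulk loading) instead of ~50,000 separate ones.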
----- Original Message -----
From: [email protected]
To: "[email protected]" <[email protected]>
Cc:
Sent: Sat, 7 Oct 2017 10:17:18 -0400
Subject: loading many small rdf/xml files
i have to load the Gutenberg projects catalog in rdf/xml format. this is
a collection of about 50,000 files, each containing a single record as
attached.
if i try to concatenate these files into a single one, the result is not
legal rdf/xml - there are xml document headers and root elements like
<rdf:RDF xml:base="http://www.gutenberg.org/">
and similar, which can only occur once per file.
i found a way to load each file individually with s-put and a loop, but
this runs extremely slowly - it has already been running for more than 10
hours; each file takes half a second to load (fuseki running on
localhost).
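For reference, that per-file approach amounts to roughly the following (a dry-run sketch: the paths and the dataset URL are placeholders, and the s-put commands are echoed rather than sent to a live Fuseki):

```shell
# Dry-run sketch of the one-request-per-file approach. Every iteration
# would be a separate HTTP PUT, so ~0.5 s of per-request overhead
# multiplies across ~50,000 files. Paths and dataset URL are hypothetical.
mkdir -p rdf && touch rdf/pg1.rdf rdf/pg2.rdf   # stand-in catalog files
for f in rdf/*.rdf; do
    echo s-put http://localhost:3030/ds/data default "$f"
done
```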
i am sure there is a better way?
thank you for the help!
andrew
--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
+43 1 58801 12710 direct
Geoinformation, TU Wien +43 1 58801 12700 office
Gusshausstr. 27-29 +43 1 55801 12799 fax
1040 Wien Austria +43 676 419 25 72 mobile