Simply because it is both XML and RDF.
There is an enormous installed base of expertise and tooling for XML. It is often worth taking advantage
of, even if it performs poorly in a particular case. If you have to process RDF and you already know a great deal about
XML and use languages like XSLT or XQuery, reusing them for RDF is very attractive.
Historically, there was an idea of a unified layered architecture to the semantic web activity. I think this Wikipedia
page: https://en.wikipedia.org/wiki/Semantic_Web_Stack is old enough to portray that idea. I'm not sure anyone now would
be willing to argue that XML sits under RDF as a syntax layer. (Think about the evolution of JSON and JSON-LD, not shown
at all on that picture.)
ajs6f
Andrew U. Frank wrote on 10/7/17 12:06 PM:
thank you - your link indicates why the solution with calling s-put for each
individual file is so slow.
practically - i will just wait the 10 hours and then extract the triples from
the store.
can you understand why somebody would select this format? what is the
advantage?
andrew
On 10/07/2017 10:52 AM, zPlus wrote:
Hello Andrew,
if I understand this correctly, I think I stumbled on the same problem
before. Concatenating XML files will indeed not work. My solution was
to convert all XML files to N-Triples, then concatenate all those
triples into a single file, and finally load only this file.
Ultimately, what I ended up with is this loop [1]. The idea is to call
RIOT with a list of files as input, instead of calling RIOT on every
file.
I hope this helps.
----
[1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54
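That convert-then-concatenate approach can be sketched roughly as follows (a sketch only: the `rdf/` directory and file names are hypothetical, and the riot line assumes Apache Jena's command-line tools are on PATH):

```shell
# One riot invocation accepts many input files and streams the parsed
# triples to stdout (hypothetical path; assumes Jena's riot is installed):
#
#   riot --output=ntriples rdf/*.rdf > catalog.nt
#
# N-Triples is line-oriented, so outputs produced in batches concatenate
# safely. Demonstrated here with two hand-written triples:
printf '<http://example.org/a> <http://example.org/p> "x" .\n' > batch1.nt
printf '<http://example.org/b> <http://example.org/p> "y" .\n' > batch2.nt
cat batch1.nt batch2.nt > catalog.nt
wc -l < catalog.nt    # one line per triple
```

The single combined file can then be loaded in one request (e.g. one s-put call, or tdbloader for offline bulk loading) instead of ~50,000 separate ones.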
----- Original Message -----
From: [email protected]
To: "[email protected]" <[email protected]>
Cc:
Sent: Sat, 7 Oct 2017 10:17:18 -0400
Subject: loading many small rdf/xml files
i have to load the Gutenberg projects catalog in rdf/xml format. this is
a collection of about 50,000 files, each containing a single record as
attached.
if i try to concatenate these files into a single one, the result is not
legal rdf/xml - there are xml document headers and root elements like
<rdf:RDF xml:base="http://www.gutenberg.org/">
and similar, which can only occur once per file.
i found a way to load each file individually with s-put and a loop, but
this runs extremely slowly - it has already been running for more than 10
hours; each file takes half a second to load (fuseki running on
localhost).
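For reference, that per-file approach amounts to roughly the following (a dry-run sketch: the paths and the dataset URL are placeholders, and the s-put commands are echoed rather than sent to a live Fuseki):

```shell
# Dry-run sketch of the one-request-per-file approach. Every iteration
# would be a separate HTTP PUT, so ~0.5 s of per-request overhead
# multiplies across ~50,000 files. Paths and dataset URL are hypothetical.
mkdir -p rdf && touch rdf/pg1.rdf rdf/pg2.rdf   # stand-in catalog files
for f in rdf/*.rdf; do
    echo s-put http://localhost:3030/ds/data default "$f"
done
```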
i am sure there is a better way?
thank you for the help!
andrew
--
em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
+43 1 58801 12710 direct
Geoinformation, TU Wien +43 1 58801 12700 office
Gusshausstr. 27-29 +43 1 55801 12799 fax
1040 Wien Austria +43 676 419 25 72 mobile