I am the maintainer of a project called VIVO-Harvester 
(https://github.com/vivo-project/VIVO-Harvester). VIVO is a semantic web 
application that includes its own ontology, primarily for modeling people, 
organizations, scholarly research, and other attributes of an academic 
environment. The VIVO Harvester is a suite of tools for taking structured 
data, mapping it to the ontology, and loading the resulting RDF into a triple 
store.

The project includes several "fetcher" tools that can read data in a number 
of formats (JSON, a SQL database, a CSV file, an Excel spreadsheet, or simple 
XML, either from a flat file or a web service). The output of the fetcher 
tools is a "faux" RDF file that uses a temporary namespace and a flat 
structure. XSLT transforms are then used to map that RDF to the real ontology 
used by the VIVO application (there is no reason it couldn't be used for 
other ontologies). The project might provide some ideas for non-VIVO projects 
as well. I'd suggest looking at the develop branch.
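The fetch-then-map idea can be sketched very loosely in a few lines. This is 
a hypothetical illustration only - the namespace, field names, and record 
shape below are invented for the example and are not the Harvester's actual 
"faux" RDF format:

```python
import csv
import io

# Invented temporary namespace, standing in for the Harvester's staging namespace.
TMP_NS = "http://example.org/harvest/tmp/"

# A fetcher reads a structured source (here, a tiny in-memory CSV) and emits
# one flat triple per field under the temporary namespace; a later transform
# step (XSLT in the Harvester) would remap these to the real ontology.
rows = csv.DictReader(io.StringIO("id,name\n42,Ada Lovelace\n"))
triples = []
for row in rows:
    subject = f"<{TMP_NS}record/{row['id']}>"
    for field, value in row.items():
        triples.append(f'{subject} <{TMP_NS}field/{field}> "{value}" .')

print("\n".join(triples))  # one flat triple per CSV column
```

The point of the flat intermediate form is that every source format reduces 
to the same simple record shape, so a single mapping step handles them all.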

For the mapping-to-an-ontology piece, something like D2RQ is an option if 
your data is in a relational database. I'd also suggest taking a look at a 
project called Karma (http://usc-isi-i2.github.io/karma/). It can also read 
data in multiple formats and provides a GUI for mapping data into RDF.


-----Original Message-----
From: Olivier Rossel [mailto:[email protected]] 
Sent: Wednesday, October 11, 2017 3:16 AM
To: [email protected]; [email protected]
Subject: Re: Why RDF/XML? Was: loading many small rdf/xml files

I don't see any practical use for managing RDF data via its XML serialization 
with XML tools.
In my town, a huge project tried to store graph data in an XML database and 
query it all with XQuery.
It was probably the most expensive failure I have seen in my career 
(performance was awful).

I think it is, and always has been, a HUGE error to maintain the ambiguity 
that RDF/XML is XML. No, no, and no: it is RDF.
Maybe you can generate RDF/XML via XML tools. Sure.
But consuming RDF/XML with XML tools is a BAD idea.
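That distinction shows up concretely in the concatenation problem discussed 
below: two RDF/XML documents glued together are no longer well-formed XML, 
while N-Triples, being one self-contained triple per line, concatenates 
trivially. A minimal sketch (the URIs here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Two copies of a minimal RDF/XML document, glued together.
doc = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
try:
    ET.fromstring(doc + doc)
    print("well-formed")
except ET.ParseError:
    # The second root element is "junk after document element" to an XML parser.
    print("not well-formed XML")

# N-Triples is line-oriented, so concatenation just works.
nt = '<http://example.org/s> <http://example.org/p> "o" .\n'
combined = nt + nt
print(len(combined.splitlines()))  # 2 complete triples
```

This is why the usual fix for bulk loads is to convert each RDF/XML file to 
N-Triples first and concatenate the results, as suggested further down the 
thread.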



On Sat, Oct 7, 2017 at 6:14 PM,  <[email protected]> wrote:
> Simply because it is both XML and RDF.
>
> There is an enormous installed base of expertise and tooling for XML. 
> It's often worth taking advantage of, even if it performs poorly on a 
> case-by-case basis. If you have to process RDF and you already know a 
> great deal about XML and use languages like XSLT or XQuery, reusing 
> them for RDF is very attractive.
>
> Historically, there was an idea of a unified layered architecture to 
> the semantic web activity. I think this Wikipedia page:
> https://en.wikipedia.org/wiki/Semantic_Web_Stack is old enough to 
> portray that idea. I'm not sure anyone now would be willing to argue 
> that XML sits under RDF as a syntax layer. (Think about the evolution 
> of JSON and JSON-LD, not shown at all on that picture.)
>
>
> ajs6f
>
> Andrew U. Frank wrote on 10/7/17 12:06 PM:
>>
>> Thank you - your link indicates why the solution of calling s-put 
>> for each individual file is so slow.
>>
>> Practically, I will just wait the 10 hours and then extract the 
>> triples from the store.
>>
>> Can you understand why somebody would select this format? What is 
>> the advantage?
>>
>> Andrew
>>
>>
>>
>> On 10/07/2017 10:52 AM, zPlus wrote:
>>>
>>> Hello Andrew,
>>>
>>> If I understand this correctly, I think I stumbled on the same 
>>> problem before. Concatenating XML files will indeed not work. My 
>>> solution was to convert all XML files to N-Triples, then concatenate 
>>> all those triples into a single file, and finally load only that file.
>>> Ultimately, what I ended up with is this loop [1]. The idea is to 
>>> call RIOT with a list of files as input, instead of calling RIOT on 
>>> every single file.
>>>
>>> I hope this helps.
>>>
>>> ----
>>> [1] https://notabug.org/metadb/pipeline/src/master/build.sh#L54
>>>
>>> ----- Original Message -----
>>> From: [email protected]
>>> To:"[email protected]" <[email protected]>
>>> Cc:
>>> Sent:Sat, 7 Oct 2017 10:17:18 -0400
>>> Subject:loading many small rdf/xml files
>>>
>>>   I have to load the Project Gutenberg catalog in RDF/XML format. 
>>> This is
>>>   a collection of about 50,000 files, each containing a single 
>>> record, as
>>>   attached.
>>>
>>>   If I try to concatenate these files into a single one, the result 
>>> is not
>>>   legal RDF/XML - there are XML doc headers:
>>>
>>>   <rdf:RDF xml:base="http://www.gutenberg.org/">
>>>
>>>   and similar, which can only occur once per file.
>>>
>>>   I found a way to load each file individually with s-put and a 
>>> loop, but
>>>   this runs extremely slowly - it has already been running for more 
>>> than 10
>>>   hours; each file takes half a second to load (Fuseki running on 
>>> localhost).
>>>
>>>   I am sure there is a better way?
>>>
>>>   Thank you for the help!
>>>
>>>   Andrew
>>>
>>>   --
>>>   em.o.Univ.Prof. Dr. sc.techn. Dr. h.c. Andrew U. Frank
>>>   +43 1 58801 12710 direct
>>>   Geoinformation, TU Wien +43 1 58801 12700 office
>>>   Gusshausstr. 27-29 +43 1 55801 12799 fax
>>>   1040 Wien Austria +43 676 419 25 72 mobil
>>>
>>>
>>>
>>
>
