Hi,

Thank you Paul and everyone else who has shown interest in my XmlToRdf library. 

I will definitely contact the W3C group you mention.

I realize my email got a bit long, so TL;DR: XmlToRdf streams XML and needs
very little RAM. It handles common transforms through configuration and also
supports SPARQL for transforming. It also handles mixed content and can build
an IRI for an element as a composite ID based on its children and attributes.

I have seen various ways of converting XML to RDF, from XSLT to TopBraid
Composer. When building XmlToRdf I sought to make something as fast and
memory-efficient as possible. This is why XmlToRdf can convert a 100 MB file
in just a few seconds using 50 MB of RAM (mathematically it requires about
O(log n) memory and takes O(n) time).

Initially XmlToRdf was a generic converter that used SPARQL UPDATE and
CONSTRUCT to transform the data. It still supports this approach, and even
supplies a helper class to chain the transforms (PostProcessing).

However, after using the library internally at Acando for a while it became
obvious that a number of patterns kept repeating. One was handling SimpleType
elements as predicates with literals. Others were renaming an element and
inserting a predicate between two elements.

Transforming with SPARQL can be a bit time consuming. You have to write the
query, and if you need multiple queries to rename elements, running them is
going to take longer than renaming the elements with a HashMap during the
initial XML to RDF conversion.
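
As a rough illustration of that idea (plain JDK SAX, not XmlToRdf's actual
code or API), a rename can be a single HashMap lookup at the moment an element
is first seen. The class name RenameSketch, the namespace http://example.org/
and the blank-node naming are all assumptions made up for this sketch.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RenameSketch {
    // Hypothetical namespace, used only for this illustration.
    static final String NS = "http://example.org/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Streams the XML once and emits one rdf:type triple per element,
    // applying the rename map while the element is being read.
    static String convert(String xml, Map<String, String> renames) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            int counter = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // One lookup per element; no second pass over the data.
                String name = renames.getOrDefault(qName, qName);
                out.append("_:b").append(counter++)
                   .append(" <").append(RDF_TYPE).append("> <")
                   .append(NS).append(name).append("> .\n");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("<person><name>Ann</name></person>",
                                 Map.of("person", "Person")));
    }
}
```

The point of the sketch is only that the rename costs one map lookup per
element during the single streaming pass, whereas each SPARQL rename query
has to revisit data that has already been converted.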

We also ran into a number of complex cases. One was handling mixed content.
Mixed content is where an element can contain both text and other elements.
Our case involved marked-up text inside an element. XmlToRdf handles mixed
content by giving you the raw text, all the elements, and also an RDF list of
the text and elements together.
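
To make the three views concrete, here is a minimal sketch using plain JDK
SAX. It is not XmlToRdf's implementation and produces Java lists rather than
RDF; the class name MixedContentSketch and its fields are hypothetical. It
only looks at the direct children of the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class MixedContentSketch {
    final StringBuilder rawText = new StringBuilder();  // all text, markup stripped
    final List<String> elements = new ArrayList<>();    // direct child elements
    final List<String> ordered = new ArrayList<>();     // text runs and elements, in order

    static MixedContentSketch parse(String xml) {
        MixedContentSketch result = new MixedContentSketch();
        DefaultHandler handler = new DefaultHandler() {
            final StringBuilder pending = new StringBuilder(); // current direct text run
            int depth = 0;

            void flush() {
                if (pending.length() > 0) {
                    result.ordered.add(pending.toString());
                    pending.setLength(0);
                }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (depth == 1) {            // a direct child of the root element
                    flush();
                    result.elements.add(qName);
                    result.ordered.add("<" + qName + ">");
                }
                depth++;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                result.rawText.append(ch, start, length);
                if (depth == 1) pending.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
                if (depth == 0) flush();     // root closed: emit the trailing text run
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        MixedContentSketch r = parse("<p>Hello <b>big</b> world</p>");
        System.out.println(r.rawText);
        System.out.println(r.elements);
        System.out.println(r.ordered);
    }
}
```

The ordered list plays the role of the RDF list: it is the only one of the
three views that preserves where each element sits relative to the text.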

Another complex case was handling composite identifiers, where you want to
build the IRI for an element based on attributes or child elements or both.
Since XmlToRdf uses a SAX streamer to do its conversion (for speed and to be
able to handle huge XML files), it will buffer the elements until it can
resolve (build) the IRI for the parent element.
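
The buffering can be sketched as follows, again with plain JDK SAX and not
XmlToRdf's real code or API. The class name CompositeIdSketch, the namespace
http://example.org/ and the IRI layout (element name followed by the key
parts) are assumptions for illustration. The sketch assumes the children of
the target element contain only text, and it does no escaping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CompositeIdSketch {
    // Hypothetical namespace, used only for this illustration.
    static final String NS = "http://example.org/";

    // keyParts names the attributes and/or child elements that form the ID.
    static List<String> convert(String xml, String element, List<String> keyParts) {
        List<String> triples = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            final List<String[]> buffered = new ArrayList<>();      // {property, value}
            final Map<String, String> keyValues = new HashMap<>();
            final StringBuilder text = new StringBuilder();
            boolean inside = false;

            void record(String name, String value) {
                buffered.add(new String[] {name, value});
                if (keyParts.contains(name)) keyValues.put(name, value);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(element)) {
                    inside = true;
                    for (int i = 0; i < atts.getLength(); i++) {
                        record(atts.getQName(i), atts.getValue(i));
                    }
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!inside) return;
                if (!qName.equals(element)) {       // a child: the subject IRI is not known yet
                    record(qName, text.toString().trim());
                    return;
                }
                // The parent closed, so every key part has been seen: build the IRI.
                // A missing key part would show up as the text "null" in this sketch.
                StringBuilder iri = new StringBuilder(NS).append(element);
                for (String part : keyParts) iri.append('/').append(keyValues.get(part));
                for (String[] pv : buffered) {
                    triples.add("<" + iri + "> <" + NS + pv[0] + "> \"" + pv[1] + "\" .");
                }
                buffered.clear();
                keyValues.clear();
                inside = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return triples;
    }

    public static void main(String[] args) {
        String xml = "<people><person country=\"NO\"><id>42</id><name>Ann</name></person></people>";
        convert(xml, "person", List.of("country", "id")).forEach(System.out::println);
    }
}
```

Only the properties of the element currently being resolved are held in
memory, so the buffer stays small even when the overall file is huge.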

I would gladly help anyone get started with the XmlToRdf library. Just send me
an email with your use case and I can help you set up a conversion.

Regards,
Håvard

From my personal email since I've gone on holiday. 

> On 31 Jul 2016, at 05:05, Paul Tyson <[email protected]> wrote:
> 
> Håvard, 3 things:
> 
> 1. You should announce this to the RDF and XML Interoperability W3C
> community group [1], and pursue this discussion there.
> 
> 2. Were you aware of the early work on RDF schema for XML infoset? [2]
> 
> 3. I agree with Martynas that XSLT is often a better way to specify and
> run transformations on XML. In particular, it is as easy as falling off
> a log to specify a transform to the above-referenced infoset schema.
> 
> Regards,
> --Paul
> 
> [1] https://www.w3.org/community/rax/
> [2] https://www.w3.org/TR/2001/NOTE-xml-infoset-rdfs-20010406
> 
> 
>> On Fri, 2016-07-29 at 08:28 +0000, Håvard Mikkelsen Ottestad wrote:
>> Hi,
>> 
>> I just wanted to give some publicity to a library I have worked on for some 
>> time. An XML to RDF Java library (open source / apache 2) that’s compatible 
>> with  Jena.
>> 
>> It’s blazingly fast and highly configurable. Available on GitHub 
>> https://github.com/AcandoNorway/XmlToRdf and on Maven 
>> http://mvnrepository.com/artifact/no.acando/xmltordf
>> 
>> Regards,
>> Håvard M. Ottestad
> 
> 
