Hi,

Thank you, Paul and everyone else who has shown interest in my XmlToRdf library.
I will definitely contact the W3C group you mention.

I realize my email got a bit long, so TL;DR: XmlToRdf streams XML and needs very little RAM. It handles common transforms through configuration and also supports SPARQL for transforming. It also handles mixed content and can build an IRI for an element as a composite ID based on its children and attributes.

I have seen various ways of converting XML to RDF, from XSLT to TopBraid Composer. When building XmlToRdf I sought to make something as fast and memory efficient as possible. This is why XmlToRdf can convert a 100 MB file in just a few seconds using 50 MB of RAM (mathematically it requires about O(log n) memory and takes O(n) time).

Initially XmlToRdf was a generic converter that used SPARQL update and construct queries to transform the data. It still supports this approach, and even supplies a helper class to chain the transforms (PostProcessing). However, after using the library internally at Acando for a while, it became obvious that a number of patterns kept repeating. One was handling SimpleType elements as predicates with literals. Others were renaming an element and inserting a predicate between two elements. Transforming with SPARQL can be time consuming. You have to write the query, and if you need multiple queries to rename elements, running them takes longer than renaming the elements during the initial XML to RDF conversion using a HashMap.

We also ran into a number of complex cases. One was handling mixed content, where an element can contain both text and other elements. Our case was markup of text inside an element. XmlToRdf handles mixed content by giving you the raw text, all the elements, and also an RDF list of the text and elements together.

Another complex case was handling composite identifiers, where you want to build the IRI for an element based on attributes or child elements or both.
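To make the composite identifier case concrete, here is a minimal sketch in plain Java using the JDK's SAX parser. It is not XmlToRdf's API or implementation: the class name CompositeIriSketch, the convert method, the http://example.org/ base and the person/country/ssn element names are all made up for illustration. It only shows why child values have to be held back until the parent's IRI can be built.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; none of these names come from XmlToRdf.
final class CompositeIriSketch {

    static final String BASE = "http://example.org/"; // assumed base namespace

    /** Streams the XML once and returns N-Triples-like lines. */
    static List<String> convert(String xml) {
        List<String> triples = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Child values of the current <person>. They are buffered because
            // the subject IRI is unknown until every id part has been read.
            private Map<String, String> buffered;
            private String country;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0);
                if (qName.equals("person")) {
                    buffered = new LinkedHashMap<>();
                    country = atts.getValue("country"); // attribute part of the id
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("person")) {
                    // All children have been seen, so the composite IRI can be built
                    // and the buffered values flushed as triples.
                    String iri = BASE + "person/" + country + "_" + buffered.get("ssn");
                    for (Map.Entry<String, String> e : buffered.entrySet()) {
                        triples.add("<" + iri + "> <" + BASE + e.getKey() + "> \""
                                + e.getValue() + "\" .");
                    }
                    buffered = null;
                } else if (buffered != null) {
                    buffered.put(qName, text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return triples;
    }

    public static void main(String[] args) {
        String xml = "<people><person country=\"NO\">"
                + "<name>Kari</name><ssn>123</ssn></person></people>";
        convert(xml).forEach(System.out::println);
    }
}
```

In this sketch the name element arrives before ssn, so its triple cannot be written until the end of person, when the subject becomes http://example.org/person/NO_123. Only one person's children are held in memory at a time, so memory use stays small even for a large file.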
Since XmlToRdf uses a SAX streamer to do its conversion (for speed and to be able to handle huge XML files), it will buffer the elements until it can resolve (build) the IRI for the parent element.

I would gladly help anyone get started with the XmlToRdf library. Just send me an email with your use case and I can help you set up a conversion.

Regards,
Håvard

From my personal email since I've gone on holiday.

> On 31 Jul 2016, at 05:05, Paul Tyson <[email protected]> wrote:
>
> Håvard, 3 things:
>
> 1. You should announce this to the RDF and XML Interoperability W3C
> community group [1], and pursue this discussion there.
>
> 2. Were you aware of the early work on RDF schema for XML infoset? [2]
>
> 3. I agree with Martynas that XSLT is often a better way to specify and
> run transformations on XML. In particular, it is as easy as falling off
> a log to specify a transform to the above-referenced infoset schema.
>
> Regards,
> --Paul
>
> [1] https://www.w3.org/community/rax/
> [2] https://www.w3.org/TR/2001/NOTE-xml-infoset-rdfs-20010406
>
>
>> On Fri, 2016-07-29 at 08:28 +0000, Håvard Mikkelsen Ottestad wrote:
>> Hi,
>>
>> I just wanted to give some publicity to a library I have worked on for some
>> time. An XML to RDF Java library (open source / apache 2) that’s compatible
>> with Jena.
>>
>> It’s blazingly fast and highly configurable. Available on GitHub
>> https://github.com/AcandoNorway/XmlToRdf and on Maven
>> http://mvnrepository.com/artifact/no.acando/xmltordf
>>
>> Regards,
>> Håvard M. Ottestad
