I think for my purposes "bullshit" is the accidental complexity that Fred
Brooks talks about in The Mythical Man-Month. The core of RDF is pretty
good, but there is a lot of "BS" in the surrounding standards, tools,
etc.
* For instance, somebody writes
solution.get("varname").asLiteral().getByte() when working with a
QuerySolution in Jena, and there are a few ways that chain can end in a
NullPointerException that is not well described if that is what you write.
So maybe somebody writes something longer that is error prone in some
other way. [a Jena problem]
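A minimal sketch of the failure mode, using a plain java.util.Map as a stand-in for Jena's QuerySolution (so it runs without Jena on the classpath); the shape of the bug is the same: get() returns null for an unbound or typo'd variable, and the next call in the chain throws:

```java
import java.util.HashMap;
import java.util.Map;

public class NpeChain {
    public static void main(String[] args) {
        // Stand-in for a QuerySolution row: get() returns null
        // when the variable name is unbound or misspelled.
        Map<String, Object> solution = new HashMap<>();
        solution.put("name", "Alice");

        try {
            // Same shape as solution.get("varname").asLiteral().getByte():
            // the null comes out of get(), but the exception is thrown by
            // the *next* call, so the message tells you very little.
            Object value = solution.get("nmae"); // typo for "name"
            System.out.println(value.toString());
        } catch (NullPointerException e) {
            System.out.println("NPE: nothing tells you which call failed");
        }

        // The defensive version is longer, and its error handling is
        // one more thing to get wrong in its own way.
        Object value = solution.get("name");
        if (value == null) {
            throw new IllegalStateException("variable 'name' is unbound");
        }
        System.out.println(value);
    }
}
```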
* There are both a CSV and a TSV format for writing SPARQL result sets,
and if you don't read the specification carefully you might think they are
the same format with different separators. But if you do read it, you find
that one of them is lossy and the other is lossless -- and worse yet, you
might not learn this until you've got some code in place and are afraid to
change it. [a standards problem]
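A toy illustration of the difference, with the two serialization rules reduced to helper functions (assumed names, not real Jena API). Per the SPARQL 1.1 Results CSV/TSV spec, CSV writes the bare lexical form for both IRIs and literals, while TSV uses Turtle-like syntax that keeps them distinct:

```java
public class CsvVsTsv {
    // CSV rule: just the lexical form, for IRIs and literals alike.
    static String asCsv(String value) {
        return value;
    }

    // TSV rule: Turtle-like syntax, so the two kinds stay distinguishable.
    static String asTsv(String value, boolean isIri) {
        return isIri ? "<" + value + ">" : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        String iri = "http://example/a";      // a resource
        String literal = "http://example/a";  // a string that merely looks like one

        // CSV: both rows come out identical -- the distinction is lost.
        System.out.println(asCsv(iri));
        System.out.println(asCsv(literal));

        // TSV: a round trip can still tell them apart.
        System.out.println(asTsv(iri, true));
        System.out.println(asTsv(literal, false));
    }
}
```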
* The distinction between Literal-typed and Resource-typed properties in
OWL has some bad consequences; for instance, it is just one more reason
why Protégé looks like the cockpit of one of the original 747-100s (as
opposed to a recent 747, which has a glass cockpit). [an ontology-language
problem]
All of the above are pretty clearly cases where additional complexity
forces a decision you can make right or wrong.
Whether programmers are particularly talented or not, they end up making
thousands and thousands of such decisions -- for instance, a real-life
business application could very well have a thousand or so queries in it.
Multiply all of the little cuts and confusions by a factor like that and
you are talking about a large human and economic cost.
When you get closer to the core of the RDF standards you occasionally run
into other problems, such as the lack of validation of predicate names,
which I think is the #1 reason new users abandon SPARQL. Here's a case
where the *simplicity* of the underlying standard means that writing
SPARQL queries is error prone, but that's a solvable problem by putting
complexity into the query editor and possibly an error-message generator.
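A sketch of why this bites: basic graph pattern matching is effectively string equality on predicate IRIs, so a typo'd predicate yields zero rows rather than an error. The tiny in-memory "graph" below is a stand-in for a triple store, not real SPARQL machinery:

```java
import java.util.List;

public class SilentTypo {
    record Triple(String s, String p, String o) {}

    // A toy pattern match standing in for a SPARQL basic graph pattern:
    // a predicate matches only if the strings are exactly equal.
    static long countMatches(List<Triple> data, String predicate) {
        return data.stream().filter(t -> t.p().equals(predicate)).count();
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple(":alice", "foaf:name", "\"Alice\""),
            new Triple(":bob",   "foaf:name", "\"Bob\""));

        // Correct predicate: two results.
        System.out.println(countMatches(data, "foaf:name"));

        // Typo'd predicate: no error, no warning -- just zero rows,
        // which is what SELECT ?s WHERE { ?s foaf:nmae ?o } would do.
        System.out.println(countMatches(data, "foaf:nmae"));
    }
}
```

A query editor that checks predicate names against the vocabularies actually present in the data could turn this silent empty result into an immediate, explainable error.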
You've got to be afraid of complexity, however, because it is so easy to
get into the "old lady who swallowed a fly" problem.
On Fri, Jul 29, 2016 at 6:12 PM, Martynas Jusevičius <[email protected]>
wrote:
> Paul,
>
> I meant non-RDF XML to RDF/XML (triples) or to TriX (quads). As soon as we
> have RDF, SPARQL from there (CONSTRUCT for transformations).
>
> What is the RDF bullshit?
>
> On Fri, 29 Jul 2016 at 23:17, Paul Houle <[email protected]> wrote:
>
> > @Martynas, With XSLT are you talking about using XSLT to generate an
> > XML/RDF document? Or punching in some UDFs that generate triples?
> >
> > One advantage of the low-to-no-configuration route is that you can start
> > doing SPARQL queries on the data on an exploratory basis without having
> to
> > do much or any work on setting up a transformation.
> >
> > If you translate XML to RDF in a transparent way, you will also find
> that
> > many of the pattern matching things you could do with XPath, XQuery or
> > XSLT can be done in ways that are much more transparent with SPARQL,
> Jena
> > rules, etc.
> >
> > XML has quite a bit of funkiness in it that leads to non-essential
> > complexity such as asymmetries in the handling of attributes and elements
> > such as attributes almost always being the default namespace.
> Individually
> > these are not so bad but when you add them up they introduce a large
> number
> > of latent errors and cognitive load on the part of users and programmers.
> >
> > A big difference also has to do with the handling of ordering of things.
> > For instance in an XML document the order of elements matters, whereas
> in
> > RDF world the order of things doesn't matter unless you actually use the
> > list constructions. Sometimes in the problem space the order does not
> > matter and then the "order-matters" semantics implicit to XML leads to a
> > conceptual gap that causes lots of little practical problems.
> >
> > With the "low-to-no-configuration" route you also can use one set of
> tools
> > (SPARQL, Jena Rules) for XML, JSON, Relational, YAML, Stuff imported
> > directly from Java, etc.
> >
> > My take is that programmers already need to learn too many things and
> know
> > too many different tools and we are too proud to admit we have cognitive
> > limits -- if we add RDF to a system we are adding the (unavoidable)
> > bullshit that comes with RDF and unless we can subtract at least as much
> > bullshit from the system from the benefits of RDF, RDF is part of the
> > problem and not part of the solution.
> >
> > On Fri, Jul 29, 2016 at 4:06 PM, Martynas Jusevičius <
> > [email protected]>
> > wrote:
> >
> > > Hey,
> > >
> > > I am all for RDF conversion tools, but I think this would be much more
> > > reusable and portable if done as an XSLT stylesheet -- and probably
> > > more readable, too.
> > >
> > > I don't think there is a better tool than XSLT (2.0) when it comes to
> > > XML conversion.
> > >
> > > On Fri, Jul 29, 2016 at 10:28 AM, Håvard Mikkelsen Ottestad
> > > <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > I just wanted to give some publicity to a library I have worked on
> for
> > > some time. An XML to RDF Java library (open source / apache 2) that’s
> > > compatible with Jena.
> > > >
> > > > It’s blazingly fast and highly configurable. Available on GitHub
> > > https://github.com/AcandoNorway/XmlToRdf and on Maven
> > > http://mvnrepository.com/artifact/no.acando/xmltordf
> > > >
> > > > Regards,
> > > > Håvard M. Ottestad
> > >
> >
> >
> >
> >
>
--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype [email protected]
:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/
Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275