On 07/10/13 18:32, Tim Harsch wrote:
Hi Andy,
Thanks for your careful explanation. It helps a lot. As an aside, I did
rerun the experiment with
writer.setProperty("allowBadURIs","true");
which allowed the code to produce the following:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://" >
<rdf:Description rdf:nodeID="A0">
<j.0:p rdf:resource="http://o"/>
</rdf:Description>
</rdf:RDF>
I was running down an issue in our stack that I still believe is related to
blank nodes when I concocted the example we are discussing. I made an error in
thinking I could test the blank node handling with the data I created which had
the missing qname in URI issue. I then tried to validate using Rapper, which
is what generated the snippet I provided. Since this was not related to
bnodes, the issue becomes much less important for me. However, now that I've
run into this and experienced what our users will experience I do have some
minor concerns that remain. It seems to me the error message could be improved
if some context were provided. If a user were seeing this when serializing a
result set of thousands or millions of statements, I think they would be hard
pressed to find a way to isolate the URI causing the issue.
The current message:
Only well-formed absolute URIrefs can be included in RDF/XML output: <http://>
Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that
is required by the scheme is missing.
doesn't point to the URI that caused the issue or, better yet, the statement.
Perhaps an improved message would look something like:
The statement:
_:b0 <http://p> <http://o>
contains the malformed URI <http://p>. Only well-formed absolute URIrefs can be
included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A
component that is required by the scheme is missing.
It can point to the namespace. For printing, especially RDF/XML-ABBREV,
namespaces are decided separately from processing statements.
What should change is to not call that code in the first place. I think
the "generate a namespace" code is wrong - it should only generate legal
ones.
BaseWriter.xmlnsDecl at a guess.
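For illustration only, here is a rough sketch of the kind of context discussed above: a pre-check that walks the statements before writing and reports any whose property URI would need a bare "scheme://" namespace. Nothing here is Jena API. The class PredicateCheckSketch, its Triple stand-in, and the crude "cut at the last / or #" namespace guess are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check, not part of Jena: reports which statement carries
// a property URI whose namespace part would be a bare "scheme://".
class PredicateCheckSketch {

    // Minimal stand-in for an RDF statement; all names here are illustrative.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " <" + p + "> <" + o + ">"; }
    }

    // Crude namespace guess: everything up to and including the last '/' or '#'.
    static String namespaceOf(String uri) {
        int cut = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
        return uri.substring(0, cut + 1);
    }

    // A namespace such as "http://" has a scheme but no host.
    static boolean missingHost(String ns) {
        return ns.endsWith("://");
    }

    // Returns one message per offending statement, including the statement itself.
    static List<String> check(List<Triple> triples) {
        List<String> problems = new ArrayList<>();
        for (Triple t : triples) {
            String ns = namespaceOf(t.p);
            if (missingHost(ns)) {
                problems.add("The statement: " + t + " contains the property URI <" + t.p
                        + ">, whose namespace <" + ns + "> is not a well-formed absolute URI");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Triple> data = new ArrayList<>();
        data.add(new Triple("_:b0", "http://p", "http://o"));
        data.add(new Triple("_:b1", "http://example/p", "http://o"));
        check(data).forEach(System.out::println);
    }
}
```

A check like this runs over the data rather than inside the writer, so it can name the statement even though the writer decides namespaces separately.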
If you agree this would be a useful enhancement then I could file an RFE and
try to come up with a patch as well.
(some general observations ...)
RFE? What's that for an open source project? !!!
The reality is that what counts is contribution.
Filing JIRA, doing testing etc is great but consider the critical question:
Who is going to do the work?
What motivates them?
If there's a patch, then sure!
The committers' and PMC's responsibility is applying patches, not taking
on RFEs.
Similarly, patches that require significant work to integrate will
make slow progress, if any. If you look at projects in the
Hadoop-o-sphere, you'll see this very sharply. They can be quite direct
about this but it's really just a simple matter of resourcing and
motivation.
When the RDF world was smaller (and when HP was backing the work) things
were different. That was then, not now.
Andy
Thanks,
Tim
________________________________
From: Andy Seaborne <[email protected]>
To: [email protected]
Sent: Saturday, October 5, 2013 3:49 AM
Subject: Re: RDF/XML serializer issue with blank nodes
Hi Tim,
It's not to do with blank nodes.
On 05/10/13 02:05, Tim Harsch wrote:
The following gist:
https://gist.github.com/harschware/6835202
shows what I think may be a bug in the RDF/XML serializer. If I'm not mistaken
the output should look something like this:
<rdf:Description rdf:nodeID="b0">
<ns0:p xmlns:ns0="http://" rdf:resource="http://o"/>
</rdf:Description>
That would be:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
<rdf:Description rdf:nodeID="b0">
<ns0:p xmlns:ns0="http://" rdf:resource="http://o"/>
</rdf:Description>
</rdf:RDF>
(I don't do "something like this"!)
I can file a bug if there isn't one already.
Thanks,
Tim
[[
Exception in thread "main" com.hp.hpl.jena.shared.BadURIException: Only
well-formed absolute URIrefs can be included in RDF/XML output:
<http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that
is required by the scheme is missing.
]]
In Turtle etc, prefix names are defined by string concatenation.
XML has namespace rules. They are different.
http://www.w3.org/TR/REC-xml-names/#sec-namespaces
[[
Definition: An XML namespace is identified by a URI reference [RFC3986];
element and attribute names may be placed in an XML namespace using the
mechanisms described in this specification.
]]
so the namespace name must be a URI.
http://p is a legal URI but to create the property in RDF/XML you need a
qname.
The local part of a qname can't be the empty string (this is a gotcha if
you think Turtle).
Hence your thinking of "http://" but that's not a valid URI.
It would have been better if the code had not tried http:// in the first
place (e.g. http://example/123 gives "InvalidPropertyURIException")
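A minimal sketch of the split described above, to show why http://p ends up with the namespace "http://" while http://example/123 has no usable local name at all. This is a simplified, ASCII-only approximation of the XML NCName rules and is not Jena's actual code. The class QNameSplitSketch and its method names are made up for the example.

```java
// Simplified sketch: ASCII-only approximation of the XML NCName rules.
// Not Jena's actual implementation.
class QNameSplitSketch {

    // True if c may start an NCName (ASCII subset only).
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // True if c may appear after the first character of an NCName (ASCII subset only).
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Returns the index where the local name starts; uri.length() means
    // there is no usable local name, so no qname can be formed.
    static int splitPoint(String uri) {
        int i = uri.length();
        // Walk back over the longest suffix made only of name characters.
        while (i > 0 && isNameChar(uri.charAt(i - 1))) {
            i--;
        }
        // Skip forward past characters that cannot start a name (digits, '-', '.').
        while (i < uri.length() && !isNameStart(uri.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String[] samples = {"http://p", "http://example/123", "http://example/ns#name"};
        for (String uri : samples) {
            int i = splitPoint(uri);
            System.out.println(uri + " -> namespace=\"" + uri.substring(0, i)
                    + "\" local=\"" + uri.substring(i) + "\"");
        }
    }
}
```

With this split, http://p yields the local name "p" and leaves "http://" as the namespace, which is the part that fails the URI check. For http://example/123 the local name comes out empty because a name cannot start with a digit.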
The writer has stopped you creating illegal XML. Some XML parsers will
reject it; there are some very strict parsers. Xerces is a bit more
forgiving.
Jena parses the form above with a warning:
"""
WARN {W124} toAscii failed for namespace URI: <http://>. Bad
Internationalized Domain Name: String index out of range: 0
"""
so
"Be strict in what you output, be generous in what you accept."
Andy