It turns out that it is not so simple :-)

Where did this XML come from?

tl;dr
JENA-1071 : https://issues.apache.org/jira/browse/JENA-1071

Xerces supports XML 1.0 - 4th edition, which does not include codepoint U+0370.

Workarounds
* Turn off the warning.
* Use rdf:about, not rdf:ID.

The long story:

http://www.w3.org/TR/xml/#NT-NameStartChar is a reference to "XML 1.0 (Fifth Edition)" and even that only refers to Unicode 5.0.0. Greek Heta [Ͱ], or U+0370, was added in Unicode version 5.1 but falls within the codepoint ranges allowed by the 5th edition.

"XML 1.0 (Fourth Edition)" does not include U+0370.
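
To make the 5th edition side of this concrete, here is a minimal sketch that checks a codepoint against the NameStartChar ranges as listed in "XML 1.0 (Fifth Edition)". The class and method names are made up for illustration; this is not code from Jena or Xerces, and it says nothing about what a 4th edition parser accepts.

```java
// Hypothetical helper, not part of Jena or Xerces.
final class NameStartCharCheck {

    // Inclusive codepoint ranges copied from the NameStartChar production
    // of XML 1.0 (Fifth Edition).
    private static final int[][] RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // True if the codepoint may start an XML name under the 5th edition rules.
    static boolean isNameStartChar(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int heta = 0x0370; // GREEK CAPITAL LETTER HETA
        System.out.println("U+0370 is a 5th edition NameStartChar: "
                + isNameStartChar(heta));
    }
}
```

U+0370 sits at the start of the [#x370-#x37D] range, so the 5th edition accepts it as the first character of a name, while a parser following the 4th edition rules reports it as a bad name.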

Xerces 2.11.0 implements XML 1.0 Fourth Edition (and you are using the earlier 2.10.0, so simply upgrading will not help here, though it is a good idea for lots of other reasons).

The XML parser in the Java8 JDK (which happens to be a fork of Xerces from way back, 2.7.1) also seems to be 4th edition. The one in IBM Java7 is a fork of 2.9.

Now both Xerces 2.11.0 and the Java8 JDK do happen to provide a check for XML 1.1 characters, XML11Char.isXML11ValidNCName. That is not XML 1.1 support.

        Andy

[Ͱ]
https://en.wikipedia.org/wiki/Heta
