Hi,

thanks for attending to this, Andy.

Regards
Philipp


On Tue, 17 Nov 2015 20:02:43 +0000 UTC,
Andy Seaborne <[email protected]> wrote:
> It turns out that it is not so simple :-)
> 
> Where did this XML come from?
> 
> tl;dr
> JENA-1071 : https://issues.apache.org/jira/browse/JENA-1071
> 
> Xerces supports XML 1.0 (Fourth Edition), which does not include codepoint
> U+0370 as a name character.
> 
> Workarounds
> * Turn off the warning.
> * Use rdf:about, not rdf:ID.
> 
> The long story:
> 
> http://www.w3.org/TR/xml/#NT-NameStartChar is a reference to "XML 1.0 (Fifth
> Edition)" and even that is only Unicode 5.0.0.  Greek Heta [Ͱ], or U+0370,
> was added in Unicode version 5.1 but is in the codepoint ranges for the 5th
> edition.
> 
> "XML 1.0 (Fourth Edition)" does not include U+0370.
> 
> Xerces 2.11.0 implements XML 1.0 Fourth Edition (and you are using the
> earlier 2.10.0, so simply upgrading will not help here, though it is a good
> idea for lots of other reasons).
> 
> The XML parser in the Java8 JDK (which happens to be a fork of Xerces from
> way back, 2.7.1) also seems to be 4th edition.  The one in IBM Java7 is a
> fork of 2.9.
> 
> Now both Xerces 2.11.0 and the Java8 JDK do happen to support a check for
> XML 1.1 name characters, XML11Char.isXML11ValidNCName.  That is not XML 1.1
> support.
> 
>       Andy
> 
> [Ͱ]
> https://en.wikipedia.org/wiki/Heta
> 
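For anyone who wants to check the point about U+0370 without a parser, here is a minimal sketch in Java. It hard-codes the NameStartChar ranges from XML 1.0 (Fifth Edition) and tests a codepoint against them. The class and method names (NameStartCharCheck, isNameStartChar5e, isNCNameStartChar5e) are made up for illustration and are not part of Jena or Xerces. It does not reproduce the Fourth Edition rules, which are defined through the older Letter tables, so it only shows the Fifth Edition side of the comparison.

```java
final class NameStartCharCheck {
    // Inclusive codepoint ranges copied from the NameStartChar production
    // of XML 1.0 (Fifth Edition). Hypothetical helper, not a library API.
    private static final int[][] RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // True if the codepoint may start an XML name under the Fifth Edition.
    static boolean isNameStartChar5e(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // An NCName (as needed for rdf:ID) may not contain a colon,
    // so the start character is the same set minus ':'.
    static boolean isNCNameStartChar5e(int codePoint) {
        return codePoint != ':' && isNameStartChar5e(codePoint);
    }

    public static void main(String[] args) {
        int heta = 0x0370; // GREEK CAPITAL LETTER HETA
        System.out.printf("U+%04X NameStartChar (5th ed.): %b%n",
                heta, isNameStartChar5e(heta));
    }
}
```

U+0370 falls in the [#x370-#x37D] range, which is why a Fifth Edition check accepts it while a Fourth Edition parser such as Xerces 2.10.0 or 2.11.0 does not.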
