On 31/07/12 15:57, Elli Schwarz wrote:
Sorry to bring this up again, but I'm wondering if anyone has had a
chance to look into this problem. I'm enclosing the test file again;
it reproduces the problem in a complete, minimal fashion.
Vacation ...
Thank you for your help! -Elli
Apparently the "h" is a Latin U+0068, but it is combined with a
U+0327 combining cedilla to get the curve under the "h". My understanding
is that Unicode Normal Form C prefers one precomposed character
instead of the two code points for the one character.
It does prefer that form.
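As a minimal sketch of what "prefers that form" means in practice, the standard java.text.Normalizer can compose the two-code-point sequence (U+0068 followed by U+0327) into the single precomposed character U+1E29, LATIN SMALL LETTER H WITH CEDILLA. The class and method names here are illustrative only and are not taken from the test file.

```java
import java.text.Normalizer;

// Sketch: compose "h" + U+0327 (COMBINING CEDILLA) into its NFC form.
class NfcDemo {
    // Decomposed sequence: LATIN SMALL LETTER H followed by COMBINING CEDILLA.
    static final String DECOMPOSED = "h\u0327";

    // Normalize to Unicode Normalization Form C (canonical composition).
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = toNfc(DECOMPOSED);
        System.out.println(DECOMPOSED.length());        // prints 2: two code points
        System.out.println(composed.length());          // prints 1: U+1E29 alone
        System.out.println(composed.equals("\u1E29"));  // prints true
        System.out.println(Normalizer.isNormalized(DECOMPOSED, Normalizer.Form.NFC)); // prints false
    }
}
```

The isNormalized check returning false for the decomposed input is the kind of condition that triggers a "not in Normal Form C" warning.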
It does not display with the cedilla under the "h" for me, though. The
combining cedilla appears under the "t" consistently, including when I
write out the \u escapes from Java.
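Writing out the \u escapes makes it unambiguous which base letter a combining mark follows, whatever the font does. A minimal sketch, assuming the sample string is the decomposed "Muh" + U+0327 + "tallah" (the actual bytes in the test file may differ); EscapeDump is a hypothetical helper name.

```java
import java.text.Normalizer;

// Sketch: make combining marks visible by printing non-ASCII characters
// as Java-style escapes (backslash, 'u', four hex digits).
class EscapeDump {
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                out.append(c);                                  // plain ASCII as-is
            } else {
                out.append(String.format("\\u%04X", (int) c));  // UTF-16 unit as an escape
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decomposed = "Muh\u0327tallah";
        System.out.println(escape(decomposed));
        System.out.println(escape(Normalizer.normalize(decomposed, Normalizer.Form.NFC)));
    }
}
```

The first line shows the combining mark as a separate \u0327 immediately after the "h"; the NFC line shows the single \u1E29 in its place. If the escape appears after the "t" instead, the data itself puts the cedilla there.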
However, the bug has nothing to do with the data, except that the data
provokes a warning. An error handler is now correctly passed in, and a
warning on getModel(..) comes out.
The bug was in DatasetGraphAccessorHTTP on the client side (still oddly
in the Fuseki jar).
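To illustrate the failure mode (this is not Jena's code): a bridge that forwards parser warnings to a handler which was never supplied will throw a NullPointerException on the first warning, as seen at LangRDFXML$ErrorHandlerBridge.warning. Per the reply above, the actual fix was to pass an error handler in from DatasetGraphAccessorHTTP. The sketch below shows a complementary defensive pattern; WarningHandler, Bridge, and STDERR are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a warning bridge; NOT Jena's classes.
class BridgeSketch {
    // Stand-in for a parser error handler.
    interface WarningHandler {
        void warning(String message);
    }

    // Default used when the caller supplies no handler: log to stderr.
    static final WarningHandler STDERR = msg -> System.err.println("WARN: " + msg);

    // Forwards parser warnings to the configured handler.
    static final class Bridge {
        private final WarningHandler handler;

        Bridge(WarningHandler handler) {
            // Guard against the NPE: fall back to a default instead of storing null.
            this.handler = (handler != null) ? handler : STDERR;
        }

        void warning(String message) {
            handler.warning(message); // never null after construction
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        new Bridge(seen::add).warning("{W131} String not in Unicode Normal Form C");
        new Bridge(null).warning("{W131} String not in Unicode Normal Form C"); // logged, no NPE
        System.out.println(seen);
    }
}
```

Falling back at construction time keeps the warning path free of null checks; the trade-off is that a missing handler is silently replaced rather than reported.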
Andy
----- Forwarded Message -----
*From:* Elli Schwarz <[email protected]>
*To:* "[email protected]" <[email protected]>
*Sent:* Friday, July 20, 2012 8:40 AM
*Subject:* Re: NullPointerException when writing Unicode data
Andy,
Enclosed is a simple test that reproduces the error. I'm using Jena
2.7.2, ARQ 2.9.2, and Fuseki 0.2.3. The data is UTF-8. Here is the
exception:
Exception in thread "main" java.lang.NullPointerException
	at org.openjena.riot.lang.LangRDFXML$ErrorHandlerBridge.warning(LangRDFXML.java:199)
	at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.warning(ARPSaxErrorHandler.java:46)
	at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:203)
	at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:185)
	at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:180)
	at com.hp.hpl.jena.rdf.arp.impl.ParserSupport.warning(ParserSupport.java:202)
	at com.hp.hpl.jena.rdf.arp.impl.ParserSupport.checkString(ParserSupport.java:113)
	at com.hp.hpl.jena.rdf.arp.impl.ARPDatatypeLiteral.<init>(ARPDatatypeLiteral.java:37)
	at com.hp.hpl.jena.rdf.arp.states.WantTypedLiteral.endElement(WantTypedLiteral.java:46)
	at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.endElement(XMLHandler.java:133)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLNamespaceBinder.handleEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLNamespaceBinder.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:155)
	at com.hp.hpl.jena.rdf.arp.ARP.load(ARP.java:120)
	at org.openjena.riot.lang.LangRDFXML.parse(LangRDFXML.java:105)
	at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.readGraph(DatasetGraphAccessorHTTP.java:307)
	at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.exec(DatasetGraphAccessorHTTP.java:277)
	at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.doGet(DatasetGraphAccessorHTTP.java:82)
	at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.httpGet(DatasetGraphAccessorHTTP.java:76)
	at org.apache.jena.fuseki.http.DatasetAdapter.getModel(DatasetAdapter.java:47)
	at NPETest.main(NPETest.java:24)
Thank you for your help and your hard work building and maintaining
Jena!
-Elli
------------------------------------------------------------------------
*From:* Andy Seaborne <[email protected]>
*To:* [email protected]
*Sent:* Thursday, July 19, 2012 4:53 PM
*Subject:* Re: NullPointerException when writing Unicode data
On 19/07/12 18:11, Elli Schwarz wrote:
Andy,
The data in question is this:
I was looking for a complete sample of RDF/XML:
What is the charset?
Arāḑ Muḩtallah
Apparently the "h" is a Latin U+0068, but it is combined with a
U+0327 combining cedilla to get the curve under the "h". My understanding
is that Unicode Normal Form C prefers one precomposed character
instead of the two code points for the one character.
Regardless of the data, I shouldn't get a NullPointerException; I
would expect a warning. The only way I found the Unicode warning was
by stepping through the code in my debugger to figure out what
was causing the problem.
No, it shouldn't, but the description so far leaves me guessing a bit
about the setup. A complete, minimal example, please.
Andy
Thanks,
Elli
________________________________
From: Andy Seaborne <[email protected] <mailto:[email protected]>>
To: [email protected] <mailto:[email protected]>
Sent: Thursday, July 19, 2012 1:01 PM
Subject: Re: NullPointerException when writing Unicode data
(switch to the users list)
On 19/07/12 17:53, Elli Schwarz wrote:
Hello,
I am attempting to write a graph stored in Fuseki out as RDF/XML,
and I get a NullPointerException from line 199 of LangRDFXML. It
looks like the variable errorHandler is null.
There is actually a warning, "... {W131} String not in Unicode
Normal Form C: ...", coming from Jena's XMLHandler, but instead of
being propagated back as a warning, it results in a
NullPointerException.
It seems that it isn't a fatal error, so no exception should be
thrown at all; a warning should just be logged. So I'm guessing this
is a bug?
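The behaviour Elli expects (report the W131 condition, keep going) can be sketched with java.text.Normalizer. This is an illustration under stated assumptions, not ARP's implementation: NfcCheck and checkLiterals are hypothetical names, and the message text only mirrors the warning wording quoted above.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch: report non-NFC literals as warnings instead of failing.
// checkLiterals is a hypothetical helper, not a Jena API.
class NfcCheck {
    static void checkLiterals(List<String> literals, Consumer<String> warn) {
        for (String lit : literals) {
            if (!Normalizer.isNormalized(lit, Normalizer.Form.NFC)) {
                // Wording modeled on Jena's W131 message; processing continues.
                warn.accept("String not in Unicode Normal Form C: " + lit);
            }
        }
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        // First literal is precomposed (NFC); second is decomposed (not NFC).
        List<String> data = Arrays.asList("Mu\u1E29tallah", "Muh\u0327tallah");
        checkLiterals(data, warnings::add);
        System.out.println(warnings.size() + " warning(s)"); // prints "1 warning(s)"
    }
}
```

Passing the reporting strategy in as a Consumer leaves the caller free to log, collect, or escalate, which is the distinction between a warning and a fatal error.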
I'm using Jena 2.7.2 and ARQ 2.9.2, and I'm connecting to a Fuseki
0.2.3 back end (this error occurs when I call ds.getModel(modelName),
where ds is a Fuseki DataAccessor).
Thank you! -Elli
What does the data look like?
Andy