On 31/07/12 15:57, Elli Schwarz wrote:
Sorry to bring this up again, but I'm wondering if anyone had a
chance to look into this problem. I'm enclosing the test file again
which reproduces the problem in a complete, minimal fashion.

Vacation ...


Thank you for your help! -Elli


Apparently the "h" is a latin U+0068, but it is combined with a
U+0327 code to get the curve under the "h". My understanding is that
the Unicode Normal Form C prefers one character (a combination
character) instead of the two codes for the one character.

It does prefer that form.
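
For illustration, a minimal sketch of checking for and converting to Normal Form C, using only the JDK's java.text.Normalizer. The class name NfcCheck and the sample string are made up for this example; this is not the check that ARP itself performs.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only; not part of Jena or ARP.
class NfcCheck {
    // True when the string is already in Unicode Normalization Form C.
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    // Composes base + combining-mark sequences into precomposed characters where one exists.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'h' (U+0068) followed by U+0327 COMBINING CEDILLA, as in the reported data.
        String decomposed = "Mu" + "h\u0327" + "tallah";
        String composed = toNfc(decomposed);

        System.out.println("decomposed is NFC: " + isNfc(decomposed));
        System.out.println("composed is NFC:   " + isNfc(composed));
        System.out.println("lengths: " + decomposed.length() + " -> " + composed.length());
    }
}
```

Normalizing literals before they are loaded would avoid the W131 warning, though whether to rewrite the data is a separate decision.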

It does not display with the cedilla under the "h" for me though. The combining cedilla is under the t consistently, including when writing out the \u escapes from Java.

However, the bug has nothing to do with the data, except that the data provokes a warning. An error handler is now correctly passed in, and getModel(..) emits a warning instead of failing.

The bug was in DatasetGraphAccessorHTTP on the client side (still oddly in the Fuseki jar).
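
To illustrate the failure mode, here is a minimal sketch of a warning bridge that falls back to a default handler when none is supplied. WarningHandler and WarningBridge are hypothetical stand-ins, not Jena's classes, and the null fallback is a defensive alternative shown for illustration; the fix described above was to pass a handler in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser error handler; not Jena's interface.
interface WarningHandler {
    void warning(String message);
}

// Hypothetical bridge that forwards parser warnings to a handler.
class WarningBridge {
    private final WarningHandler handler;

    WarningBridge(WarningHandler handler) {
        // Assumption for this sketch: a null handler falls back to stderr logging,
        // so a warning can never surface as a NullPointerException.
        this.handler = (handler != null)
                ? handler
                : msg -> System.err.println("WARN: " + msg);
    }

    void warning(String message) {
        handler.warning(message);
    }
}

class WarningBridgeDemo {
    public static void main(String[] args) {
        // No handler supplied: the warning is logged rather than throwing.
        new WarningBridge(null).warning("{W131} String not in Unicode Normal Form C");

        // Handler supplied: the warning is delivered to it.
        List<String> seen = new ArrayList<>();
        new WarningBridge(seen::add).warning("example warning");
        System.out.println(seen);
    }
}
```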

        Andy


----- Forwarded Message -----
*From:* Elli Schwarz <[email protected]>
*To:* "[email protected]" <[email protected]>
*Sent:* Friday, July 20, 2012 8:40 AM
*Subject:* Re: NullPointerException when writing Unicode data

Andy,

Enclosed is a simple test that reproduces the error. I'm using Jena
2.7.2, ARQ 2.9.2, and Fuseki 0.2.3. The data is UTF-8. Here is the
exception:

Exception in thread "main" java.lang.NullPointerException
    at org.openjena.riot.lang.LangRDFXML$ErrorHandlerBridge.warning(LangRDFXML.java:199)
    at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.warning(ARPSaxErrorHandler.java:46)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:203)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:185)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:180)
    at com.hp.hpl.jena.rdf.arp.impl.ParserSupport.warning(ParserSupport.java:202)
    at com.hp.hpl.jena.rdf.arp.impl.ParserSupport.checkString(ParserSupport.java:113)
    at com.hp.hpl.jena.rdf.arp.impl.ARPDatatypeLiteral.<init>(ARPDatatypeLiteral.java:37)
    at com.hp.hpl.jena.rdf.arp.states.WantTypedLiteral.endElement(WantTypedLiteral.java:46)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.endElement(XMLHandler.java:133)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNamespaceBinder.handleEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLNamespaceBinder.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:155)
    at com.hp.hpl.jena.rdf.arp.ARP.load(ARP.java:120)
    at org.openjena.riot.lang.LangRDFXML.parse(LangRDFXML.java:105)
    at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.readGraph(DatasetGraphAccessorHTTP.java:307)
    at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.exec(DatasetGraphAccessorHTTP.java:277)
    at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.doGet(DatasetGraphAccessorHTTP.java:82)
    at org.apache.jena.fuseki.http.DatasetGraphAccessorHTTP.httpGet(DatasetGraphAccessorHTTP.java:76)
    at org.apache.jena.fuseki.http.DatasetAdapter.getModel(DatasetAdapter.java:47)
    at NPETest.main(NPETest.java:24)

Thank you for your help and your hard work building and maintaining
Jena!

-Elli

------------------------------------------------------------------------




*From:* Andy Seaborne <[email protected]>
*To:* [email protected]
*Sent:* Thursday, July 19, 2012 4:53 PM
*Subject:* Re: NullPointerException when writing Unicode data

On 19/07/12 18:11, Elli Schwarz wrote:
Andy,


The data in question is this:

I was looking for a complete sample of RDF/XML:

What is the charset?



Arāḑ Muḩtallah


Apparently the "h" is a latin U+0068, but it is combined with a
U+0327 code to get the curve under the "h". My understanding is that
the Unicode Normal Form C prefers one character (a combination
character) instead of the two codes for the one character.

Regardless of the data, I shouldn't expect a NullPointerException, I
would expect a warning. The only way I found the Unicode warning was
through stepping through the code in my debugger to figure out what
was causing the problem.

No, it shouldn't, but the description so far leaves me guessing
about the setup. A complete, minimal example please.

Andy



Thanks,

Elli



________________________________
From: Andy Seaborne <[email protected]>
To: [email protected]
Sent: Thursday, July 19, 2012 1:01 PM
Subject: Re: NullPointerException when writing Unicode data

(switch to the users list)

On 19/07/12 17:53, Elli Schwarz wrote:
Hello,

I am attempting to write a graph stored in Fuseki out as RDF/XML,
and I get a NullPointerException from line 199 of LangRDFXML. It
looks like the variable errorHandler is null.

There is actually a warning that "... {W131} String not in Unicode
Normal Form C: ..." that is coming from Jena's XMLHandler, but
instead of this being propagated back as a warning it is throwing a
NullPointerException.

It seems that it isn't a fatal error, so no exception at all should
be thrown, just a warning should be logged, so I'm guessing this is a
bug?

I'm using Jena 2.7.2, ARQ 2.9.2, and I'm connecting to a Fuseki
0.2.3 back end (this error occurs when I do ds.getModel(modelName)
where ds is a Fuseki DataAccessor).

Thank you! -Elli


What does the data look like?

Andy








