Re: Turtle parser fails on CONSTRUCT query result

Andy Seaborne Tue, 27 Jan 2015 03:39:20 -0800

Comments inline and at the end ...

On 27/01/15 10:57, Lorenz Bühmann wrote:

Hello,


when I run the SPARQL query on the DBpedia endpoint
http://dbpedia.org/sparql

CONSTRUCT {
<http://dbpedia.org/resource/Leipzig> ?p0 ?o0.
}
WHERE {
<http://dbpedia.org/resource/Leipzig> ?p0 ?o0.
}


by using the code


String query = "CONSTRUCT {\n" +
                 "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
                 "?o0 ?p1 ?o1.\n" +
                 "}\n" +
                 "WHERE {\n" +
                 "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
                 "OPTIONAL{\n" +
                 "?o0 ?p1 ?o1.\n" +
                 "}}";
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP qe =
    new com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP("http://dbpedia.org/sparql", query);
qe.setDefaultGraphURIs(Collections.singletonList("http://dbpedia.org"));
Model model = qe.execConstruct();
qe.close();


I get an exception thrown by the Turtle parser:

11:48:30,550 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน>
Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.


This is a warning - the parser emits the data and continues ...

(I'm somewhat tempted to turn the NF tests off - while strictly correct, few people worry about or understand NF - feedback welcome).
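
To make the normal-form warning concrete, here is a minimal sketch using only the JDK's java.text.Normalizer, not Jena's IRI checker. The class name NfkcCheck and its helper methods are made up for illustration. The sample string is the "iáⁿ" fragment from the zh_min_nan IRI reported in this log: the superscript n (U+207F) is a compatibility character, so the string is in NFC but not in NFKC.

```java
import java.text.Normalizer;

public class NfkcCheck {

    /** True if the text is already in Unicode Normal Form KC. */
    static boolean isNfkc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFKC);
    }

    /** Returns the NFKC-normalized form of the text. */
    static String toNfkc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // "i", a-acute (U+00E1), superscript n (U+207F), written with escapes
        String fragment = "i\u00E1\u207F";

        // NFC leaves compatibility characters alone, NFKC does not.
        System.out.println(Normalizer.isNormalized(fragment, Normalizer.Form.NFC)); // true
        System.out.println(isNfkc(fragment));                                       // false

        // NFKC folds the superscript n to a plain "n".
        System.out.println(toNfkc(fragment).equals("i\u00E1n"));                    // true
    }
}
```

Normalizing an IRI this way changes the identifier, so it would no longer match the IRI the server uses; the sketch is only meant to show what the check is testing.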

11:48:30,553 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน>
Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad
IRI:
<http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián>
Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad
IRI:
<http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián>
Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,574 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์
ชาวอเมริกัน> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal
Form KC.
11:48:30,575 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์
ชาวอเมริกัน> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO


and now we have a real error.

What's line 513? (You can get the response by using curl or wget).
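
If curl or wget is not to hand, the same check can be sketched in plain Java with the JDK's HttpURLConnection. This is an illustration under assumptions, not part of Jena: the class RawResponseLine and its methods are hypothetical, it assumes the endpoint honours an "Accept: text/turtle" header and the standard "query" and "default-graph-uri" protocol parameters, and the server may return different content (and so a different line 513) on each request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RawResponseLine {

    /** Returns the 1-based line of the body, or null if the body has fewer lines. */
    static String lineAt(String body, int lineNumber) {
        String[] lines = body.split("\r?\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return null;
        }
        return lines[lineNumber - 1];
    }

    /** Sends the query as an HTTP GET and returns the raw Turtle response as text. */
    static String fetchTurtle(String endpoint, String graph, String query) throws IOException {
        String url = endpoint
                + "?default-graph-uri=" + URLEncoder.encode(graph, "UTF-8")
                + "&query=" + URLEncoder.encode(query, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept", "text/turtle");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String query = "CONSTRUCT {\n"
                + "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n"
                + "?o0 ?p1 ?o1.\n"
                + "}\n"
                + "WHERE {\n"
                + "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n"
                + "OPTIONAL{\n"
                + "?o0 ?p1 ?o1.\n"
                + "}}";
        String body = fetchTurtle("http://dbpedia.org/sparql", "http://dbpedia.org", query);
        // Print the line the parser complained about.
        System.out.println(lineAt(body, 513));
    }
}
```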

11:48:30,584 ErrorHandlerFactory$ErrorLogger - [line: 513, col: 24]
Unknown char: «(171;0x00AB)

The actual error comes from looking for a new Turtle token and not finding a start-of-token marker like < or " or a digit. So the parser assumes a prefixed name (which does not start with an identifying character).

It might be badly written data (some unescaped significant character earlier in the triple). It's a structural problem with the data sent back.
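
The token-start decision described above can be sketched roughly as follows. This is a simplified illustration, not the RIOT tokenizer: the class TurtleTokenStart and its string labels are invented here, and only a few start markers are covered. The character ranges for the start of a prefixed name follow PN_CHARS_BASE in the Turtle grammar; « (U+00AB) falls outside all of them, which is why nothing matches.

```java
public class TurtleTokenStart {

    /** True if the code point may start a prefixed name (PN_CHARS_BASE in the Turtle grammar). */
    static boolean isPnCharsBase(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
            || (cp >= 0x00C0 && cp <= 0x00D6) || (cp >= 0x00D8 && cp <= 0x00F6)
            || (cp >= 0x00F8 && cp <= 0x02FF) || (cp >= 0x0370 && cp <= 0x037D)
            || (cp >= 0x037F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D)
            || (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF)
            || (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF)
            || (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);
    }

    /** Guesses the kind of token from its first code point (simplified). */
    static String classify(int cp) {
        if (cp == '<') return "IRI";
        if (cp == '"' || cp == '\'') return "STRING";
        if ((cp >= '0' && cp <= '9') || cp == '+' || cp == '-') return "NUMBER";
        if (cp == '_') return "BLANK_NODE";
        // No explicit marker: the only remaining option is a prefixed name.
        if (cp == ':' || isPnCharsBase(cp)) return "PREFIXED_NAME";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(classify('<'));    // IRI
        System.out.println(classify('d'));    // PREFIXED_NAME
        System.out.println(classify(0x00AB)); // UNKNOWN
    }
}
```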

(Hmm - the stack trace does not seem to quite agree with the current codebase. What version are you running?)

Exception in thread "main" org.apache.jena.riot.RiotException: [line:
513, col: 24] Unknown char: «(171;0x00AB)
     at
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)

     at
org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
     at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
     at
org.apache.jena.riot.lang.LangTurtleBase.triplesNode(LangTurtleBase.java:368)

     at
org.apache.jena.riot.lang.LangTurtleBase.objectList(LangTurtleBase.java:350)

     at
org.apache.jena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:288)

     at
org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:281)

     at
org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:250)
     at
org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:191)

     at
org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:44)
     at
org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:90)
     at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
     at
org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:169)

     at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:255)
     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:229)
     at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:219)
     at
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execModel(QueryEngineHTTP.java:431)

     at
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:387)

     at
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:382)


When I force the QueryEngineHTTP to request RDFXML instead of TURTLE
which is somehow the default setting it works without exception.

My questions are:

1. Is it a bug in Virtuoso and a wrong character is returned or is it
some problem within the Turtle parser?

For the normal-form (NFKC) warnings, mostly it is that the data itself is not normalized, not the Virtuoso engine messing with it.

2. Is there a way to change the accept format from outside the
QueryEngineHTTP class?


QueryEngineHTTP.setModelContentType(String)

3. Is there a way to ignore such kind of triples such that I get some
warning but the parser does not terminate with an error?

Not really (see above about suppressing the check) but you can configure your logging to not output anything.

Thanks in advance.

Kind regards,
Lorenz


        Andy
