Hello,
when I run the following SPARQL query against the DBpedia endpoint
http://dbpedia.org/sparql
CONSTRUCT {
<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.
?o0 ?p1 ?o1.
}
WHERE {
<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.
OPTIONAL {
?o0 ?p1 ?o1.
}
}
using the following Jena code:
String query = "CONSTRUCT {\n" +
"<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
"?o0 ?p1 ?o1.\n" +
"}\n" +
"WHERE {\n" +
"<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
"OPTIONAL{\n" +
"?o0 ?p1 ?o1.\n" +
"}}";
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP qe =
    new com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP("http://dbpedia.org/sparql", query);
qe.setDefaultGraphURIs(Collections.singletonList("http://dbpedia.org"));
Model model = qe.execConstruct();
qe.close();
the Turtle parser logs several IRI warnings and then fails with an exception:
11:48:30,550 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน>
Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,553 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน>
Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad
IRI:
<http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián>
Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad
IRI:
<http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián>
Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,574 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์
ชาวอเมริกัน> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal
Form KC.
11:48:30,575 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad
IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์
ชาวอเมริกัน> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,584 ErrorHandlerFactory$ErrorLogger - [line: 513, col: 24]
Unknown char: «(171;0x00AB)
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 513, col: 24] Unknown char: «(171;0x00AB)
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
at org.apache.jena.riot.lang.LangTurtleBase.triplesNode(LangTurtleBase.java:368)
at org.apache.jena.riot.lang.LangTurtleBase.objectList(LangTurtleBase.java:350)
at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:288)
at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:281)
at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:250)
at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:191)
at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:44)
at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:90)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:169)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:255)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:229)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:219)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execModel(QueryEngineHTTP.java:431)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:387)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:382)
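As a side check on the NOT_NFKC warnings (not on the fatal "Unknown char" error), here is a minimal sketch using only the JDK's java.text.Normalizer. The class name NfkcCheck and the helper isNfkc are made up for illustration, and the string is a fragment of the zh_min_nan IRI from the log, written with Unicode escapes. My assumption is that the superscript ⁿ (U+207F) is the compatibility character the parser complains about.

```java
import java.text.Normalizer;

public class NfkcCheck {
    // Hypothetical helper: true if s is already in Unicode Normalization Form KC.
    static boolean isNfkc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fragment "Bí-kok_tiān-iáⁿ" of the IRI reported at [line: 288, col: 45];
        // \u207f is SUPERSCRIPT LATIN SMALL LETTER N.
        String fragment = "B\u00ed-kok_ti\u0101n-i\u00e1\u207f";

        // Report whether the fragment is already NFKC-normalized.
        System.out.println("is NFKC: " + isNfkc(fragment));

        // NFKC maps the superscript n to a plain 'n'.
        String normalized = Normalizer.normalize(fragment, Normalizer.Form.NFKC);
        System.out.println("changed by NFKC: " + !normalized.equals(fragment));
    }
}
```

If this assumption holds, the warnings only say that the IRI uses characters with a compatibility form. They are logged as warnings, and parsing continues past them until line 513.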
When I force QueryEngineHTTP to request RDF/XML instead of Turtle,
which seems to be the default, it works without an exception.
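To look at what the endpoint returns independently of Jena, the same kind of request can be sent over plain HTTP with an explicit Accept header. The following is only a sketch under some assumptions: it uses the standard SPARQL protocol parameters query and default-graph-uri, a simplified CONSTRUCT query without the OPTIONAL part, and the class and method names (ConstructAsRdfXml, enc, requestUrl) are made up for illustration. It needs network access to run.

```java
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class ConstructAsRdfXml {
    // URL-encode a parameter value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Build a SPARQL protocol GET URL (hypothetical helper).
    static String requestUrl(String endpoint, String query, String defaultGraph) {
        return endpoint + "?default-graph-uri=" + enc(defaultGraph)
                        + "&query=" + enc(query);
    }

    public static void main(String[] args) throws Exception {
        // Simplified query: only the first triple pattern from the code above.
        String query =
            "CONSTRUCT { <http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0 } " +
            "WHERE { <http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0 }";

        URL url = new URL(requestUrl("http://dbpedia.org/sparql", query, "http://dbpedia.org"));
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // Ask for RDF/XML; replace with "text/turtle" to compare the two serializations.
        con.setRequestProperty("Accept", "application/rdf+xml");

        // Print the HTTP status and the content type the server actually chose.
        System.out.println(con.getResponseCode() + " " + con.getContentType());
        con.disconnect();
    }
}
```

Comparing the raw Turtle response around the reported line 513 with the RDF/XML response might help to decide whether the « character comes from the server or from the parser.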
My questions are:
1. Is this a bug in Virtuoso, i.e. does it return an invalid character,
or is it a problem in the Turtle parser?
2. Is there a way to change the Accept format from outside the
QueryEngineHTTP class?
3. Is there a way to skip such triples, so that I get a warning but
the parser does not abort with an error?
Thanks in advance.
Kind regards,
Lorenz
--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center