Hello,

When I run the following SPARQL query against the DBpedia endpoint http://dbpedia.org/sparql

CONSTRUCT {
<http://dbpedia.org/resource/Leipzig> ?p0 ?o0.
}
WHERE {
<http://dbpedia.org/resource/Leipzig> ?p0 ?o0.
}


by using the code


String query = "CONSTRUCT {\n" +
                "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
                "?o0 ?p1 ?o1.\n" +
                "}\n" +
                "WHERE {\n" +
                "<http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0.\n" +
                "OPTIONAL{\n" +
                "?o0 ?p1 ?o1.\n" +
                "}}";
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP qe = new com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP("http://dbpedia.org/sparql", query);
qe.setDefaultGraphURIs(Collections.singletonList("http://dbpedia.org"));
Model model = qe.execConstruct();
qe.close();


I get an exception thrown by the Turtle parser:

11:48:30,550 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,553 ErrorHandlerFactory$ErrorLogger - [line: 263, col: 45] Bad IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้กำกับภาพยนตร์ชาว อเมริกัน> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad IRI: <http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,557 ErrorHandlerFactory$ErrorLogger - [line: 288, col: 45] Bad IRI: <http://zh_min_nan.dbpedia.org/resource/Category:Bí-kok_tiān-iáⁿ_tō-ián> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,574 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์ ชาวอเมริกัน> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
11:48:30,575 ErrorHandlerFactory$ErrorLogger - [line: 440, col: 13] Bad IRI: <http://th.dbpedia.org/resource/หมวดหมู่:ผู้อำนวยการสร้างรายการ โทรทัศน์ ชาวอเมริกัน> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
11:48:30,584 ErrorHandlerFactory$ErrorLogger - [line: 513, col: 24] Unknown char: «(171;0x00AB)
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 513, col: 24] Unknown char: «(171;0x00AB)
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesNode(LangTurtleBase.java:368)
    at org.apache.jena.riot.lang.LangTurtleBase.objectList(LangTurtleBase.java:350)
    at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:288)
    at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:281)
    at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:250)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:191)
    at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:44)
    at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:90)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:169)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:255)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:229)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:219)
    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execModel(QueryEngineHTTP.java:431)
    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:387)
    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execConstruct(QueryEngineHTTP.java:382)
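
The NOT_NFKC and COMPATIBILITY_CHARACTER messages above appear to be non-fatal, since parsing continued past them until the unknown character at line 513. They can be reproduced without Jena. The following is only a minimal sketch, assuming the JDK's java.text.Normalizer; the class name NfkcCheck and its methods are made up for illustration, and the IRI is written with Unicode escapes on the assumption that the logged characters are precomposed:

```java
import java.text.Normalizer;

class NfkcCheck {
    /** Returns true if the text is already in Unicode Normalization Form KC. */
    static boolean isNfkc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFKC);
    }

    /** Prints every code point that, taken on its own, NFKC would rewrite. */
    static void reportOffenders(String text) {
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!isNfkc(ch)) {
                System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
            }
        });
    }

    public static void main(String[] args) {
        // One of the IRIs from the log; \u207F is the superscript n.
        String iri = "http://zh_min_nan.dbpedia.org/resource/"
                + "Category:B\u00ED-kok_ti\u0101n-i\u00E1\u207F_t\u014D-i\u00E1n";
        System.out.println("NFKC: " + isNfkc(iri));
        reportOffenders(iri);
    }
}
```

The per-character report is a simplification: a string can also fail the NFKC check because of the order of combining marks, which this sketch would not pinpoint.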

When I force the QueryEngineHTTP to request RDF/XML instead of Turtle, which appears to be the default, the query runs without an exception.
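
At the protocol level, requesting RDF/XML means sending a different Accept header with the SPARQL request. The following sketch shows that idea with plain java.net instead of Jena, so it is not how QueryEngineHTTP does it internally; the class name ConstructRequest and its methods are hypothetical, and the shortened query is only a placeholder:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

class ConstructRequest {
    /** Builds a SPARQL protocol GET URL with the query and default graph. */
    static String buildUrl(String endpoint, String query, String defaultGraph) {
        try {
            return endpoint
                    + "?query=" + URLEncoder.encode(query, "UTF-8")
                    + "&default-graph-uri=" + URLEncoder.encode(defaultGraph, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Prepares the connection with the given Accept header; nothing is sent yet. */
    static HttpURLConnection prepare(String url, String accept) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestProperty("Accept", accept);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String query = "CONSTRUCT { <http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0 } "
                + "WHERE { <http://dbpedia.org/resource/Trey_Parker> ?p0 ?o0 }";
        String url = buildUrl("http://dbpedia.org/sparql", query, "http://dbpedia.org");
        HttpURLConnection conn = prepare(url, "application/rdf+xml");
        System.out.println(url);
        System.out.println("Accept: " + conn.getRequestProperty("Accept"));
        // Reading conn.getInputStream() would perform the actual request.
    }
}
```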

My questions are:

1. Is this a bug in Virtuoso, which returns an invalid character, or is it a problem in the Turtle parser?
2. Is there a way to change the Accept format from outside the QueryEngineHTTP class?
3. Is there a way to skip such triples, so that I get a warning but the parser does not terminate with an error?

Thanks in advance.

Kind regards,
Lorenz

--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
