Re: StAX parsing error when querying DBpedia

Jeremy Debattista Wed, 13 May 2015 06:02:29 -0700

Hi Andy,

Thanks for your reply, but I didn’t really get how you set the XML version of 
the input stream to 1.0. Unfortunately, in Jena we cannot use 
application/sparql-results+json as a content type, since a select query has a 
preset content type in the QueryEngineHTTP class.
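
For reference, one way the "set it to 1.0" step could be done by hand is to
buffer the response and patch the XML declaration before any parser sees it.
This is only a rough sketch using the JDK's own StAX API, not anything Jena
does itself: the class and method names (XmlVersionPatch,
downgradeXmlDeclaration, firstElementName) are made up for illustration, and it
assumes an ASCII-compatible encoding such as UTF-8 with no byte-order mark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Jena.
public class XmlVersionPatch {

    /** Buffers the whole stream and rewrites a leading version="1.1" to "1.0". */
    public static InputStream downgradeXmlDeclaration(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        byte[] data = buf.toByteArray();

        // ISO-8859-1 maps bytes to chars one-to-one, so string indexes are byte offsets.
        String head = new String(data, 0, Math.min(data.length, 100), StandardCharsets.ISO_8859_1);
        int declEnd = head.indexOf("?>");
        int versionAt = head.indexOf("version");
        int pos = versionAt < 0 ? -1 : head.indexOf("1.1", versionAt);

        // Only touch "1.1" when it sits inside the XML declaration itself.
        if (head.startsWith("<?xml") && pos >= 0 && pos < declEnd) {
            data[pos + 2] = (byte) '0';
        }
        return new ByteArrayInputStream(data);
    }

    /** Parses with the default StAX factory and returns the root element's local name. */
    public static String firstElementName(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamReader.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.1\" ?>\n"
                + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\"><head/><results/></sparql>";
        InputStream raw = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstElementName(downgradeXmlDeclaration(raw)));
    }
}
```

The patched stream could then presumably be handed to
ResultSetFactory.fromXML, the call that shows up in the stack trace below. Note
that this buffers the whole response in memory, so it is only reasonable for
modest result sets.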


Cheers,
Jer
On 13 May 2015, at 14:38, Andy Seaborne <[email protected]> wrote:

> The DBpedia response has an XML declaration:
> 
> <?xml version="1.1" ?>
> 
> not XML "1.0" (or the default). Setting it to "1.0" worked for me. I don't 
> see any XML 1.1 feature being used.
> 
> It fails because there is no XML 1.1 parser registered.
> 
> ((The results aren't schema-conforming anyway: distinct= and ordered= aren't 
> in the standard, not that it is checked.))
> 
> There aren't many XML 1.1 parsers about and the uptake of XML 1.1 is low. 
> There are issues due to the strictness for character sets in XML parsing - 
> invalid documents becoming valid is a big deal if that document is a business 
> process document, i.e. $$$ is involved and it's a security issue.
> 
> Anyone know how to ignore the XML declaration and have Jena set up the 
> parser factory anyway?
> 
> Workaround: use a different format, like JSON.
> 
>       Andy
> 
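
The JSON workaround can also be tried outside Jena's QueryEngineHTTP, with
plain HTTP content negotiation. The following is a minimal sketch using only
the JDK; the class and method names (SparqlJsonRequest, prepare) are invented
for illustration, and it only builds the request, leaving the network call and
the JSON parsing to the caller.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jena.
public class SparqlJsonRequest {

    /** Builds a GET request for the query that asks for SPARQL JSON results. */
    public static HttpURLConnection prepare(String endpoint, String query) throws IOException {
        // Form-encode the query so it can travel in the URL's query string.
        String url = endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        // openConnection() only creates the object; nothing is sent yet.
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://dbpedia.org/sparql",
                "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1");
        System.out.println(conn.getURL());
        System.out.println(conn.getRequestProperty("Accept"));
        // Calling conn.getInputStream() here would actually send the request.
    }
}
```

Reading conn.getInputStream() then yields the JSON body, which avoids the XML
1.1 declaration altogether.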
> 
> On 13/05/15 12:27, Jeremy Debattista wrote:
>> Hi Rob,
>> 
>> Yes that is what I suspect as well, even though when I use a curl function 
>> with content negotiation [1], the returned results look good (and well 
>> formed). Anyway, this is the complete error stack:
>> 
>> com.hp.hpl.jena.sparql.resultset.ResultSetException: Failed when 
>> initializing the StAX parsing engine
>>      at 
>> com.hp.hpl.jena.sparql.resultset.XMLInputStAX.<init>(XMLInputStAX.java:119)
>>      at com.hp.hpl.jena.sparql.resultset.XMLInput.make(XMLInput.java:73)
>>      at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:42)
>>      at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:37)
>>      at 
>> com.hp.hpl.jena.query.ResultSetFactory.fromXML(ResultSetFactory.java:312)
>>      at 
>> com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:372)
>>      at 
>> de.unibonn.iai.eis.linda.helper.SPARQLHandler.executeQuery(SPARQLHandler.java:41)
>>      at 
>> de.unibonn.iai.eis.linda.helper.SPARQLHandler.getLabelFromNode(SPARQLHandler.java:80)
>>      at 
>> de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.<init>(RDFClass.java:62)
>>      at 
>> de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:228)
>>      at 
>> de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:222)
>>      at com.servlet.routes.BuilderRoute.getProperties(BuilderRoute.java:172)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>      at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>      at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>      at java.lang.reflect.Method.invoke(Method.java:606)
>> 
>> Cheers,
>> Jeremy
>> 
>> 
>> [1] curl -H "Accept: application/sparql-results+xml" -g 
>> "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A"
>> 
>> On 13 May 2015, at 12:32, Rob Vesse <[email protected]> wrote:
>> 
>>> What is the error message you get?
>>> 
>>> It is not unheard of for Virtuoso (the software that powers DBpedia) to
>>> produce bad output, particularly if the data has not been appropriately
>>> sanitised, so I would suspect Virtuoso before suspecting Jena in a case
>>> like this.
>>> 
>>> Rob
>>> 
>>> On 13/05/2015 10:16, "Jeremy Debattista" <[email protected]> wrote:
>>> 
>>>> Dear All,
>>>> 
>>>> I am trying to query the DBpedia SPARQL endpoint using the
>>>> QueryExecutionFactory sparqlService and execSelect(), but I’m given the
>>>> following error: com.hp.hpl.jena.sparql.resultset.ResultSetException:
>>>> Failed when initializing the StAX parsing engine
>>>> 
>>>> The query in question is
>>>> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX
>>>> rdfs:<http://www.w3.org/2000/01/rdf-schema#> PREFIX
>>>> owl:<http://www.w3.org/2002/07/owl#>  SELECT distinct ?class ?label
>>>> WHERE { {?class rdf:type owl:Class} UNION {?class rdf:type rdfs:Class}.
>>>> ?class rdfs:label ?label.   FILTER(bound(?label)  && REGEX(?label,
>>>> "\\bact","i"))} ORDER BY ?class
>>>> 
>>>> which gives a result in the DBpedia SPARQL web interface [1].
>>>> 
>>>> The code in question is the following:
>>>> 
>>>> public static ResultSet executeQuery(String uri, String queryString) {
>>>>    Query query = QueryFactory.create(queryString);
>>>>    QueryExecution qexec = QueryExecutionFactory.sparqlService(uri, query);
>>>>    try {
>>>>            ResultSet results = qexec.execSelect();
>>>>            return results;
>>>>    } finally {
>>>> 
>>>>    }
>>>> }
>>>> 
>>>> After debugging, the problem seems to be related to how the XML parser is
>>>> reading the input stream. Would you have any idea how I can work
>>>> around it?
>>>> 
>>>> Best Regards,
>>>> Jeremy
>>>> 
>>>> 
>>>> 
>>>> [1]
>>>> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query
>>>> =PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23
>>>> %3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3
>>>> E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+
>>>> distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7
>>>> D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Fl
>>>> abel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5C
>>>> bact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A&format=text%2Fhtml&time
>>>> out=30000&debug=on
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>
