Re: StAX parsing error when querying DBpedia

Jeremy Debattista Wed, 13 May 2015 06:39:10 -0700

Great, thanks.

I also managed to change the content type. I was using the more generic 
QueryExecution interface instead of QueryEngineHTTP. Switching did the trick.



FYI:

                QueryEngineHTTP qexec = (QueryEngineHTTP) QueryExecutionFactory.sparqlService(uri, query);
                // Ask the endpoint for JSON results instead of the default XML.
                qexec.setSelectContentType(WebContent.contentTypeResultsJSON);
                ResultSet results = qexec.execSelect();
                return results;


On 13 May 2015, at 15:24, Andy Seaborne <[email protected]> wrote:

> Workaround:
> 
> Add jena-text to the dependencies
> 
> :-)
> 
> jena-text
>   depends on solr-solrj
>     depends on org.codehaus.woodstox:wstx-asl
> 
>       Andy
> 
> On 13/05/15 13:59, Jeremy Debattista wrote:
>> Hi Andy,
>> 
>> Thanks for your reply, but I didn’t really get how you set the input stream 
>> to 1.0. Unfortunately, in Jena we cannot use 
>> application/sparql-results+json as a content type, since a SELECT query has a 
>> preset content type in the QueryEngineHTTP class.
>> 
>> Cheers,
>> Jer
>> On 13 May 2015, at 14:38, Andy Seaborne <[email protected]> wrote:
>> 
>>> The DBpedia response has an XML declaration:
>>> 
>>> <?xml version="1.1" ?>
>>> 
>>> not XML "1.0" (or the default). Setting it to "1.0" worked for me. I 
>>> don't see any XML 1.1 features being used.
>>> 
>>> It fails because there is no XML 1.1 parser registered.
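To check which StAX implementation is actually registered and whether it accepts a given document, a small probe along these lines may help. This is only a sketch using the standard javax.xml.stream API; the class and method names (StaxVersionProbe, factoryName, canParse) are made up for illustration and are not part of Jena. Whether a document declared as version 1.1 parses depends on which StAX implementation is on the classpath, so no particular result is claimed for that case.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper; not part of Jena.
final class StaxVersionProbe {

    /** Class name of the StAX factory implementation found on the classpath. */
    static String factoryName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /** Returns true if the registered StAX parser reads the whole document without error. */
    static boolean canParse(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            // Walk every event so that errors raised lazily are also seen.
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            // May be thrown when the reader is created or while iterating,
            // depending on the implementation.
            return false;
        }
    }
}
```

Printing StaxVersionProbe.factoryName() and StaxVersionProbe.canParse("<?xml version=\"1.1\" ?><sparql/>") before and after adding another StAX implementation to the classpath should show whether the registered parser is the limiting factor.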
>>> 
>>> ((The results aren't schema-conforming anyway: distinct= and ordered= 
>>> aren't in the standard, not that it is checked.))
>>> 
>>> There aren't many XML 1.1 parsers about and the uptake of XML 1.1 is low. 
>>> There are issues due to the strictness for character sets in XML parsing: 
>>> invalid documents becoming valid is a big deal if that document is a 
>>> business process document, i.e. $$$ is involved and it's a security issue.
>>> 
>>> Does anyone know how to ignore the XML declaration and have Jena set up the 
>>> parser factory anyway?
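One possible way to do the "set it to 1.0" step in code, if you fetch the response body yourself, is to rewrite the version in the XML declaration before handing the text to the parser. The sketch below is an assumption-laden illustration, not a Jena feature: XmlDeclarationPatch and downgradeTo10 are hypothetical names, it works on a String rather than a stream, it expects the declaration at the very start of the text (no byte order mark), and it is only reasonable when the document uses no XML 1.1 features.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Jena.
final class XmlDeclarationPatch {

    // Matches a declaration at the very start of the text that declares version 1.1.
    // Group 1 is everything up to the quote, group 2 is the quote character used.
    private static final Pattern DECL_1_1 =
            Pattern.compile("^(<\\?xml\\s+version\\s*=\\s*)([\"'])1\\.1\\2");

    /** Returns the text with a leading version="1.1" changed to "1.0"; other input is returned unchanged. */
    static String downgradeTo10(String xml) {
        Matcher m = DECL_1_1.matcher(xml);
        if (!m.find()) {
            return xml;
        }
        // Rebuild the declaration prefix with the same quote style, then append the rest untouched.
        return m.group(1) + m.group(2) + "1.0" + m.group(2) + xml.substring(m.end());
    }
}
```

The patched text could then presumably be passed to ResultSetFactory.fromXML, which appears in the stack trace, but I have not tested that path.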
>>> 
>>> Workaround: use a different format, such as JSON.
>>> 
>>>     Andy
>>> 
>>> 
>>> On 13/05/15 12:27, Jeremy Debattista wrote:
>>>> Hi Rob,
>>>> 
>>>> Yes, that is what I suspect as well, even though when I use curl 
>>>> with content negotiation [1], the returned results look good (and 
>>>> well-formed). Anyway, this is the complete stack trace:
>>>> 
>>>> com.hp.hpl.jena.sparql.resultset.ResultSetException: Failed when initializing the StAX parsing engine
>>>>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX.<init>(XMLInputStAX.java:119)
>>>>    at com.hp.hpl.jena.sparql.resultset.XMLInput.make(XMLInput.java:73)
>>>>    at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:42)
>>>>    at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:37)
>>>>    at com.hp.hpl.jena.query.ResultSetFactory.fromXML(ResultSetFactory.java:312)
>>>>    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:372)
>>>>    at de.unibonn.iai.eis.linda.helper.SPARQLHandler.executeQuery(SPARQLHandler.java:41)
>>>>    at de.unibonn.iai.eis.linda.helper.SPARQLHandler.getLabelFromNode(SPARQLHandler.java:80)
>>>>    at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.<init>(RDFClass.java:62)
>>>>    at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:228)
>>>>    at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:222)
>>>>    at com.servlet.routes.BuilderRoute.getProperties(BuilderRoute.java:172)
>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>    at java.lang.reflect.Method.invoke(Method.java:606)
>>>> 
>>>> Cheers,
>>>> Jeremy
>>>> 
>>>> 
>>>> [1] curl -H "Accept: application/sparql-results+xml" -g 
>>>> "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A"
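For comparison, the curl request in [1] can be approximated in plain Java without Jena, which makes it easy to switch the Accept header between application/sparql-results+xml and application/sparql-results+json. This is a rough sketch using only java.net; SparqlHttpRequest, buildUrl and prepare are hypothetical names, and the connection is only prepared here, not opened.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper; not part of Jena.
final class SparqlHttpRequest {

    /** Builds the GET URL with the default graph and the query form-encoded, as in the curl call. */
    static String buildUrl(String endpoint, String defaultGraph, String query) {
        try {
            return endpoint
                    + "?default-graph-uri=" + URLEncoder.encode(defaultGraph, "UTF-8")
                    + "&query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Prepares (but does not send) a GET request with the given Accept header. */
    static HttpURLConnection prepare(String endpoint, String defaultGraph,
                                     String query, String accept) throws IOException {
        URL url = new URL(buildUrl(endpoint, defaultGraph, query));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", accept);
        return conn;
    }
}
```

Calling getInputStream() on the returned connection sends the request; reading the first line of the body is then enough to see which XML version the endpoint declares.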
>>>> 
>>>> On 13 May 2015, at 12:32, Rob Vesse <[email protected]> wrote:
>>>> 
>>>>> What is the error message you get?
>>>>> 
>>>>> It is not unheard of for Virtuoso (the software that powers DBpedia) to
>>>>> produce bad output, particularly if the data has not been appropriately
>>>>> sanitised, so I would suspect Virtuoso before suspecting Jena in a case
>>>>> like this.
>>>>> 
>>>>> Rob
>>>>> 
>>>>> On 13/05/2015 10:16, "Jeremy Debattista" <[email protected]> wrote:
>>>>> 
>>>>>> Dear All,
>>>>>> 
>>>>>> I am trying to query the DBpedia SPARQL endpoint using the
>>>>>> QueryExecutionFactory sparqlService and execSelect(), but I’m given the
>>>>>> following error: com.hp.hpl.jena.sparql.resultset.ResultSetException:
>>>>>> Failed when initializing the StAX parsing engine
>>>>>> 
>>>>>> The query in question is
>>>>>> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX
>>>>>> rdfs:<http://www.w3.org/2000/01/rdf-schema#> PREFIX
>>>>>> owl:<http://www.w3.org/2002/07/owl#>  SELECT distinct ?class ?label
>>>>>> WHERE { {?class rdf:type owl:Class} UNION {?class rdf:type rdfs:Class}.
>>>>>> ?class rdfs:label ?label.   FILTER(bound(?label)  && REGEX(?label,
>>>>>> "\\bact","i"))} ORDER BY ?class
>>>>>> 
>>>>>> which gives a result in the DBpedia SPARQL web interface [1].
>>>>>> 
>>>>>> The code in question is the following:
>>>>>> 
>>>>>> public static ResultSet executeQuery(String uri, String queryString) {
>>>>>>  Query query = QueryFactory.create(queryString);
>>>>>>  QueryExecution qexec = QueryExecutionFactory.sparqlService(uri, query);
>>>>>>  try {
>>>>>>          ResultSet results = qexec.execSelect();
>>>>>>          return results;
>>>>>>  } finally {
>>>>>> 
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> After debugging, the problem seems to be related to how the XML parser is
>>>>>> reading the stream input. Would you have any other idea how I can go
>>>>>> around it?
>>>>>> 
>>>>>> Best Regards,
>>>>>> Jeremy
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [1] http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A&format=text%2Fhtml&timeout=30000&debug=on
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>
