Should we still ask DBPedia to switch back to XML 1.0?
On Wed, May 13, 2015 at 9:02 PM, Andy Seaborne <[email protected]> wrote:
> On 13/05/15 15:27, Rob Vesse wrote:
>>
>> I assume you'll go ahead and file a bug against Xerces?
>
> The issue does not seem to be in Apache Xerces.
>
> Jena is picking up the JDK XMLStreamReader implementation.
>
> Xerces does not provide javax.xml.stream.XMLInputFactory and XMLStreamReader;
> at least it's not in META-INF/services.
>
> It means that adding org.codehaus.woodstox:wstx-asl is always a valid
> workaround, as the default JDK provider is not used unless no
> XMLInputFactory is registered (ServiceLoader).
>
> It's surprising that the JDK bug is still open, as the fix for the JDK looks
> small.
>
>     Andy
>
>>
>> Rob
>>
>> On 13/05/2015 14:56, "Andy Seaborne" <[email protected]> wrote:
>>
>>> So far we know:
>>>
>>> It is a bug in Xerces handling of 1.1
>>>
>>> Specifically, an NPE
>>> XML11NSDocumentScannerImpl:scanStartElement line 356
>>>
>>> (a big +1 to open source here)
>>>
>>> 1/ The first problem line hit is <variable name="class"/>
>>>
>>> "/>" is the trigger.
>>>
>>> <variable name="class"></variable> would work.
>>>
>>> 2/ It affects Xerces 2.11.0 and also the Xerces inside OpenJDK.
>>> https://bugs.openjdk.java.net/browse/JDK-8029437
>>>
>>> 3/ Adding org.codehaus.woodstox:wstx-asl to the dependencies can fix it
>>> (may depend on ordering) - e.g. add jena-text to your project (!!!) -
>>> because it picks up a different StAX parser.
>>>
>>>     Andy
>>>
>>> On 13/05/15 14:06, Rob Vesse wrote:
>>>>
>>>> Jeremy
>>>>
>>>> Looks like someone else just ran into the same issue and filed a bug -
>>>> JENA-940 [1] - feel free to add a comment there indicating that this
>>>> appears to be the same issue you encounter
>>>>
>>>> Apparently the issue has something to do with DBPedia adopting XML 1.1
>>>> and a lack of support for that in Xerces (or at least the version of
>>>> Xerces Jena currently uses)
>>>>
>>>> Rob
>>>>
>>>> [1] https://issues.apache.org/jira/browse/JENA-940
>>>>
>>>> On 13/05/2015 12:27, "Jeremy Debattista" <[email protected]> wrote:
>>>>
>>>>> Hi Rob,
>>>>>
>>>>> Yes that is what I suspect as well, even though when I use a curl
>>>>> function with content negotiation [1], the returned results look good
>>>>> (and well formed). Anyway, this is the complete error stack:
>>>>>
>>>>> com.hp.hpl.jena.sparql.resultset.ResultSetException: Failed when initializing the StAX parsing engine
>>>>>     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX.<init>(XMLInputStAX.java:119)
>>>>>     at com.hp.hpl.jena.sparql.resultset.XMLInput.make(XMLInput.java:73)
>>>>>     at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:42)
>>>>>     at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:37)
>>>>>     at com.hp.hpl.jena.query.ResultSetFactory.fromXML(ResultSetFactory.java:312)
>>>>>     at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:372)
>>>>>     at de.unibonn.iai.eis.linda.helper.SPARQLHandler.executeQuery(SPARQLHandler.java:41)
>>>>>     at de.unibonn.iai.eis.linda.helper.SPARQLHandler.getLabelFromNode(SPARQLHandler.java:80)
>>>>>     at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.<init>(RDFClass.java:62)
>>>>>     at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:228)
>>>>>     at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:222)
>>>>>     at com.servlet.routes.BuilderRoute.getProperties(BuilderRoute.java:172)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>
>>>>> Cheers,
>>>>> Jeremy
>>>>>
>>>>> [1] curl -H "Accept: application/sparql-results+xml" -g
>>>>> "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A"
>>>>>
>>>>> On 13 May 2015, at 12:32, Rob Vesse <[email protected]> wrote:
>>>>>
>>>>>> What is the error message you get?
>>>>>>
>>>>>> It is not unheard of for Virtuoso (the software that powers DBPedia)
>>>>>> to produce bad output particularly if the data has not been appropriately
>>>>>> sanitised so I would suspect Virtuoso before suspecting Jena in a case
>>>>>> like this
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> On 13/05/2015 10:16, "Jeremy Debattista" <[email protected]> wrote:
>>>>>>
>>>>>>> Dear All,
>>>>>>>
>>>>>>> I am trying to query the DBpedia SPARQL endpoint using the
>>>>>>> QueryExecutionFactory sparqlService and execSelect(), but I’m given
>>>>>>> the following error: com.hp.hpl.jena.sparql.resultset.ResultSetException:
>>>>>>> Failed when initializing the StAX parsing engine
>>>>>>>
>>>>>>> The query in question is
>>>>>>>
>>>>>>> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
>>>>>>> PREFIX owl:<http://www.w3.org/2002/07/owl#>
>>>>>>> SELECT distinct ?class ?label
>>>>>>> WHERE { {?class rdf:type owl:Class} UNION {?class rdf:type rdfs:Class}.
>>>>>>> ?class rdfs:label ?label. FILTER(bound(?label) && REGEX(?label, "\\bact","i"))}
>>>>>>> ORDER BY ?class
>>>>>>>
>>>>>>> which gives a result in dbpedia sparql web interface [1].
>>>>>>>
>>>>>>> The code in question is the following:
>>>>>>>
>>>>>>> public static ResultSet executeQuery(String uri, String queryString) {
>>>>>>>     Query query = QueryFactory.create(queryString);
>>>>>>>     QueryExecution qexec = QueryExecutionFactory.sparqlService(uri, query);
>>>>>>>     try {
>>>>>>>         ResultSet results = qexec.execSelect();
>>>>>>>         return results;
>>>>>>>     } finally {
>>>>>>>
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> After debugging, the problem seems to be related to how the XML
>>>>>>> parser is reading the stream input. Would you have any other idea how
>>>>>>> I can go around it?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Jeremy
>>>>>>>
>>>>>>> [1]
>>>>>>> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A&format=text%2Fhtml&timeout=30000&debug=on
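
For anyone trying the Woodstox workaround discussed in this thread, it may help to check which StAX implementation the JVM actually picks up, since the result can depend on what is on the classpath. The following is only a rough sketch and not code from Jena or from the thread: the class name StaxProviderCheck, its helper methods, and the tiny XML 1.1 sample document are made up for illustration. It uses only the standard javax.xml.stream API. Whether the XML 1.1 parse succeeds or fails will depend on the JDK version and on the provider found, so no particular output is implied.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic class; the name and methods are illustrative only.
public class StaxProviderCheck {

    /** Class name of the XMLInputFactory found by the standard JAXP lookup. */
    public static String providerName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /**
     * Reads the document to the end and counts START_ELEMENT events.
     * Returns -1 if the parser throws (checked or unchecked, e.g. an NPE).
     */
    public static int countStartElements(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (Exception e) {
            System.out.println("parse failed: " + e);
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("StAX provider: " + providerName());

        // Made-up sample: XML 1.1 declaration plus the self-closing element
        // that the thread identifies as the trigger.
        String xml11 = "<?xml version=\"1.1\"?>"
                + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
                + "<head><variable name=\"class\"/></head></sparql>";
        System.out.println("start elements (XML 1.1): " + countStartElements(xml11));
    }
}
```

Running it once with and once without org.codehaus.woodstox:wstx-asl on the classpath should show whether the provider changes and whether the XML 1.1 sample parses.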
