Should we still ask DBPedia to switch back to XML 1.0?

On Wed, May 13, 2015 at 9:02 PM, Andy Seaborne <[email protected]> wrote:
> On 13/05/15 15:27, Rob Vesse wrote:
>>
>> I assume you'll go ahead and file a bug against Xerces?
>
>
> The issue does not seem to be in Apache Xerces.
>
> Jena is picking up the JDK XMLStreamReader implementation.
>
> Xerces does not provide javax.xml.stream.XMLInputFactory or XMLStreamReader,
> at least, they are not registered in META-INF/services.
>
> This means that adding org.codehaus.woodstox:wstx-asl is always a valid
> workaround, because the default JDK provider is only used when no
> XMLInputFactory is registered (via ServiceLoader).
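To see which StAX implementation a particular classpath actually resolves to, one can ask XMLInputFactory directly. This is a minimal sketch, assuming Jena obtains its parser through the standard XMLInputFactory.newInstance() lookup, as the ServiceLoader remark above suggests; the class name StaxProviderReport is hypothetical.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxProviderReport {

    /** Returns the class name of the XMLInputFactory the JVM resolves on this classpath. */
    static String resolvedProvider() {
        // newInstance() checks the javax.xml.stream.XMLInputFactory system property,
        // then a stax.properties file in the JRE, then META-INF/services via
        // ServiceLoader, and only then falls back to the JDK's built-in implementation.
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX input factory: " + resolvedProvider());
    }
}
```

Run with the application's classpath: with only the JDK it typically names a com.sun.xml.internal.stream class, while with wstx-asl present it should name com.ctc.wstx.stax.WstxInputFactory.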
>
> It's surprising that the JDK bug is still open, as the fix for the JDK looks
> small.
>
>         Andy
>
>
>>
>> Rob
>>
>> On 13/05/2015 14:56, "Andy Seaborne" <[email protected]> wrote:
>>
>>> So far we know:
>>>
>>> It is a bug in Xerces' handling of XML 1.1.
>>>
>>> Specifically, an NPE in
>>>     XML11NSDocumentScannerImpl:scanStartElement line 356
>>>
>>> (a big +1 to open source here)
>>>
>>> 1/ The first line that triggers the problem is <variable name="class"/>
>>>
>>> "/>" is the trigger.
>>>
>>> <variable name="class"></variable> would work.
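The trigger can be reproduced outside Jena with a cut-down SPARQL results document declared as XML 1.1. This is a sketch under assumptions: the document is hypothetical (only the <variable name="class"/> element comes from the thread), Xml11EmptyElementRepro is a made-up name, and whether the self-closing case fails depends on the JDK version and on which StAX provider is on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class Xml11EmptyElementRepro {

    // Hypothetical minimal SPARQL results document, declared as XML 1.1 and namespaced.
    static String resultsDoc(String variableElement) {
        return "<?xml version=\"1.1\"?>"
            + "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
            + "<head>" + variableElement + "</head>"
            + "</sparql>";
    }

    /** Parses the whole document with the resolved StAX provider, counting start elements. */
    static int countStartElements(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Reports success or failure as text instead of letting an NPE escape. */
    static String tryParse(String label, String xml) {
        try {
            return label + ": OK, " + countStartElements(xml) + " start elements";
        } catch (XMLStreamException | RuntimeException e) {
            // On an affected parser the self-closing form is expected to end up here.
            return label + ": FAILED with " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("self-closing",
            resultsDoc("<variable name=\"class\"/>")));
        System.out.println(tryParse("explicit end tag",
            resultsDoc("<variable name=\"class\"></variable>")));
    }
}
```

On an affected parser the first line would be expected to report a failure while the second succeeds; on a parser without the bug both should succeed.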
>>>
>>> 2/ It affects Xerces 2.11.0 and also the Xerces inside OpenJDK.
>>> https://bugs.openjdk.java.net/browse/JDK-8029437
>>>
>>> 3/ Adding org.codehaus.woodstox:wstx-asl to the dependencies can fix it
>>> (this may depend on classpath ordering) because Jena then picks up a
>>> different StAX parser; e.g. add jena-text to your project (!!!).
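If Woodstox is on the classpath, the choice can be pinned rather than left to ordering, using the standard javax.xml.stream.XMLInputFactory system property, which XMLInputFactory.newInstance() consults before ServiceLoader. A minimal sketch, assuming the wstx-asl jar provides com.ctc.wstx.stax.WstxInputFactory and that this runs before Jena creates its first factory; PreferWoodstox is a hypothetical name.

```java
import javax.xml.stream.XMLInputFactory;

public class PreferWoodstox {

    // Standard StAX system property, checked first by XMLInputFactory.newInstance().
    static final String FACTORY_PROPERTY = "javax.xml.stream.XMLInputFactory";

    // Assumption: the Woodstox factory class shipped in org.codehaus.woodstox:wstx-asl.
    static final String WOODSTOX_FACTORY = "com.ctc.wstx.stax.WstxInputFactory";

    /**
     * Pins the StAX input factory to Woodstox if it can be loaded, so the choice no
     * longer depends on classpath ordering. Returns true if the property was set.
     */
    static boolean preferWoodstoxIfPresent() {
        try {
            Class.forName(WOODSTOX_FACTORY);
        } catch (ClassNotFoundException e) {
            return false; // Woodstox is absent: leave the default lookup untouched
        }
        System.setProperty(FACTORY_PROPERTY, WOODSTOX_FACTORY);
        return true;
    }

    public static void main(String[] args) {
        // Call this early, before any code has created an XMLInputFactory.
        boolean pinned = preferWoodstoxIfPresent();
        System.out.println("Woodstox pinned: " + pinned + ", factory in use: "
            + XMLInputFactory.newInstance().getClass().getName());
    }
}
```

The same effect is available without code by starting the JVM with -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory.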
>>>
>>>         Andy
>>>
>>>
>>>
>>> On 13/05/15 14:06, Rob Vesse wrote:
>>>>
>>>> Jeremy
>>>>
>>>> Looks like someone else just ran into the same issue and filed a bug,
>>>> JENA-940 [1]. Feel free to add a comment there indicating that this
>>>> appears to be the same issue you encountered.
>>>>
>>>> Apparently the issue has something to do with DBPedia adopting XML 1.1
>>>> and a lack of support for it in Xerces (or at least in the version of
>>>> Xerces that Jena currently uses).
>>>>
>>>> Rob
>>>>
>>>> [1] https://issues.apache.org/jira/browse/JENA-940
>>>>
>>>> On 13/05/2015 12:27, "Jeremy Debattista" <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Rob,
>>>>>
>>>>> Yes, that is what I suspect as well, even though when I use curl with
>>>>> content negotiation [1], the returned results look good (and
>>>>> well-formed). Anyway, this is the complete error stack:
>>>>>
>>>>> com.hp.hpl.jena.sparql.resultset.ResultSetException: Failed when initializing the StAX parsing engine
>>>>>         at com.hp.hpl.jena.sparql.resultset.XMLInputStAX.<init>(XMLInputStAX.java:119)
>>>>>         at com.hp.hpl.jena.sparql.resultset.XMLInput.make(XMLInput.java:73)
>>>>>         at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:42)
>>>>>         at com.hp.hpl.jena.sparql.resultset.XMLInput.fromXML(XMLInput.java:37)
>>>>>         at com.hp.hpl.jena.query.ResultSetFactory.fromXML(ResultSetFactory.java:312)
>>>>>         at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:372)
>>>>>         at de.unibonn.iai.eis.linda.helper.SPARQLHandler.executeQuery(SPARQLHandler.java:41)
>>>>>         at de.unibonn.iai.eis.linda.helper.SPARQLHandler.getLabelFromNode(SPARQLHandler.java:80)
>>>>>         at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.<init>(RDFClass.java:62)
>>>>>         at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:228)
>>>>>         at de.unibonn.iai.eis.linda.querybuilder.classes.RDFClass.searchRDFClass(RDFClass.java:222)
>>>>>         at com.servlet.routes.BuilderRoute.getProperties(BuilderRoute.java:172)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>
>>>>> Cheers,
>>>>> Jeremy
>>>>>
>>>>>
>>>>> [1] curl -H "Accept: application/sparql-results+xml" -g
>>>>> "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A"
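Since the curl output looks well-formed while the parser fails, it is worth checking which XML version the response declares. This sketch sends the same Accept header as the curl command and reads the version from the XML declaration with a regular expression rather than a StAX parser, so the check itself cannot trip over the bug. It assumes, as is usual, that the declaration sits on the first line; ResultsXmlVersion and its methods are hypothetical names. A document with no declaration is XML 1.0 by definition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultsXmlVersion {

    // Matches version="1.x" or version='1.x' in a leading <?xml ... ?> declaration.
    private static final Pattern XML_DECL_VERSION =
        Pattern.compile("^\\s*<\\?xml\\s+version\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Returns the declared XML version; without a declaration the document is XML 1.0. */
    static String versionFromDeclaration(String firstLine) {
        // Drop a UTF-8 byte order mark, which InputStreamReader leaves in place.
        String line = firstLine.startsWith("\uFEFF") ? firstLine.substring(1) : firstLine;
        Matcher m = XML_DECL_VERSION.matcher(line);
        return m.find() ? m.group(1) : "1.0";
    }

    /** Requests SPARQL XML results, as the curl command does, and inspects the first line. */
    static String fetchDeclaredVersion(URL queryUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) queryUrl.openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+xml");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String firstLine = in.readLine();
            return firstLine == null ? "1.0" : versionFromDeclaration(firstLine);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the full, already percent-encoded query URL, e.g. the one given to curl.
        URL url = URI.create(args[0]).toURL();
        System.out.println("Declared XML version: " + fetchDeclaredVersion(url));
    }
}
```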
>>>>>
>>>>> On 13 May 2015, at 12:32, Rob Vesse <[email protected]> wrote:
>>>>>
>>>>>> What is the error message you get?
>>>>>>
>>>>>> It is not unheard of for Virtuoso (the software that powers DBPedia)
>>>>>> to produce bad output, particularly if the data has not been
>>>>>> appropriately sanitised, so I would suspect Virtuoso before suspecting
>>>>>> Jena in a case like this.
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> On 13/05/2015 10:16, "Jeremy Debattista" <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear All,
>>>>>>>
>>>>>>> I am trying to query the DBpedia SPARQL endpoint using
>>>>>>> QueryExecutionFactory.sparqlService() and execSelect(), but I get the
>>>>>>> following error: com.hp.hpl.jena.sparql.resultset.ResultSetException:
>>>>>>> Failed when initializing the StAX parsing engine
>>>>>>>
>>>>>>> The query in question is:
>>>>>>>
>>>>>>> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
>>>>>>> PREFIX owl:<http://www.w3.org/2002/07/owl#>
>>>>>>> SELECT distinct ?class ?label
>>>>>>> WHERE { {?class rdf:type owl:Class} UNION {?class rdf:type rdfs:Class}.
>>>>>>>   ?class rdfs:label ?label.
>>>>>>>   FILTER(bound(?label) && REGEX(?label, "\\bact","i"))}
>>>>>>> ORDER BY ?class
>>>>>>>
>>>>>>> which returns results in the DBpedia SPARQL web interface [1].
>>>>>>>
>>>>>>> The code in question is the following:
>>>>>>>
>>>>>>> public static ResultSet executeQuery(String uri, String queryString)
>>>>>>> {
>>>>>>>         Query query = QueryFactory.create(queryString);
>>>>>>>         QueryExecution qexec = QueryExecutionFactory.sparqlService(uri, query);
>>>>>>>         try {
>>>>>>>                 // Copy the results into memory so that the HTTP
>>>>>>>                 // connection can be closed before returning
>>>>>>>                 return ResultSetFactory.copyResults(qexec.execSelect());
>>>>>>>         } finally {
>>>>>>>                 qexec.close();
>>>>>>>         }
>>>>>>> }
>>>>>>>
>>>>>>> After debugging, the problem seems to be related to how the XML parser
>>>>>>> is reading the input stream. Do you have any idea how I can work
>>>>>>> around it?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Jeremy
>>>>>>>
>>>>>>> [1]
>>>>>>> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E++SELECT+distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3AClass%7D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+%3Flabel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5C%5Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A&format=text%2Fhtml&timeout=30000&debug=on