As Andy stated, this is not a parsing issue: neither riot nor rapper <http://librdf.org/raptor/rapper.html> reports anything. The problem is in how TDB renders the URI once it has been stored in TDB.
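For reference, the \u-escaped URIs quoted further down in this thread denote the same IRIs that riot prints in UTF-8. Here is a minimal Python sketch of that equivalence. It is not Jena's decoder: `decode_u_escapes` is a hypothetical helper written for illustration, and it assumes only 4-digit \uXXXX escapes occur.

```python
import re

# Matches a complete \uXXXX escape: backslash, 'u', exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each complete \\uXXXX escape with the character it denotes.

    Hypothetical helper for illustration only. An incomplete escape
    (fewer than four hex digits) does not match the pattern and is
    left untouched instead of raising an index error.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Escaped forms copied from the thread.
escaped = [
    r"http://fr.dbpedia.org/resource/Mode_d\u0027emploi",
    r"http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario",
    r"http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB",
]

for uri in escaped:
    print(decode_u_escapes(uri))
```

Matching whole escapes with a regular expression means the decoder never reads past the end of the string, which is the kind of failure the StringIndexOutOfBoundsException in the stack trace points at.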

Jean-Marc Vanel
<http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
Chroniques jardin
<http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle>

On Sat, 25 Apr 2020 at 09:34, Lorenz Buehmann <[email protected]> wrote:

> Hi,
>
> I tried with cURL + riot CLI tools manually and can't reproduce the
> parsing issue, neither with RDF/XML nor with Turtle.
>
> curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/User_guide > /tmp/test.ttl
> curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/User_guide > /tmp/test.rdf
>
> I know, that a few years ago DBpedia (resp. its Virtuoso backend) had
> some issues with serialization, but this has been fixed long time ago.
>
> Also, I don't understand what you mean by "suspicious"? The parser can
> easily convert the UTF-8 encoded URIs as expected:
>
> riot --check /tmp/test.ttl
>
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://nl.dbpedia.org/resource/Handleiding> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://cs.dbpedia.org/resource/Uživatelská_příručka> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://wikidata.dbpedia.org/resource/Q1057179> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q1057179> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://ko.dbpedia.org/resource/사용_설명서> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://es.dbpedia.org/resource/Guía_del_usuario> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://ja.dbpedia.org/resource/マニュアル> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://it.dbpedia.org/resource/Manuale> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.freebase.com/ns/m.04mqbf> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://fr.dbpedia.org/resource/Mode_d'emploi> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://yago-knowledge.org/resource/User_guide> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://de.dbpedia.org/resource/Gebrauchsanleitung> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://id.dbpedia.org/resource/Manual_pengguna> .
> <http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/User_guide> .
>
> On 24.04.20 22:33, Jean-Marc Vanel wrote:
> > On Fri, 24 Apr 2020 at 22:17, Andy Seaborne <[email protected]> wrote:
> >
> >> On 24/04/2020 15:17, Jean-Marc Vanel wrote:
> >>> How to reproduce with 3.14.0
> >>>
> >>> bin/*tdbloader* --loc TDB --graph=http://dbpedia.org/resource/User_guide \
> >>> --verbose http://dbpedia.org/resource/User_guide
> >> Did the log say anything?
> >>
> > NO, nothing special, neither with --debug .
> >
> > As this is a remote URL, did it all arrive and parse without warnings?
> > No warning.
> >
> > Was the database fresh or was there data in it to start with?
> > database fresh, of course.
> >
> >
> >>> echo "
> >>> CONSTRUCT {
> >>> <http://dbpedia.org/resource/User_guide>
> >>> ?P ?O . }
> >>> WHERE { GRAPH ?G {
> >>> <http://dbpedia.org/resource/User_guide>
> >>> ?P ?O . } }
> >>> LIMIT
> >>> # 30 # OK
> >>> 35 # KO !!!
> >>> " > /tmp/const.ql
> >>>
> >>> bin/*tdbquery* --debug --loc=TDB --query /tmp/const.ql
> >>>
> >>> And here is the *stack*:
> >>>
> >>> 16:14:23 ERROR BindingTDB :: get1(?O)
> >>> java.lang.StringIndexOutOfBoundsException: String index out of range: 39
> >>> at java.lang.String.charAt(String.java:658)
> >>> at org.apache.jena.atlas.lib.StrUtils.decodeHex(StrUtils.java:212)
> >>> at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:121)
> >> If the load was clean, the database is intact and it is a decoding bug
> >> in Jena for an URI. The data has a lot of encoded \u terms but its a URI
> >> in the object position causing a problem. (I don't see why these are
> >> encoded - it's not necessary).
> >>
> > Indeed these URI are suspect:
> >
> > <http://fr.dbpedia.org/resource/Mode_d\u0027emploi> ,
> > <http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario> .
> >
> > <http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB> ,
> > <http://cs.dbpedia.org/resource/U\u017Eivatelsk\u00E1_p\u0159\u00EDru\u010Dka> ,
> > <http://ko.dbpedia.org/resource/\uC0AC\uC6A9_\uC124\uBA85\uC11C> .
> >
> >
> >> Andy
> >>
> >> ...
> >>> at tdb.tdbquery.main(tdbquery.java:33)
> >>>
> >>> NOTE : no problem with apache-jena-3.10.0-SNAPSHOT !?
> >>>
> >>>
> >>> Jean-Marc Vanel
> >>> <http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
> >>> +33 (0)6 89 16 29 52
> >>> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
> >>> Chroniques jardin
> >>> <http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle>
