Hi,

I tried manually with cURL and the riot CLI tool and can't reproduce the parsing issue, with either RDF/XML or Turtle.

curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/User_guide > /tmp/test.ttl
curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/User_guide > /tmp/test.rdf

I know that a few years ago DBpedia (or rather its Virtuoso backend) had some issues with serialization, but this was fixed a long time ago.

Also, I don't understand what you mean by "suspicious"? The parser can easily convert the UTF-8 encoded URIs as expected:

riot --check /tmp/test.ttl

<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://nl.dbpedia.org/resource/Handleiding> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://cs.dbpedia.org/resource/Uživatelská_příručka> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://wikidata.dbpedia.org/resource/Q1057179> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q1057179> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://ko.dbpedia.org/resource/사용_설명서> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://es.dbpedia.org/resource/Guía_del_usuario> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://ja.dbpedia.org/resource/マニュアル> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://it.dbpedia.org/resource/Manuale> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.freebase.com/ns/m.04mqbf> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://fr.dbpedia.org/resource/Mode_d'emploi> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://yago-knowledge.org/resource/User_guide> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://de.dbpedia.org/resource/Gebrauchsanleitung> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://id.dbpedia.org/resource/Manual_pengguna> .
<http://dbpedia.org/resource/User_guide> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/User_guide> .

On 24.04.20 22:33, Jean-Marc Vanel wrote:
> On Fri, 24 Apr 2020 at 22:17, Andy Seaborne <[email protected]> wrote:
>
>> On 24/04/2020 15:17, Jean-Marc Vanel wrote:
>>> How to reproduce with 3.14.0
>>>
>>> bin/tdbloader --loc TDB --graph=http://dbpedia.org/resource/User_guide \
>>>   --verbose http://dbpedia.org/resource/User_guide
>> Did the log say anything?
>>
> No, nothing special, not even with --debug.
>
>> As this is a remote URL, did it all arrive and parse without warnings?
> No warning.
>
>> Was the database fresh or was there data in it to start with?
> Database fresh, of course.
>
>
>>> echo "
>>> CONSTRUCT {
>>>   <http://dbpedia.org/resource/User_guide>
>>>   ?P ?O . }
>>> WHERE { GRAPH ?G {
>>>   <http://dbpedia.org/resource/User_guide>
>>>   ?P ?O . } }
>>> LIMIT
>>> # 30 # OK
>>> 35 # KO !!!
>>> " > /tmp/const.ql
>>>
>>> bin/tdbquery --debug --loc=TDB --query /tmp/const.ql
>>>
>>> And here is the *stack*:
>>>
>>> 16:14:23 ERROR BindingTDB :: get1(?O)
>>> java.lang.StringIndexOutOfBoundsException: String index out of range: 39
>>>     at java.lang.String.charAt(String.java:658)
>>>     at org.apache.jena.atlas.lib.StrUtils.decodeHex(StrUtils.java:212)
>>>     at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:121)
>> If the load was clean, the database is intact and it is a decoding bug
>> in Jena for a URI. The data has a lot of encoded \u terms but it's a URI
>> in the object position causing a problem. (I don't see why these are
>> encoded - it's not necessary).
>>
> Indeed, these URIs are suspect:
>
> <http://fr.dbpedia.org/resource/Mode_d\u0027emploi> ,
> <http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario> .
>
> <http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB> ,
> <http://cs.dbpedia.org/resource/U\u017Eivatelsk\u00E1_p\u0159\u00EDru\u010Dka> ,
> <http://ko.dbpedia.org/resource/\uC0AC\uC6A9_\uC124\uBA85\uC11C> .
>
>
>> Andy
>>
>> ...
>>>     at tdb.tdbquery.main(tdbquery.java:33)
>>>
>>> NOTE : no problem with apache-jena-3.10.0-SNAPSHOT !?
>>>
>>>
>>> Jean-Marc Vanel
>>> <http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
>>> +33 (0)6 89 16 29 52
>>> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
>>> Chroniques jardin
>>> <http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle>
>>>
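
To illustrate why the escaped URIs quoted above are not wrong in themselves: a \uXXXX escape in Turtle is just another spelling of the same character, so each escaped URI should decode to the UTF-8 form that riot printed. Here is a minimal sketch in Python that checks this for the five URIs from the thread. The helper name unescape_iri is made up for this example and is not part of Jena or any RDF library. It only handles \uXXXX and \UXXXXXXXX escapes and is not a Turtle parser.

```python
import re
import unicodedata

# Matches the two numeric escape forms: \uXXXX and \UXXXXXXXX.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_iri(text: str) -> str:
    """Replace numeric \\u / \\U escapes with the characters they denote."""
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(repl, text)


# Left: escaped form as quoted in the thread. Right: form printed by riot.
PAIRS = [
    (r"http://fr.dbpedia.org/resource/Mode_d\u0027emploi",
     "http://fr.dbpedia.org/resource/Mode_d'emploi"),
    (r"http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario",
     "http://es.dbpedia.org/resource/Guía_del_usuario"),
    (r"http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB",
     "http://ja.dbpedia.org/resource/マニュアル"),
    (r"http://cs.dbpedia.org/resource/U\u017Eivatelsk\u00E1_p\u0159\u00EDru\u010Dka",
     "http://cs.dbpedia.org/resource/Uživatelská_příručka"),
    (r"http://ko.dbpedia.org/resource/\uC0AC\uC6A9_\uC124\uBA85\uC11C",
     "http://ko.dbpedia.org/resource/사용_설명서"),
]

for escaped, printed in PAIRS:
    decoded = unescape_iri(escaped)
    # Normalize both sides so composed/decomposed spellings compare equal.
    same = unicodedata.normalize("NFC", decoded) == unicodedata.normalize("NFC", printed)
    print("same" if same else "DIFFERENT", decoded)
```

If all five lines report "same", the escaped data is fine as input, which fits Andy's reading that the problem is on the decoding side in TDB rather than in what DBpedia serves.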
