Hi,

I tried manually with cURL and the riot CLI tool and can't reproduce the
parsing issue, with either RDF/XML or Turtle.

curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/User_guide > /tmp/test.ttl
curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/User_guide > /tmp/test.rdf


I know that a few years ago DBpedia (or rather its Virtuoso backend) had
some issues with serialization, but these were fixed a long time ago.

Also, I don't understand what you mean by "suspect". The parser can
easily decode the \u-escaped URIs to their Unicode form, as expected:

riot --check /tmp/test.ttl

<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://nl.dbpedia.org/resource/Handleiding> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://cs.dbpedia.org/resource/Uživatelská_příručka> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://wikidata.dbpedia.org/resource/Q1057179> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://www.wikidata.org/entity/Q1057179> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://ko.dbpedia.org/resource/사용_설명서> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://es.dbpedia.org/resource/Guía_del_usuario> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://ja.dbpedia.org/resource/マニュアル> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://it.dbpedia.org/resource/Manuale> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://rdf.freebase.com/ns/m.04mqbf> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://fr.dbpedia.org/resource/Mode_d'emploi> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://yago-knowledge.org/resource/User_guide> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://de.dbpedia.org/resource/Gebrauchsanleitung> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://id.dbpedia.org/resource/Manual_pengguna> .
<http://dbpedia.org/resource/User_guide>
<http://www.w3.org/2002/07/owl#sameAs>
<http://dbpedia.org/resource/User_guide> .
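
To illustrate why the escaped and unescaped forms name the same resources,
here is a minimal Python sketch of \uXXXX / \UXXXXXXXX decoding for IRIs.
It is not Jena's or riot's actual code; the function name decode_iri_escapes
is made up for this example, and the sample IRIs are the ones from the
quoted message below.

```python
import re

# The two numeric escape forms allowed inside Turtle/N-Triples IRIs:
# backslash-u + 4 hex digits, or backslash-U + 8 hex digits.
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_iri_escapes(iri: str) -> str:
    # Hypothetical helper: replace each numeric escape with the
    # Unicode character it denotes, leaving everything else untouched.
    def _replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _UCHAR.sub(_replace, iri)


# Escaped IRIs as they appear in the quoted DBpedia Turtle.
escaped_iris = [
    r"http://fr.dbpedia.org/resource/Mode_d\u0027emploi",
    r"http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario",
    r"http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB",
    r"http://cs.dbpedia.org/resource/U\u017Eivatelsk\u00E1_p\u0159\u00EDru\u010Dka",
    r"http://ko.dbpedia.org/resource/\uC0AC\uC6A9_\uC124\uBA85\uC11C",
]

decoded_iris = [decode_iri_escapes(iri) for iri in escaped_iris]

for iri in decoded_iris:
    print(iri)
```

The decoded strings should correspond to the owl:sameAs objects in the riot
output above, which is why the escapes by themselves do not make the data
invalid.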

On 24.04.20 22:33, Jean-Marc Vanel wrote:
> On Fri, 24 Apr 2020 at 22:17, Andy Seaborne <[email protected]> wrote:
>
>> On 24/04/2020 15:17, Jean-Marc Vanel wrote:
>>> How to reproduce with 3.14.0
>>>
>>> bin/tdbloader --loc TDB --graph=http://dbpedia.org/resource/User_guide \
>>>    --verbose http://dbpedia.org/resource/User_guide
>> Did the log say anything?
>>
> No, nothing special, not even with --debug.
>
>> As this is a remote URL, did it all arrive and parse without warnings?
>>
> No warning.
>
>> Was the database fresh or was there data in it to start with?
>>
> Database fresh, of course.
>
>
>>> echo "
>>> CONSTRUCT {
>>>   <http://dbpedia.org/resource/User_guide>
>>>    ?P ?O . }
>>> WHERE { GRAPH ?G {
>>>   <http://dbpedia.org/resource/User_guide>
>>>    ?P ?O . } }
>>> LIMIT
>>> # 30 # OK
>>> 35 # KO !!!
>>> " > /tmp/const.ql
>>>
>>> bin/tdbquery --debug --loc=TDB --query /tmp/const.ql
>>>
>>> And here is the stack:
>>>
>>> 16:14:23 ERROR BindingTDB           :: get1(?O)
>>> java.lang.StringIndexOutOfBoundsException: String index out of range: 39
>>> at java.lang.String.charAt(String.java:658)
>>> at org.apache.jena.atlas.lib.StrUtils.decodeHex(StrUtils.java:212)
>>> at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:121)
>> If the load was clean, the database is intact and it is a decoding bug
>> in Jena for a URI. The data has a lot of encoded \u terms, but it's a URI
>> in the object position that is causing the problem.  (I don't see why these
>> are encoded - it's not necessary).
>>
> Indeed, these URIs are suspect:
>
> <http://fr.dbpedia.org/resource/Mode_d\u0027emploi> ,
> <http://es.dbpedia.org/resource/Gu\u00EDa_del_usuario> .
>
> <http://ja.dbpedia.org/resource/\u30DE\u30CB\u30E5\u30A2\u30EB> ,
> <http://cs.dbpedia.org/resource/U\u017Eivatelsk\u00E1_p\u0159\u00EDru\u010Dka> ,
> <http://ko.dbpedia.org/resource/\uC0AC\uC6A9_\uC124\uBA85\uC11C> .
>
>
>>      Andy
>>
>> ...
>>> at tdb.tdbquery.main(tdbquery.java:33)
>>>
>>> NOTE : no problem with apache-jena-3.10.0-SNAPSHOT !?
>>>
>>>
>>> Jean-Marc Vanel
>>> <
>> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
>>> +33 (0)6 89 16 29 52
>>> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
>>>   Chroniques jardin
>>> <
>> http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle
>>>
