On 19/01/17 11:47, Jean-Marc Vanel wrote:
FIX typo

2017-01-19 12:45 GMT+01:00 Jean-Marc Vanel <[email protected]>:

There is, however, a possible improvement in the Jena parser.
It could skip over the faulty triple and output a non-empty graph with
the correct ones.

Since it should also report the faulty input, I guess this would be in
another, new method.

I guess also that writing a fault-tolerant parser is not easy ...

Not for Turtle, especially when the basic token is broken (prefixed names: they don't have simple delimiters like <>).

The code (TokenizerText.readSegment) may be able to deal with some cases better.

N-Triples would be much easier, because recovery is just scanning to the end of the line and the tokens of the language have delimiters.

What to avoid is having to read ahead and then reread in order to parse when in normal, non-error mode. That would slow down parsing measurably: tokenizing is on the critical path for throughput. For example, using JavaCC, which has a more powerful (expressive) tokenizer, came out significantly slower (nearly 50% for N-Triples, IIRC).
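
As a rough illustration of the line-based recovery described for N-Triples, here is a minimal sketch in plain Java. It is not Jena code and does not use TokenizerText; the class LenientNTriples, its Result type, and the simplified regular expressions are all assumptions made for the example. It checks one line at a time, keeps the lines that look like well-formed triples, and records the rest with their line numbers, so a bad token never forces a reread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Jena): skip bad N-Triples lines, report them. */
class LenientNTriples {

    // Deliberately simplified term patterns (assumption): they do not
    // validate escapes or the full IRI grammar of the N-Triples spec.
    private static final String IRI = "<[^<>\"\\s]*>";
    private static final String BNODE = "_:[A-Za-z0-9]+";
    private static final String LITERAL =
        "\"(?:[^\"\\\\]|\\\\.)*\"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\\^\\^" + IRI + ")?";

    // subject predicate object '.', each term carrying its own delimiters.
    private static final Pattern TRIPLE = Pattern.compile(
        "\\s*(?:" + IRI + "|" + BNODE + ")\\s+" + IRI
        + "\\s+(?:" + IRI + "|" + BNODE + "|" + LITERAL + ")\\s*\\.\\s*");

    /** Lines that passed the check, plus a message for each rejected line. */
    static final class Result {
        final List<String> accepted = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    static Result parse(String input) {
        Result result = new Result();
        String[] lines = input.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            String trimmed = lines[i].trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // blank line or comment
            }
            if (TRIPLE.matcher(lines[i]).matches()) {
                result.accepted.add(trimmed);
            } else {
                // Recovery is simply "move on to the next line".
                result.errors.add("line " + (i + 1) + ": " + trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String data =
            "<http://example.org/s> <http://example.org/p> \"ok\" .\n"
            + "<http://example.org/s> dbo:bad\u2013name \"broken\" .\n"
            + "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n";
        Result r = parse(data);
        System.out.println(r.accepted.size() + " accepted, " + r.errors.size() + " rejected");
        r.errors.forEach(System.err::println);
    }
}
```

The same approach does not carry over to Turtle directly, since a statement there can span many lines and a broken prefixed name gives no obvious point at which to resume.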

        Andy



2017-01-19 11:23 GMT+01:00 Jean-Marc Vanel <[email protected]>:

Many thanks, Lorenz

Thanks for the accurate diagnosis and the bug report to Virtuoso.

I was aware of the issue;
I was testing from my semantic_forms application,
where the exception catching has gone wrong recently (and the Scala language
does not require catching exceptions).



2017-01-19 10:05 GMT+01:00 Lorenz B. <[email protected]>:

Hi,

can you clarify what doesn't work?

I tried your example and it would work, but I'm getting a parse
exception because DBpedia (resp. Virtuoso) still returns illegal data:

   Graph g = RDFDataMgr.loadGraph("http://dbpedia.org/resource/Rome");
   System.out.println(g.size());


[line: 1863, col: 13] Failed to find a prefix name or keyword: –(8211;0x2013)
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1863, col: 13] Failed to find a prefix name or keyword: –(8211;0x2013)
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:165)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:108)
    at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:248)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:190)
    at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
    at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:89)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:179)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:861)
    at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:667)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:212)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:105)
    at org.apache.jena.riot.RDFDataMgr.loadGraph(RDFDataMgr.java:346)




If there is no parsing error (e.g. http://dbpedia.org/resource/Mars works
for me), the result is as expected:


   Graph g = RDFDataMgr.loadGraph("http://dbpedia.org/resource/Mars");
   System.out.println(g.size());

   Output: 347


I have already reported the issue; see [1] and [2].

[1] https://github.com/openlink/virtuoso-opensource/issues/567
[2] https://github.com/openlink/virtuoso-opensource/issues/569



Kind regards,
Lorenz

Reading a DBpedia resource with e.g. RDFDataMgr.loadGraph() does not
currently work (it used to work with Jena 3.1.1).
Apparently this is because of the 303 redirection.
Is there another call in the Jena API that handles redirections and accepts
RDF MIME types?

wget --save-headers --header='Accept: application/rdf+xml' http://dbpedia.org/resource/Rome
--2017-01-19 09:30:08--  http://dbpedia.org/resource/Rome
Resolving dbpedia.org (dbpedia.org)... 194.109.129.58
Connecting to dbpedia.org (dbpedia.org)|194.109.129.58|:80... connected.
HTTP request sent, awaiting response... *303 See Other*
Location: http://dbpedia.org/data/Rome.xml [following]
--2017-01-19 09:30:08--  http://dbpedia.org/data/Rome.xml
Reusing existing connection to dbpedia.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 1003627 (980K) [application/rdf+xml]
Saving to: 'Rome.2'

less Rome.2
HTTP/1.1 200 OK
Date: Thu, 19 Jan 2017 08:30:08 GMT
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 1003627
Connection: keep-alive
Vary: Accept-Encoding
Server: Virtuoso/07.20.3217 (Linux) i686-generic-linux-glibc212-64 VDB
Expires: Thu, 26 Jan 2017 08:30:08 GMT
Link: <http://creativecommons.org/licenses/by-sa/3.0/>;rel="license",<http://dbpedia.org/data/Rome.n3>; rel="alternate"; type="text/n3"; title="Structured Descriptor Document (N3/Turtle format)", <http://dbpedia.org/data/Rome.json>; rel="alternate"; type="application/json"; title="Structured Descriptor Document (RDF/JSON format)", <http://dbpedia.org/data/Rome.atom>; rel="alternate"; type="application/atom+xml"; title="OData (Atom+Feed format)", <http://dbpedia.org/data/Rome.jsod>; rel="alternate"; type="application/odata+json"; title="OData (JSON format)", <http://dbpedia.org/page/Rome>; rel="alternate"; type="text/html"; title="XHTML+RDFa", <http://dbpedia.org/resource/Rome>; rel="http://xmlns.com/foaf/0.1/primaryTopic", <http://dbpedia.org/resource/Rome>; rev="describedby", <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/Rome.xml>; rel="timegate"
X-SPARQL-default-graph: http://dbpedia.org
Cache-Control: max-age=604800
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Encoding
Accept-Ranges: bytes

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
...
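
For comparison with the wget log above, the same exchange (a GET with an RDF Accept header, following the 303) can be sketched with JDK classes only. This is not a Jena call; the class RdfFetchSketch, the method prepare, and the particular Accept value are assumptions chosen for illustration. HttpURLConnection follows same-protocol redirects by default, so the explicit setInstanceFollowRedirects(true) is there only to make the intent visible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical sketch (not a Jena API): content negotiation with redirects. */
class RdfFetchSketch {

    // Assumed preference order: Turtle first, RDF/XML as a fallback.
    static final String ACCEPT = "text/turtle, application/rdf+xml;q=0.9";

    /** Prepares a GET request; nothing is sent until the response is read. */
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", ACCEPT);
            conn.setInstanceFollowRedirects(true); // already the default
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://dbpedia.org/resource/Rome");
        // Reading the status triggers the request and any redirect handling.
        System.out.println(conn.getResponseCode() + " " + conn.getContentType());
        System.out.println("Final URL: " + conn.getURL());
        conn.disconnect();
    }
}
```

If the final URL and content type match what wget reports, the redirect itself is being handled, and any remaining failure is more likely to come from the returned data than from the HTTP exchange.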

--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center




--
Jean-Marc Vanel
Profil: http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui



