On 10/05/15 21:48, Martynas Jusevičius wrote:
Hey all,
I want to refactor my RDF/POST parser into a Jena-compatible reader.
An example of the format can be found here:
http://www.lsrn.org/semweb/rdfpost.html#sec-examples
The documentation suggests implementing ReaderRIOT interface:
https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/riot/ExRIOT_5.java
However, if I look at (what I think is) existing readers such as
Turtle for example, they do not seem to implement ReaderRIOT:
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangTurtleBase.java
What is the explanation for that?
Hi Martynas,
It is historical: the Turtle-derived parsers emerged with the
RiotReader interface, and some code that used that interface is (or was)
still around.
ReaderRIOTLang is the cross-over code from the proper interface
ReaderRIOT to RiotReader. RiotReader is a fixed set of parsers.
This can be sorted out in Jena3.
Do I need to tokenize the InputStream myself or is there some
machinery I can reuse?
The Turtle-world tokenizer is TokenizerText. It is specific to Turtle terms.
Any tokenizing for a new language is often, in my experience, very
sensitive to the language details.
If you are used to javacc, and performance isn't critical at scale,
that's a good tool.
RIOT uses custom I/O for speed; Jena used to have a javacc parser for
Turtle but Turtle is sufficiently simple that a hand-written parser is
doable. A hand-written tokenizer is for speed at scale (on big files it
is about 2x faster than basic javacc tokenizing), but you need large
input to make it worthwhile. N-Triples dumps of databases are such a case.
If you do rdfpost -> Turtle (string manipulation), then you can parse
the Turtle as normal. Downside: Error messages may be confusing as they
refer to the Turtle, not the input string.
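
A rough sketch of the rdfpost -> Turtle route, covering only a small
subset of the format. It assumes the field names su (subject URI),
pu (predicate URI), ou (object URI) and ol (object literal) as I
understand them from the RDF/POST document linked above; those names,
the class RdfPostToTurtle and the String[]{name, value} input shape are
assumptions for illustration, not part of Jena. Blank nodes, prefixed
names, datatypes and language tags are left out.

```java
import java.util.List;

// Hypothetical helper, not a Jena class: turns already-decoded RDF/POST
// fields into one Turtle triple per object field.
final class RdfPostToTurtle {

    // Each field is {name, value}. Field order matters: su sets the
    // current subject, pu the current predicate, ou/ol emit a triple.
    static String convert(List<String[]> fields) {
        StringBuilder out = new StringBuilder();
        String subject = null;
        String predicate = null;
        for (String[] field : fields) {
            String name = field[0];
            String value = field[1];
            switch (name) {
                case "su": subject = iri(value); predicate = null; break;
                case "pu": predicate = iri(value); break;
                case "ou": emit(out, subject, predicate, iri(value)); break;
                case "ol": emit(out, subject, predicate, literal(value)); break;
                default: break; // other field names are ignored in this sketch
            }
        }
        return out.toString();
    }

    // Writes "s p o ." and silently skips objects with no subject/predicate.
    private static void emit(StringBuilder out, String s, String p, String o) {
        if (s == null || p == null) return;
        out.append(s).append(' ').append(p).append(' ').append(o).append(" .\n");
    }

    // Wraps the value in <>, rejecting characters that would break the syntax.
    private static String iri(String v) {
        if (v.isEmpty()) throw new IllegalArgumentException("Empty IRI");
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            if (c == '<' || c == '>' || c == '"' || Character.isWhitespace(c)) {
                throw new IllegalArgumentException("Not usable as an IRI: " + v);
            }
        }
        return "<" + v + ">";
    }

    // Escapes backslash, double quote and line breaks for a Turtle string.
    private static String literal(String v) {
        String escaped = v.replace("\\", "\\\\")
                          .replace("\"", "\\\"")
                          .replace("\n", "\\n")
                          .replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }
}
```

The returned string can then go through the normal Turtle parser. Line
numbers in any parse error will refer to this generated text.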
Splitting up the query string, with all the HTTP escaping rules, can be
done with library code (see FusekiLib.parseQueryString [no longer used,
but it works without consuming the body, unlike the servlet operations
which combine form and query string processing], and probably lots of
better code examples on the web).
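
A minimal sketch of the splitting step using only the Java standard
library (java.net.URLDecoder) instead of FusekiLib. The class name
QueryStringSplitter is made up for illustration. It keeps the fields in
order and keeps duplicates, on the assumption that the format being
parsed is sensitive to field order.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jena or Fuseki.
final class QueryStringSplitter {

    // Splits "a=1&b=x%20y" into ordered, percent-decoded (name, value)
    // pairs. A field without '=' gets an empty value.
    static List<Map.Entry<String, String>> split(String body) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (body == null || body.isEmpty()) return pairs;
        for (String field : body.split("&")) {
            if (field.isEmpty()) continue; // tolerate "a=1&&b=2"
            int eq = field.indexOf('=');
            String rawName = eq < 0 ? field : field.substring(0, eq);
            String rawValue = eq < 0 ? "" : field.substring(eq + 1);
            pairs.add(new SimpleImmutableEntry<>(decode(rawName), decode(rawValue)));
        }
        return pairs;
    }

    // Decodes %XX escapes and '+' (as space), assuming UTF-8.
    // URLDecoder throws IllegalArgumentException on malformed escapes.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Decoding happens after splitting on & and =, so an escaped %26 or %3D
inside a value does not split the field.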
Andy
Martynas
graphityhq.com