Andy,

I took a crack at it:
https://github.com/Graphity/graphity-core/blob/master/src/main/java/org/graphity/core/riot/lang/RDFPostReader.java
https://github.com/Graphity/graphity-core/blob/master/src/main/java/org/graphity/core/riot/lang/TokenizerText.java

It was surely one of the more labor-intensive pieces of code I've written in a while... It works with the example from the RDF/POST spec, but I need to do more testing. It could probably be more DRY as well.

If you have some advice, please let me know.

Martynas
graphityhq.com

On Mon, May 11, 2015 at 2:44 PM, Andy Seaborne <a...@apache.org> wrote:
> On 10/05/15 21:48, Martynas Jusevičius wrote:
>>
>> Hey all,
>>
>> I want to refactor my RDF/POST parser into a Jena-compatible reader.
>> An example of the format can be found here:
>> http://www.lsrn.org/semweb/rdfpost.html#sec-examples
>>
>> The documentation suggests implementing ReaderRIOT interface:
>>
>> https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/riot/ExRIOT_5.java
>>
>> However, if I look at (what I think is) existing readers such as
>> Turtle for example, they do not seem to implement ReaderRIOT:
>>
>> https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangTurtleBase.java
>>
>> What is the explanation for that?
>
>
> Hi Martynas,
>
> It is historical - the Turtle derived parsers emerged with the RiotReader
> interface and some code is/was around that used that interface.
>
> ReaderRIOTLang is the cross-over code from the proper interface ReaderRIOT
> to RiotReader. RiotReader is a fixed set of parsers.
>
> This can be sorted out in Jena3.
>
>>
>> Do I need to tokenize the InputStream myself or is there some
>> machinery I can reuse?
>
>
> The Turtle-world tokenizer is TokenizerText. It is Turtle term specific.
>
> Any tokenizing for a new language is often, in my experience, very sensitive
> to the language details.
>
> If you are used to javacc, and performance isn't critical at scale, that's a
> good tool.
>
> RIOT uses custom I/O for speed; Jena used to have a javacc parser for Turtle
> but Turtle is sufficiently simple that a hand-written parser is doable. A
> hand-written tokenizer is for speed at scale (big files - about 2x faster than
> basic javacc tokenizing) but you need large input to make it worthwhile.
> NTriples dumps of databases make it worthwhile.
>
> If you do rdfpost -> Turtle (string manipulation), then you can parse the
> Turtle as normal. Downside: Error messages may be confusing as they refer
> to the Turtle, not the input string.
>
> Splitting up the query string, with all the HTTP escaping rules, can be done
> with library code (see FusekiLib.parseQueryString [no longer used, but it
> works without consuming the body, unlike the servlet operations which
> combine form and query string processing]) and probably lots of better code
> examples on the web.
>
> Andy
>>
>>
>> Martynas
>> graphityhq.com
>>
>
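
For reference, the query-string splitting mentioned in the quoted reply can be sketched with plain JDK code. This is only a minimal illustration, not what RDFPostReader or FusekiLib.parseQueryString actually does: the class name FormBodySplitter and its split method are hypothetical, and the field names in the usage note are example input. It uses java.net.URLDecoder for the percent and '+' escapes, and returns an ordered list of pairs rather than a Map so that repeated keys and their order survive.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Jena or graphity-core.
final class FormBodySplitter {

    private FormBodySplitter() {}

    // Splits an application/x-www-form-urlencoded string into decoded
    // {key, value} pairs, keeping the original order and any repeated keys.
    static List<String[]> split(String body) {
        List<String[]> pairs = new ArrayList<>();
        if (body == null || body.isEmpty()) {
            return pairs;
        }
        for (String part : body.split("&")) {
            if (part.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = part.indexOf('=');
            // A part without '=' is treated as a key with an empty value.
            String rawKey = eq < 0 ? part : part.substring(0, eq);
            String rawValue = eq < 0 ? "" : part.substring(eq + 1);
            pairs.add(new String[] { decode(rawKey), decode(rawValue) });
        }
        return pairs;
    }

    // Decodes %XX escapes and '+' (as space), assuming UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling FormBodySplitter.split("rdf=&su=http%3A%2F%2Fexample.org%2Fa&ol=Hello+World") gives three pairs in input order, with the second value decoded to http://example.org/a and the third to "Hello World". A reader could then walk that list in sequence instead of tokenizing the raw stream.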