Hi,

        I have been working on getting non-blocking parsers to work. The point
is that when you fetch RDF from the web you want to use as few resources as
possible. Ideally one should need only a few kB of memory even for files that
are 1GB long. Async parsing also allows one to have 1000s of open connections
simultaneously on only a few threads, which saves on thread costs (0.5-1MB
per thread). For more on what async parsing allows one to do, see the Jena
bug report [1].
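
To give a rough idea of what push-style parsing looks like, here is a minimal
sketch in Scala. It is not the code from the parsers mentioned in this message,
and every name in it (ChunkedLineFeeder, feed, finish, ChunkedFeedDemo) is made
up for illustration. It only cuts the input into NTriples statement lines as
chunks arrive, leaves the parsing of the terms out, and assumes the bytes have
already been decoded to text.

```scala
// Hypothetical sketch: none of these names come from nomo, Jena or pimp-my-rdf.
final class ChunkedLineFeeder(onStatement: String => Unit) {
  // Only the unfinished tail of the current line is kept between chunks,
  // so memory use is bounded by the longest line, not by the document size.
  private val pending = new StringBuilder

  /** Feed the next chunk of text, e.g. from a non-blocking I/O callback. */
  def feed(chunk: String): Unit = {
    var start = 0
    var nl = chunk.indexOf('\n', start)
    while (nl >= 0) {
      pending.append(chunk.substring(start, nl))
      emit()
      start = nl + 1
      nl = chunk.indexOf('\n', start)
    }
    // Whatever follows the last newline is an incomplete line: keep it.
    pending.append(chunk.substring(start))
  }

  /** Signal end of input, flushing a last line that has no trailing newline. */
  def finish(): Unit = emit()

  private def emit(): Unit = {
    val line = pending.toString.trim
    pending.clear()
    // Blank lines and comment lines carry no statement.
    if (line.nonEmpty && !line.startsWith("#")) onStatement(line)
  }
}

object ChunkedFeedDemo {
  /** Push the chunks through a feeder and collect the statement lines. */
  def run(chunks: Seq[String]): List[String] = {
    val seen = scala.collection.mutable.ListBuffer.empty[String]
    val feeder = new ChunkedLineFeeder(line => { seen += line; () })
    chunks.foreach(feeder.feed)
    feeder.finish()
    seen.toList
  }
}
```

The caller never blocks waiting for more input: each call to feed returns as
soon as the chunk is consumed, so one thread can serve many connections by
calling feed on whichever one has data ready.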

        I got an async RDF/XML parser going last week using Jena, and wrote a
full NTriples one too, this one using a powerful Scala library called nomo.
Then this week Alex Bertails published a Scala library that should allow us
to write code against both Jena and Sesame in Scala with very little overhead.
It's called "pimp-my-rdf" [2].

So here are some pointers:

  - The RDF/XML parser uses the Jena parser, adapted to be non-blocking:
    https://dvcs.w3.org/hg/read-write-web/file/d9c1f87eee55/src/main/scala/cache/WebFetcher.scala
  - The NTriples parser, written from scratch, is here:
    https://github.com/betehess/pimp-my-rdf/blob/master/n-triples-parser/src/main/scala/Parser.scala


It should not be that difficult to write a Turtle parser next, so hopefully I
should have that working soon too.


Henry

[1] More on the Jena bug report:
    https://issues.apache.org/jira/browse/JENA-203
[2] https://github.com/betehess/pimp-my-rdf
    Btw, notice how simple the RDF model is when expressed in Scala:
    https://github.com/betehess/pimp-my-rdf/blob/master/core/src/main/scala/RDF.scala


Social Web Architect
http://bblfish.net/

