Re: Suggestion for Read/Write very large file of triples

Andy Seaborne Tue, 17 Feb 2015 12:24:37 -0800

On 17/02/15 17:39, Stian Soiland-Reyes wrote:

RDFDataMgr should be faster, but would be used by model.read() anyway after
its initialization.


Same speed - same code.
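
For readers of the archive, a minimal sketch of the two calls being compared. It assumes a Jena 3.x or later release with jena-arq on the classpath, where everything lives under org.apache.jena (the 2.x releases current at the time of this thread kept Model under com.hp.hpl.jena). The class name ReadBothWays and the file argument are hypothetical.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class ReadBothWays {
    // Read via RIOT directly; the syntax is guessed from the file extension.
    static Model viaRDFDataMgr(String file) {
        Model m = ModelFactory.createDefaultModel();
        RDFDataMgr.read(m, file);
        return m;
    }

    // Read via the Model API; with RIOT installed this ends up in the same parsers.
    static Model viaModelRead(String file) {
        Model m = ModelFactory.createDefaultModel();
        m.read(file);
        return m;
    }

    public static void main(String[] args) {
        // args[0] is a hypothetical path or URL to an RDF file.
        Model a = viaRDFDataMgr(args[0]);
        Model b = viaModelRead(args[0]);
        System.out.println("Same triples: " + a.isIsomorphicWith(b));
    }
}
```

Both routes build a full in-memory Model, so neither helps with memory for multi-GB inputs.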


It is also possible to do streaming reads,  where you get each triple as it
is read (and no Model). Combined with streaming writes this is the fastest
way to do format conversions (e.g. RDF/XML to Turtle).
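
A streaming conversion along these lines might look like the sketch below. It assumes a Jena 3.x or later release where RDFParser and StreamRDFWriter are available; the class name StreamConvert and the command-line argument are hypothetical.

```java
import java.io.OutputStream;

import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;

public class StreamConvert {
    /** Parse inFile and pass each triple to a streaming writer; no Model is built. */
    static void convert(String inFile, OutputStream out, Lang outLang) {
        // A StreamRDF that serialises triples as they are received.
        StreamRDF dest = StreamRDFWriter.getWriterStream(out, outLang);
        // The input syntax is guessed from the file extension (e.g. .rdf, .nt, .ttl).
        RDFParser.source(inFile).parse(dest);
    }

    public static void main(String[] args) {
        // args[0] is a hypothetical input file; Turtle is written to stdout.
        convert(args[0], System.out, Lang.TURTLE);
    }
}
```

Because triples are written as they are parsed, memory use stays roughly constant regardless of file size. The trade-off is that streamed Turtle is not pretty-printed.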


See http://jena.apache.org/documentation/io/ for details.

There is a command line tool "riot" which takes a --out argument to
stream convert formats.

(the next version has --formatted for pretty output but that needs to
read into a model and out again - not streaming)

If you need to shuffle RDF between different Jena instances you might like
the binary Thrift-based format which can be significantly faster and more
compact than the rest.


Faster, not more compact.

For fast and compact, use RDF Thrift and gzip.
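
One way to combine the two is sketched below, under the assumption that your Jena version (3.x or later) registers RDF Thrift as a streaming output format. The class name ThriftGzip and both file arguments are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;

public class ThriftGzip {
    /** Stream inFile into gzip-compressed RDF Thrift at outFile. */
    static void toGzippedThrift(String inFile, String outFile) throws IOException {
        // Closing the GZIPOutputStream writes the gzip trailer, so use try-with-resources.
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(outFile))) {
            StreamRDF dest = StreamRDFWriter.getWriterStream(out, Lang.RDFTHRIFT);
            RDFParser.source(inFile).parse(dest);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: hypothetical input file; args[1]: hypothetical output file.
        toGzippedThrift(args[0], args[1]);
    }
}
```

To read the result back, wrap the file in a java.util.zip.GZIPInputStream and pass Lang.RDFTHRIFT explicitly to the reader.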

What kind of Model are you reading into? Memory or disk-based? For TDB
there are the tdbloader and (on Linux/Unix) tdbloader2 utilities you can run
from the command line.

Yes - the big advantage of loading a database is that you only parse the
data once on loading so if you make many queries, you don't have to wait
for the parse step.


        Andy


Apologies for not having any links above as I am posting from mobile, I am
sure others will fill me in where needed.
On 17 Feb 2015 15:18, "Marco Tenti" <[email protected]> wrote:

Hi everyone, I have two files of triples (on the order of GB). Jena provides
many methods to read and write triple files; I'm wondering which are the best
methods for reading/writing Jena models in terms of speed and memory.
For example, when reading, is it better to use
org.apache.jena.riot.RDFDataMgr.read(...)     or
com.hp.hpl.jena.rdf.model.Model model -> model.read(...) ?

Thanks in advance. Greetings.
