On 17/02/15 17:39, Stian Soiland-Reyes wrote:
RDFDataMgr should be faster, but would be used by model.read() anyway after
its initialization.
Same speed - same code.
It is also possible to do streaming reads, where you get each triple as it
is read (and no Model). Combined with streaming writes this is the fastest
way to do format conversions (e.g. RDF/XML to Turtle).
See http://jena.apache.org/documentation/io/ for details.
There is a command line tool "riot" which takes a --out argument to
stream convert formats.
(the next version has --formatted for pretty output but that needs to
read into a model and out again - not streaming)
If you need to shuffle RDF between different Jena instances you might like
the binary Thrift-based format which can be significantly faster and more
compact than the rest.
Faster, not more compact.
For fast and compact, use RDF Thrift and gzip.
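
A minimal sketch of the "add gzip on top" idea, using only the JDK so it runs without Jena on the classpath. The class GzipRdfSketch and the RdfSerializer callback are hypothetical names made up for illustration; the stand-in data is a few repeated N-Triples lines, not real RDF Thrift output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRdfSketch {

    /** Hypothetical callback: anything that serialises RDF to a stream. */
    public interface RdfSerializer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Run the serialiser through a gzip layer and return the compressed bytes. */
    public static byte[] writeGzipped(RdfSerializer serializer) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            serializer.writeTo(gz);
        } // closing the gzip stream flushes the data and writes the gzip trailer
        return sink.toByteArray();
    }

    /** Undo the gzip layer; the result is what a parser would be handed. */
    public static byte[] readGzipped(byte[] compressed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload: the same triple repeated, so it compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<http://example/s> <http://example/p> \"o\" .\n");
        }
        final byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = writeGzipped(out -> out.write(raw));
        byte[] unpacked = readGzipped(packed);

        System.out.println("raw bytes:      " + raw.length);
        System.out.println("gzipped bytes:  " + packed.length);
        System.out.println("round trip ok:  " + java.util.Arrays.equals(raw, unpacked));
    }
}
```

With Jena available, the callback would presumably be something like out -> RDFDataMgr.write(out, model, RDFFormat.RDF_THRIFT); that is an assumption about a release that ships RDF Thrift, so check it against the RIOT documentation for your version.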
What kind of Model are you reading into? Memory or disk-based? For TDB
there are the tdbloader and (on Linux/Unix) tdbloader2 utilities you can run
from the command line.
Yes - the big advantage of loading a database is that you only parse the
data once, on loading, so if you make many queries you don't have to wait
for the parse step each time.
Andy
Apologies for not having any links above as I am posting from mobile, I am
sure others will fill me in where needed.
On 17 Feb 2015 15:18, "Marco Tenti" <[email protected]> wrote:
Hi everyone, I have two files of triples (on the order of GB). Jena provides
many methods to read and write triple files, and I'm wondering which is the
best way to read/write Jena models in terms of speed and memory.
For example, when reading, is it better to use
org.apache.jena.riot.RDFDataMgr.read(...) or
com.hp.hpl.jena.rdf.model.Model model -> model.read(...)?
Thanks in advance. Greetings.