On 06/08/15 11:00, Damian Steer wrote:
On 6 Aug 2015, at 10:13, Marko Pance <[email protected]> wrote:
I'm sorry, I don't quite understand your response. Are you saying if I'd like
to stream I should try using a format other than RDF/XML? In order to do so,
could I use this command:
bin/riot --out rdfthrift ~/Downloads/chembl_20.0_molecule.ttl >
~/Downloads/chembl_20.0_molecule.rdf
Essentially yes.
My reading of '--stream' is that it is the same as '--out' but with the
additional requirement that the format should support streaming.
What would go in place of "rdfthrift"?
Depends on what you want to do with the result. If you want rdf/xml then you
can’t stream (currently). If you want another format, well, you already have
turtle of course.
N-Triples is generally a solid format.
Damian
Yes - that's about it.
--stream guarantees a streaming setup, or says "no" if it can't provide one.
--out will stream if possible but will fall back to non-streaming.
--pretty always chooses the pretty form.
For example, Turtle can be printed subject-block-by-subject-block
(streaming) or in a prettier form with embedded bnodes and lists. Some of
the pretty forms require looking through the data before starting to print.
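
To make the difference concrete, here is a rough Python sketch. It is not how Jena's writers are implemented; the function names are made up for illustration, and terms are assumed to arrive as already-serialized strings such as "<http://example/a>". The first writer emits each triple as it arrives; the second groups everything by subject, so it has to read all of the input before it can print anything.

```python
from collections import defaultdict


def write_streaming(triples, out):
    """Write one N-Triples-style line per triple as soon as it arrives.

    Memory use is constant: nothing is kept after a triple is written.
    """
    for s, p, o in triples:
        out.write(f"{s} {p} {o} .\n")


def write_grouped(triples, out):
    """Write Turtle-style subject blocks, grouping ALL triples per subject.

    Hypothetical "pretty" writer: it buffers the whole input first,
    because a later triple may belong to a subject seen earlier.
    """
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    # Only now, with the full data in memory, can printing start.
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        out.write(f"{s} {body} .\n")
```

With a few thousand triples either approach is fine; with a large dump the buffering in the second writer is what runs out of memory.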
RDF/XML, especially RDF/XML-ABBREV, is very much in the latter
category as well. It requires looking through the data first for striping,
for lists, and for sorting out namespaces for properties. Even Jena's Basic
RDF/XML is non-streaming.
There could be a streaming RDF/XML (per triple or per subject block) but
it's going to look ugly. There would be no RDF/XML striping (nested triples
like bnode objects), and namespace attributes would need to be written on
each block.
I agree with Damian - at scale, N-Triples and N-Quads are useful.
(Compressed if necessary.) Everything supports them.
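
As a small illustration of why line-based formats work well at scale, here is a Python sketch using only the standard library's gzip module. The function names are invented for the example, and it assumes one triple per line with no validation of the N-Triples syntax. Both writing and reading go line by line, so the file never has to fit in memory, compressed or not.

```python
import gzip


def write_ntriples_gz(path, lines):
    """Write already-serialized N-Triples lines to a gzip file, one at a time."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


def count_triples_gz(path):
    """Count triples in a gzipped N-Triples file without loading it whole.

    Assumes one triple per line; blank lines and '#' comment lines are skipped.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```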
Andy