I’m writing an ETL-ish utility that extracts triples from some directories of 
application-specific XML to assemble into a from-scratch TDB database. Of 
course I want to take advantage of the bulk loader facilities for best results. 
The TDBLoader methods that I’m looking at all accept InputStreams or URIs from 
which to get serialized RDF. It happens that I am already using Jena to 
transform the XML into RDF, so I’ve got actual Jena Triples in hand when I come 
to the bulk loading apparatus. It seems silly to serialize the triples only for 
the bulk loader to deserialize them again, so I’d like to get at a StreamRDF 
instance (or something similar) through which I can stream Triples directly to 
the bulk loader. At first glance, though, it looks like that is hidden away as 
BulkLoader.DestinationGraphs.

As additional context, the extraction is easily parallelized, but I do not see 
any note that the bulk loader is thread-safe. My plan is therefore to run a few 
extraction threads that fill a queue, with a single thread draining that queue 
into the bulk loader.

Am I misunderstanding the action of the bulk loader, and more to the point, 
what is the most efficient way I can build a from-scratch TDB database from 
Triples?

Thanks for any help or advice!

---
A. Soroka
The University of Virginia Library
