> He can correct me as needed, but it seems that Dick is using (and getting
> great results from) an extension to Jena ("Mosaic") that federates different
> datasets (in this case from independent TDB instances) and runs queries over
> them in parallel. We've had some discussions (all the way to a PR:
> https://github.com/apache/jena/pull/233) about getting Mosaic into Jena's
> codebase, but we haven't quite managed to do it. I would love to move that
> process forward.


I think his approach of splitting the data and running multiple tdbloaders
works if every TDB is loaded into the default graph (using
tdb:unionDefaultGraph). However, I'm not sure it still works if I want to
maintain graph labels. Is there any way to tell Jena that one particular graph
is "composed" of more than one TDB store? For example, if I split Wikidata
into smaller stores of 100M triples each, I could write "SELECT FROM
<wikidata>" instead of "SELECT FROM <wikidata-store1> <wikidata-store2>
<wikidata-store3> ...".
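
To make the question concrete, here is a minimal client-side sketch of what I
mean, not a Jena feature. It expands one logical graph name into one FROM
clause per store, relying on the SPARQL rule that several FROM clauses are
merged into the query's default graph. The names build_query and COMPOSITES
are hypothetical, the graph names are the placeholders from above, and it
assumes every store is visible as a named graph to whatever answers the query.

```python
# Hypothetical mapping: one logical graph name -> the stores that compose it.
COMPOSITES = {
    "wikidata": ["wikidata-store1", "wikidata-store2", "wikidata-store3"],
}


def build_query(select_clause, where_pattern, graphs, composites):
    """Build a SPARQL query with one FROM clause per member graph.

    A graph name that is not a key in `composites` is passed through as-is.
    """
    from_lines = []
    for graph in graphs:
        for member in composites.get(graph, [graph]):
            from_lines.append(f"FROM <{member}>")
    # SPARQL merges all FROM graphs into the default graph of the query.
    return "\n".join([select_clause, *from_lines, f"WHERE {{ {where_pattern} }}"])


query = build_query("SELECT ?s ?p ?o", "?s ?p ?o", ["wikidata"], COMPOSITES)
print(query)
```

This only hides the long FROM list from the person writing the query; the
stores are still separate, so it does not answer whether Jena itself can
treat them as one graph.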
