> He can correct me as needed, but it seems that Dick is using (and
> getting great results from) an extension to Jena ("Mosaic") that
> federates different datasets (in this case from independent TDB
> instances) and runs queries over them in parallel. We've had some
> discussions (all the way to a PR:
> https://github.com/apache/jena/pull/233) about getting Mosaic into
> Jena's codebase, but we haven't quite managed to do it. I would love to
> move that process forward.

I think his approach of splitting and running multiple tdbloaders works if
every TDB is loaded into the default graph (using tdb:unionDefaultGraph).
However, I'm not sure that still works if I want to maintain graph labels.
Is there any way to tell Jena that one particular graph is "composed" of
more than one TDB store?
For example, if I split Wikidata into smaller stores of 100M triples each, I
could write "SELECT FROM <wikidata>" instead of "SELECT FROM <wikidata-store1>
<wikidata-store2> <wikidata-store3> ..."
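
Failing that, a client-side stopgap might be to expand the logical graph name
into the individual FROM clauses before the query is sent. This is only a
minimal sketch in Python, not a Jena feature and not how Mosaic works. The
mapping COMPOSED_GRAPHS and the function expand_from_clauses are hypothetical
names made up for illustration, and it assumes all the member graphs are
visible to the one dataset being queried. It relies on the SPARQL rule that
several FROM clauses are merged into the query's default graph.

```python
import re

# Hypothetical mapping: one logical graph name -> the graphs holding its pieces.
COMPOSED_GRAPHS = {
    "wikidata": ["wikidata-store1", "wikidata-store2", "wikidata-store3"],
}

# Matches "FROM <iri>" but not "FROM NAMED <iri>", which is left untouched.
FROM_CLAUSE = re.compile(r"FROM\s+<([^>]+)>", re.IGNORECASE)


def expand_from_clauses(query, composed=COMPOSED_GRAPHS):
    """Replace each FROM <logical> with one FROM clause per member graph."""
    def expand(match):
        name = match.group(1)
        # Names that are not in the mapping are kept as they are.
        members = composed.get(name, [name])
        return " ".join(f"FROM <{m}>" for m in members)

    return FROM_CLAUSE.sub(expand, query)


query = "SELECT * FROM <wikidata> WHERE { ?s ?p ?o } LIMIT 10"
print(expand_from_clauses(query))
```

The obvious drawback is that every client has to know the mapping, which is
why I would rather have the store itself treat the pieces as one graph.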