In that case it would seem to be the responsibility of that project to ensure
that there are no collisions, because that's not several RDF graphs, that's
a single RDF graph in several pieces. If they reuse bnode identifiers for
different nodes in different file chunks, that is an incorrect serialization
from the POV of the _whole_ graph (though not from the POV of the subgraphs).

Either concatenate the files while swapping in new, non-colliding
identifiers (perhaps UUIDs), or use the multi-file tdbloader idiom that
Andy mentioned (which works with many of the Jena CLI tools, actually).
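
For the first option, here is a rough sketch of what I mean, not a finished
tool. It assumes the chunks are N-Triples, that bnode labels use only ASCII
characters (the grammar allows more), and that the same label in two different
files is meant to be two different nodes. The names relabel_line and
merge_chunks are made up for illustration. Each file gets its own UUID-based
prefix, and IRIs and literals are matched first so that a "_:" inside them is
left alone.

```python
import re
import uuid

# Matches, in order: an IRI, a string literal, or a blank node label.
# Assumption: ASCII-only label characters; a label may contain dots
# but may not end with one.
TOKEN = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"'
    r'|_:[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?'
)


def relabel_line(line, prefix):
    """Prefix every blank node label on one N-Triples line."""
    def swap(match):
        token = match.group(0)
        if token.startswith("_:"):
            return "_:" + prefix + "_" + token[2:]
        return token  # IRIs and literals pass through untouched
    return TOKEN.sub(swap, line)


def merge_chunks(chunks):
    """Concatenate N-Triples documents (strings), one fresh prefix per chunk."""
    merged = []
    for text in chunks:
        prefix = "f" + uuid.uuid4().hex  # new, non-colliding prefix per file
        for line in text.splitlines():
            merged.append(relabel_line(line, prefix))
    return "\n".join(merged) + "\n"
```

Reading each file into a string and passing the list to merge_chunks gives one
document in which a _:b from one file can no longer collide with a _:b from
another. If the project actually intends the same label to mean the same node
across files, this is the wrong tool and plain concatenation is what you want.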

Introducing new identifiers for bnodes to avoid collisions is pretty standard 
fare. The RDF Semantics document:

https://www.w3.org/TR/rdf11-mt/#shared-blank-nodes-unions-and-merges
https://www.w3.org/TR/rdf11-mt/#dfn-standardize

gives a really clear explanation.

Adam Soroka

> On Dec 26, 2017, at 3:27 PM, Laura Morales <[email protected]> wrote:
> 
>> Blank node identifiers are only limited in scope to a serialization of a
>> particular RDF graph, i.e. the node _:b does not represent the same node as
>> a node named _:b in any other graph.
> 
> Yes I understand this, but I've seen some projects distribute their data as 
> one graph split into multiple files (eg one file per item).
