DatasetGraph/Graph implementations are smart enough not to store duplicate
tuples. So adding (let's say) a graph with 50 triples to another graph with 50
triples, of which 25 are common to both, should result in a graph with
75 triples to be searched. A union graph over the two, on the other hand,
will have to search 100 triples. Is that what you mean?
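
To make the arithmetic concrete, here is a minimal sketch in plain Java. It
does not use Jena and is not how Jena implements anything: triples are modelled
as strings in a HashSet, and the names UnionSketch, triples, merge and
findAllInUnionView are made up for the illustration. It only shows why a true
merge ends up with 75 stored triples, while a union view still has to visit all
100 and remember what it has already returned.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionSketch {

    /** Builds stand-in "triples" labelled t<from> .. t<to-1> (hypothetical helper). */
    public static Set<String> triples(int from, int to) {
        Set<String> out = new HashSet<>();
        for (int i = from; i < to; i++) {
            out.add("t" + i);
        }
        return out;
    }

    /** True merge: both inputs are copied into one new set, duplicates stored once. */
    public static Set<String> merge(Set<String> a, Set<String> b) {
        Set<String> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    /**
     * Union view: nothing is copied. A "find everything" walks both inputs and
     * keeps a 'seen' set so that shared triples are reported only once.
     * visited[0] counts how many stored triples were looked at.
     */
    public static List<String> findAllInUnionView(Set<String> a, Set<String> b, int[] visited) {
        Set<String> seen = new HashSet<>();
        List<String> results = new ArrayList<>();
        scan(a, seen, results, visited);
        scan(b, seen, results, visited);
        return results;
    }

    private static void scan(Set<String> graph, Set<String> seen, List<String> results, int[] visited) {
        for (String t : graph) {
            visited[0]++;
            if (seen.add(t)) {      // add() returns false for an already-seen triple
                results.add(t);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> g1 = triples(0, 50);    // 50 triples
        Set<String> g2 = triples(25, 75);   // 50 triples, 25 shared with g1
        int[] visited = {0};

        System.out.println("merged size:         " + merge(g1, g2).size());
        System.out.println("union view results:  " + findAllInUnionView(g1, g2, visited).size());
        System.out.println("union view visited:  " + visited[0]);
    }
}
```

Both approaches return the same 75 distinct triples; the difference is that the
merge pays once at copy time, and the view pays on every find.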

---
A. Soroka
The University of Virginia Library

> On Dec 21, 2016, at 8:13 AM, George News <[email protected]> wrote:
> 
> 
> On 21/12/2016 13:54, Andy Seaborne wrote:
>> 
>> 
>> On 21/12/16 12:31, George News wrote:
>>> Hi,
>>> 
>>> Today is the day of questions to the mailing list ;) Sorry for the
>>> "spam" ;)
>>> 
>>> I would like to know how the functions used for merging graphs are
>>> implemented internally.
>>> 
>>> 1) ModelFactory.createUnion(Model m1, Model m2)
>>> It seems from what I have read and inferred from some websites that
>>> there is not an actual copy of data on a new graph. It is more that
>>> internally the graph pointers (like in C) are linked, but the data is
>>> the original one and not copied. Is that right?
>> 
>> Correct - it is a new model that internally provides the union view of
>> two other models.
> 
> Great, no copy then ;)
> 
>>> 
>>> 2) org.apache.jena.graph.compose.MultiUnion
>>> How does addGraph() work? Does it copy the original graph, or does it
>>> just link the data? I'm confused by the documentation: "Note that the
>>> requirement to remove duplicates from the union means that this will be
>>> an expensive operation for large (and especially for persistent)
>>> graphs."
>> 
>> That comment is on find()
> 
> Oops, my fault. You are completely right :(
> 
>> A graph is a set of triples - the key here is "set" - only one instance.
>> 
>> To make that appear to be true in the union, the code needs to remember
>> what it has iterated over.  If it is doing (in the extreme case)
>> find(null,null,null), that's a lot of space.
>> 
>> 
>> 
>>> Besides, how do I retrieve the merged/joint graph? Do I have to use
>>> option 1) iteratively, reusing the returned graph to add each
>>> additional one?
>> 
>> add(Model) copies the one model into another - a true merge.
> 
> That was what I thought. Now the confirmation from experts ;)
> 
>> 
>> From your previous question, you don't want this - you want TDB's
>> "default union graph" mode.  It's a lot cheaper at scale.
>> 
>> https://jena.apache.org/documentation/tdb/datasets.html
> 
> I already have that for the whole dataset. However, I was thinking of
> creating smaller named graphs. In my mind, this is going to make SPARQL
> queries and calls to the Jena API quicker, as the amount of data to
> search is smaller. Is this right?
> 
> If it is, I was thinking, based also on your response, of creating a Model
> that is the union of all the ones I want (which should be quick), and
> then using this Model as the input for the SPARQL engine.
> 
> Besides, I was also thinking of having multiple datasets (TDB), but I
> don't know if that would make any sense.
> 
> The issue is that the amount of data that I will have to handle is quite
> huge, and I want, as much as possible, to make the searchable sets as
> small as possible.
> 
>>> 
>>> Thanks in advance for the help.
>>> Jorge
>>> 
>> 
