DatasetGraph/Graph implementations are smart enough not to store duplicate tuples. So adding (let's say) a graph of 50 triples to another graph of 50 triples, 25 of which the two have in common, should result in a graph of 75 triples to search. A union graph over the two, on the other hand, will have to search all 100 triples. Is that what you mean?
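To make the 75-versus-100 arithmetic concrete, here is a small plain-Java model (Java 16+ for the record). It is a sketch, not Jena code: `Triple`, `graph`, `merge` and `scanUnion` are hypothetical names, and the `seen` set only mimics the duplicate suppression a union view has to do during a full scan. The copying merge ends up storing 75 triples; the union view examines all 100 and returns 75.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeVsUnion {

    // Stand-in for an RDF triple; NOT Jena's org.apache.jena.graph.Triple.
    record Triple(String s, String p, String o) {}

    // Builds a "graph" of the triples :s{from} .. :s{to-1} as a plain Set.
    static Set<Triple> graph(int from, int to) {
        Set<Triple> g = new HashSet<>();
        for (int i = from; i < to; i++) {
            g.add(new Triple(":s" + i, ":p", ":o"));
        }
        return g;
    }

    // Copying merge: duplicates collapse because a graph is a set.
    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    // Union "view" scan: walks both inputs and remembers what it has already
    // returned, so each triple is reported once.
    // Returns {triplesExamined, triplesReturned}.
    static int[] scanUnion(Set<Triple> a, Set<Triple> b) {
        Set<Triple> seen = new HashSet<>();
        List<Triple> results = new ArrayList<>();
        int examined = 0;
        for (Set<Triple> g : List.of(a, b)) {
            for (Triple t : g) {
                examined++;
                if (seen.add(t)) {
                    results.add(t);
                }
            }
        }
        return new int[] {examined, results.size()};
    }

    public static void main(String[] args) {
        Set<Triple> g1 = graph(0, 50);  // :s0 .. :s49
        Set<Triple> g2 = graph(25, 75); // :s25 .. :s74, so 25 triples overlap
        System.out.println("merged size: " + merge(g1, g2).size());
        int[] u = scanUnion(g1, g2);
        System.out.println("union examined " + u[0] + ", returned " + u[1]);
    }
}
```

Running `main` should print a merged size of 75 and a union scan that examines 100 triples while returning 75. The extra cost of the union is both the wasted examinations and the memory held by `seen`.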
---
A. Soroka
The University of Virginia Library

> On Dec 21, 2016, at 8:13 AM, George News wrote:
>
> On 21/12/2016 13:54, Andy Seaborne wrote:
>>
>> On 21/12/16 12:31, George News wrote:
>>> Hi,
>>>
>>> Today is the day of questions to the mailing list ;) Sorry for the
>>> "spam" ;)
>>>
>>> I would like to know how the functions used for merging graphs are
>>> implemented internally.
>>>
>>> 1) ModelFactory.createUnion(Model m1, Model m2)
>>> From what I have read and inferred from some websites, it seems that
>>> the data is not actually copied into a new graph. Rather, the graphs
>>> are linked internally by reference (like pointers in C), and the data
>>> stays in the original graphs rather than being copied. Is that right?
>>
>> Correct - it is a new model that internally provides the union view of
>> two other models.
>
> Great, no copy then ;)
>
>>>
>>> 2) org.apache.jena.graph.compose.MultiUnion
>>> How does addGraph() work? Does it copy the original graph, or does it
>>> just link to the data? I'm confused by the Javadoc: " Note that the
>>> requirement to remove duplicates from the union means that this will be
>>> an expensive operation for large (and especially for persistent)
>>> graphs. "
>>
>> That comment is on find().
>
> Oops, my fault. You are completely right :(
>
>> A graph is a set of triples - the key here is "set": each triple
>> appears only once.
>>
>> To make that appear to be true in the union, the code needs to remember
>> what it has already iterated over. If it is doing (in the extreme)
>> find(null,null,null), that's a lot of space.
>>
>>> Besides, how do I retrieve the merged/joined graph? Do I have to use
>>> option 1) iteratively, reusing the returned graph to add each
>>> additional one?
>>
>> add(Model) copies one model into another - a true merge.
>
> That was what I thought. Now the confirmation from experts ;)
>
>>
>> From your previous question, you don't want this - you want TDB's
>> "default union graph" mode. It's a lot cheaper at scale.
>>
>> https://jena.apache.org/documentation/tdb/datasets.html
>
> I already have that for the whole dataset. However, I was thinking of
> creating smaller named graphs. In my mind, this is going to make SPARQL
> queries and calls to the Jena API quicker, as the set of data to search
> is smaller. Is this right?
>
> If it is, I was thinking, based also on your response, of creating a Model
> that is the union of all the ones I want (which should be quick), and
> then using this Model as the input for the SPARQL engine.
>
> Besides, I was also thinking of having multiple (TDB) datasets, but I
> don't know if that would make any sense.
>
> The issue is that the amount of data that I will have to handle is quite
> huge, and I want, as much as possible, to make the searchable sets as
> small as possible.
>
>>>
>>> Thanks in advance for the help.
>>> Jorge
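The difference between a union view (what the thread says `ModelFactory.createUnion` gives you) and a copying merge (what `add(Model)` does) can be modelled with plain Java sets. This is a hypothetical sketch, not Jena code: `UnionView` and `copyMerge` are illustrative names, and real Jena graphs hold triples rather than strings. The point it shows is that the view holds only references, so a later change to an underlying graph shows up in it, while the copy is a snapshot taken at merge time.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class ViewVsCopy {

    // Read-only union view: keeps references to the two input sets and copies
    // nothing up front. Each call to stream() reads the inputs as they are now.
    // Hypothetical class, modelling (not reproducing) createUnion's behaviour.
    record UnionView<T>(Set<T> left, Set<T> right) {
        Stream<T> stream() {
            // Elements of right already present in left are skipped, so the
            // view still behaves like a set.
            return Stream.concat(left.stream(),
                                 right.stream().filter(t -> !left.contains(t)));
        }

        long size() {
            return stream().count();
        }
    }

    // Copying merge: a new set holding every element, like add(Model).
    static <T> Set<T> copyMerge(Set<T> a, Set<T> b) {
        Set<T> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<String> m1 = new HashSet<>(Set.of("t1", "t2"));
        Set<String> m2 = new HashSet<>(Set.of("t2", "t3"));

        UnionView<String> view = new UnionView<>(m1, m2);
        Set<String> copy = copyMerge(m1, m2);

        m2.add("t4"); // change an underlying graph after both were built

        System.out.println("view size: " + view.size()); // 4: the view sees t4
        System.out.println("copy size: " + copy.size()); // 3: the snapshot does not
    }
}
```

The view is cheap to create but pays on every read; the copy pays once up front and then stands alone.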
