On 21/12/2016 13:54, Andy Seaborne wrote:
>
>
> On 21/12/16 12:31, George News wrote:
>> Hi,
>>
>> Today is the day of questions to the mailing list ;) Sorry for the
>> "spam" ;)
>>
>> I would like to know what is the internal implementation of the
>> functions used for merging graphs.
>>
>> 1) ModelFactory.createUnion(Model m1, Model m2)
>> It seems from what I have read and inferred from some websites that
>> there is not an actual copy of data on a new graph. It is more that
>> internally the graph pointers (like in C) are linked, but the data is
>> the original one and not copied. Is that right?
>
> Correct - it is a new model that internally provides the union view of
> two other models.
Great, no copy then ;)

>>
>> 2) org.apache.jena.graph.compose.MultiUnion
>> How does addGraph() work? Is it copying the original graph or is it
>> just linking the data? I'm confused by the help: "Note that the
>> requirement to remove duplicates from the union means that this will be
>> an expensive operation for large (and especially for persistent)
>> graphs."
>
> That comment is on find()

Oops, my fault. You are completely right :(

> A graph is a set of triples - the key here is "set" - only one instance.
>
> To make that appear to be true in the union, the code needs to remember
> what it has iterated over. If it is going (in extreme)
> find(null,null,null) that's a lot of space.
>
>
>
>> Besides, how do I retrieve the merged/joint graph? Do I have to use
>> option 1) in an iterative way, reusing the returned graph to add the
>> additional one?
>
> add(Model) copies the one model into another - a true merge.

That was what I thought. Now the confirmation from experts ;)

>
> from your previous question, you don't want this - you want TDB's
> "default union graph" mode. It's a lot cheaper at scale.
>
> https://jena.apache.org/documentation/tdb/datasets.html

I already have that for the whole dataset. However, I was thinking of
creating smaller named graphs. In my mind, this would make SPARQL
queries and calls to the Jena API quicker, as the set of data to search
is smaller. Is this right? If it is, I was thinking, based also on your
response, of creating a Model that is the union of all the ones I want
(which should be quick), and then using this Model as the input for the
SPARQL engine.

Besides, I was also thinking of having multiple datasets (TDB), but I
don't know if that would make any sense. The issue is that the amount of
data I will have to handle is quite large, and I want to make the
searchable sets as small as possible.

>>
>> Thanks in advance for the help.
>> Jorge
>>
>
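For reference, the difference discussed above between a union view
(nothing copied, but duplicates must be filtered while iterating) and a
true merge (a copy) can be sketched in plain Java. This is only an
illustration under stated assumptions, not Jena's implementation:
triples are modelled as plain strings, and the names UnionViewSketch,
unionView, copyMerge and toList are hypothetical helpers invented for
this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical illustration only: triples are plain strings, not Jena Triple objects.
class UnionViewSketch {

    // Union "view": keeps references to both sets and copies nothing up front.
    static Iterable<String> unionView(Set<String> left, Set<String> right) {
        return () -> new Iterator<String>() {
            private final Iterator<String> first = left.iterator();
            private final Iterator<String> second = right.iterator();
            // Every triple already returned is remembered so that duplicates are
            // suppressed; for a match-everything scan this set grows to the full
            // size of the union.
            private final Set<String> seen = new HashSet<>();
            private String pending = advance();

            private String advance() {
                while (first.hasNext()) {
                    String t = first.next();
                    if (seen.add(t)) return t;
                }
                while (second.hasNext()) {
                    String t = second.next();
                    if (seen.add(t)) return t;
                }
                return null; // both sources exhausted
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public String next() {
                if (pending == null) throw new NoSuchElementException();
                String result = pending;
                pending = advance();
                return result;
            }
        };
    }

    // True merge: the data of both sets is copied into a new, independent set.
    static Set<String> copyMerge(Set<String> a, Set<String> b) {
        Set<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    // Drains a view into a list so it can be printed or compared.
    static List<String> toList(Iterable<String> triples) {
        List<String> out = new ArrayList<>();
        for (String t : triples) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        Set<String> g1 = new LinkedHashSet<>(Arrays.asList("s1 p o", "s2 p o"));
        Set<String> g2 = new LinkedHashSet<>(Arrays.asList("s2 p o", "s3 p o"));

        Iterable<String> view = unionView(g1, g2);
        Set<String> merged = copyMerge(g1, g2);

        g2.add("s4 p o"); // change an underlying graph afterwards

        System.out.println(toList(view)); // the view reflects the new triple
        System.out.println(merged);       // the copy does not
    }
}
```

In this sketch the view is cheap to create but pays for the "seen" set
on every full scan, while the merge pays the copy cost once and then no
longer follows changes to the source sets.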
