So, this is what I was asking about earlier. For small graphs, e.g. the result of DESCRIBE <....>, the graph isomorphism algorithms that support blank nodes should work well. rdflib includes an implementation (in rdflib.compare), and I would like to know whether there is an implementation of that graph digest algorithm for Jena.
On Fri, Dec 8, 2017 at 2:27 AM, Claude Warren <[email protected]> wrote:

> On Fri, Nov 24, 2017 at 12:19 PM, Laura Morales <[email protected]> wrote:
>
> > > What about simply deleting the old graph and loading the triples of the
> > > .nt file into the graph afterwards? I don't see any benefit of such a
> > > "tool" - you could just write your own bash script for this if you need
> > > this quite often.
> >
> > The advantage is with large graphs, such as wikidata. If I download their
> > dumps once a week, it's much more efficient to only change a few triples
> > instead of deleting the entire graph and recreating the whole TDB store.
> >
>
> Performing a diff between two graphs with blank nodes might be sped up
> using bloom filters.
>
> I have code that represents triples as bloom filters and I know that 9 byte
> filters will work for very large graphs so you could probably get away
> with 8 bytes to make them fit in a standard integer size.
>
> This is a multiple pass operation.
>
> Create a bloom filter for each node in graph A. Call this list A.
>
> Step through graph B creating bloom filters for each triple. If the triple
> in question has blank nodes, only encode non blank nodes.
>
> If the bloom filter is not in list A, it is new.
>
> If the bloom filter is in list A then it may be new and needs a direct
> lookup in graph A. If it is not found, add it.
>
> If your filter list has a pointer to the triples that it represents
> (remember there can be bloom filter collisions) then you can rapidly
> determine if there is a match and you also have a good starting place to do
> blank node comparisons to determine if the triples are equivalent.
>
> If anyone is interested in trying this I have some triple/bloom filter code
> in my github repository.
>
> Claude
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>
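
For what it's worth, here is a rough sketch of how I read the multi-pass idea in the quoted message. It is not Claude's code and not a Jena or rdflib API. Everything in it is an assumption made for illustration: triples are plain tuples of strings, blank nodes are the strings starting with "_:", the filter is 64 bits wide with three bits set per term, and the names triple_filter, index_graph and diff_additions are made up.

```python
import hashlib
from collections import defaultdict

FILTER_BITS = 64  # 8-byte filter, the size suggested in the quoted message
NUM_HASHES = 3    # assumption: number of bits set per encoded term


def is_blank(term):
    """Assumed convention: blank node labels start with '_:'."""
    return term.startswith("_:")


def triple_filter(triple):
    """Build a 64-bit Bloom filter from the non-blank terms of a triple."""
    bits = 0
    for position, term in enumerate(triple):
        if is_blank(term):
            continue  # blank node labels are not comparable across graphs
        for seed in range(NUM_HASHES):
            data = f"{seed}:{position}:{term}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return bits


def ground_pattern(triple):
    """Replace blank nodes with None so only ground terms are compared."""
    return tuple(None if is_blank(term) else term for term in triple)


def index_graph(graph):
    """Pass 1: map each filter to the triples of graph A it represents."""
    index = defaultdict(list)
    for triple in graph:
        index[triple_filter(triple)].append(triple)
    return index


def diff_additions(graph_a, graph_b):
    """Pass 2: classify the triples of B relative to A.

    Returns (new, needs_check). 'new' holds triples with no counterpart in A.
    'needs_check' pairs each blank-node triple of B with the A triples that
    agree on all ground terms; those still need a real blank node comparison.
    """
    index = index_graph(graph_a)
    new, needs_check = [], []
    for triple in graph_b:
        candidates = index.get(triple_filter(triple), [])
        pattern = ground_pattern(triple)
        # Direct lookup among candidates; this weeds out filter collisions.
        matches = [c for c in candidates if ground_pattern(c) == pattern]
        if not matches:
            new.append(triple)
        elif None in pattern:
            needs_check.append((triple, matches))
        # Otherwise a fully ground triple was found in A: nothing to do.
    return new, needs_check


graph_a = [("ex:s", "ex:p", "ex:o"), ("_:b1", "ex:name", '"Alice"')]
graph_b = [
    ("ex:s", "ex:p", "ex:o"),
    ("_:x", "ex:name", '"Alice"'),
    ("ex:s", "ex:p", "ex:o2"),
]
new, needs_check = diff_additions(graph_a, graph_b)
```

With these toy graphs, new contains only the ("ex:s", "ex:p", "ex:o2") triple, and needs_check pairs the "_:x" triple with the "_:b1" triple from graph A. Calling diff_additions(graph_b, graph_a) would give the candidate deletions the same way. The sketch stops at producing candidates; deciding whether two blank-node triples are really equivalent still needs an isomorphism-style comparison.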
