On Fri, Nov 24, 2017 at 12:19 PM, Laura Morales <[email protected]> wrote:
> > What about simply deleting the old graph and loading the triples of the > > .nt file into the graph afterwards? I don't see any benefit of such a > > "tool" - you could just write your own bash script for this if you need > > this quite often. > > The advantage is with large graphs, such as wikidata. If I download their > dumps once a week, it's much more efficient to only change a few triples > instead of deleting the entire graph and recreating the whole TDB store. > Performing a diff between two graphs with blank nodes might be speed up using bloom filters. I have code that represents triples as bloom filters and I know that 9 byte filters will work for very large graphs so you could probably get aways with 8 bytes to make them fit in a standard integer size. This is a multiple pass operation. create a bloom filter for each node in graph A. Call this list A step through graph B creating bloom filters for each triple. if the triple in question has blank nodes only encode non blank nodes If the bloom filter is not in List A it is new. if the bloom filter is in list A then it may be new and a direct lookup in graph A. if it is not found add it If your filter list has a pointer to the triples that it represents (remember there can be bloom filter collisions) then you can rapidly determine if there is a match and you also have a good starting place to do blank node comparisons to determine if the triples are equivalent. If anyone is interested in trying this I have some triple/bloom filter code in my github repository. Claude -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren
