In terms of UNIX utilities, there's a command called "comm" which outputs three columns:

* lines only in the first file (column 1)
* lines only in the second file (column 2)
* lines in common (column 3)
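
For anyone who wants to see the three-way split spelled out, here is a minimal Python sketch of the same idea. It is not how comm itself is implemented; the function name `comm_split` is made up for illustration. Like comm, it assumes both inputs are already sorted, and it compares strings by code point, which corresponds to sorting with LC_ALL=C rather than a locale-aware collation.

```python
def comm_split(left_lines, right_lines):
    """Split two sorted lists of lines into (only_left, only_right, common).

    Hypothetical helper mirroring comm's columns 1, 2 and 3.
    Both inputs must be sorted in the same (code point) order.
    """
    only_left, only_right, common = [], [], []
    i = j = 0
    while i < len(left_lines) and j < len(right_lines):
        if left_lines[i] == right_lines[j]:
            # Line present in both inputs: column 3.
            common.append(left_lines[i])
            i += 1
            j += 1
        elif left_lines[i] < right_lines[j]:
            # Line sorts before anything left in the right input: column 1.
            only_left.append(left_lines[i])
            i += 1
        else:
            # Line sorts before anything left in the left input: column 2.
            only_right.append(right_lines[j])
            j += 1
    # Whatever remains on either side has no counterpart.
    only_left.extend(left_lines[i:])
    only_right.extend(right_lines[j:])
    return only_left, only_right, common


if __name__ == "__main__":
    left, right, both = comm_split(["a", "b", "d"], ["b", "c", "d"])
    print(left, right, both)
```

Because it only ever looks at the current line of each input, the same merge works on files far larger than memory if the lists are replaced by file iterators.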
Arguments can then suppress columns:

* comm -23 a b - shows lines only in a
* comm -13 a b - shows lines only in b

Of course, checksums would not work on the whole graph, but on a sub-graph defined by a DESCRIBE query, e.g. a single subject (an owl:Thing), they could be perfectly feasible, especially because you are essentially comparing graph digests and do not need to load the data.

On Fri, Nov 24, 2017 at 10:02 AM, Osma Suominen <[email protected]> wrote:

> Dan Davis wrote on 24.11.2017 at 16:53:
>
>> Rdflib has a graph_diff method that returns common triples, only in left,
>> only in right. It is in IsomorphicGraph class, so it should handle blank
>> nodes.
>
> Good luck running that on something like Wikidata though. It's far too big
> to fit in memory.
>
> I'd use N-Triple files (old and new) sorted using the unix command sort,
> then use diff to determine added and removed triples, and finally turn
> those into INSERT DATA and DELETE DATA update operations. Assuming there
> are no blank nodes.
>
> -Osma
>
> (speaking as the author of the current rdflib in-memory store, IOMemory)
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi
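
The sort-and-diff approach quoted above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: both inputs are N-Triples lines already sorted the same way (e.g. with LC_ALL=C sort), there are no blank nodes, and the helper names `diff_sorted` and `to_sparql_update` are invented here. The merge reads one line at a time from each side, so it does not need to hold either dump in memory.

```python
def diff_sorted(old_lines, new_lines):
    """Yield ("-", line) for lines only in old and ("+", line) for lines only in new.

    Hypothetical helper; both iterables must be sorted in the same order.
    """
    old_iter, new_iter = iter(old_lines), iter(new_lines)
    old, new = next(old_iter, None), next(new_iter, None)
    while old is not None or new is not None:
        if new is None or (old is not None and old < new):
            yield "-", old  # triple was removed
            old = next(old_iter, None)
        elif old is None or new < old:
            yield "+", new  # triple was added
            new = next(new_iter, None)
        else:
            # Same triple on both sides: unchanged, skip it.
            old, new = next(old_iter, None), next(new_iter, None)


def to_sparql_update(changes):
    """Turn the diff into one SPARQL Update string (assumes no blank nodes)."""
    removed, added = [], []
    for sign, line in changes:
        (removed if sign == "-" else added).append(line.strip())
    parts = []
    if removed:
        parts.append("DELETE DATA {\n" + "\n".join(removed) + "\n}")
    if added:
        parts.append("INSERT DATA {\n" + "\n".join(added) + "\n}")
    return " ;\n".join(parts)


if __name__ == "__main__":
    # Made-up example triples, already in sorted order.
    old = [
        '<http://example.org/a> <http://example.org/p> "1" .',
        '<http://example.org/b> <http://example.org/p> "2" .',
    ]
    new = [
        '<http://example.org/b> <http://example.org/p> "2" .',
        '<http://example.org/c> <http://example.org/p> "3" .',
    ]
    print(to_sparql_update(diff_sorted(old, new)))
```

For a dump the size of Wikidata the update itself would have to be sent in batches rather than as one string; the sketch collects everything only to keep the example short.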
