In terms of Unix utilities, there's a command called "comm" which compares two
sorted files and outputs three columns:
* lines only in the first file (column 1)
* lines only in the second file (column 2)
* lines in common (column 3)

Options suppress individual columns:
* comm -23 a b - shows lines only in a
* comm -13 a b - shows lines only in b
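A rough Python sketch of what comm computes from two already-sorted inputs: a single merge pass assigns each line to one of the three columns. The function name comm_columns is made up for illustration and is not a library API. Python compares strings by code point, so the inputs should be sorted with LC_ALL=C sort for the orderings to agree.

```python
from typing import Iterable, List, Tuple


def comm_columns(a: Iterable[str], b: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split two sorted line sequences into (only_in_a, only_in_b, in_both),
    mirroring comm's columns 1, 2 and 3. Like comm, this assumes both inputs
    are already sorted and does not check it."""
    only_a: List[str] = []
    only_b: List[str] = []
    both: List[str] = []
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        if x < y:
            only_a.append(x)          # x cannot appear later in b
            x = next(it_a, None)
        elif y < x:
            only_b.append(y)          # y cannot appear later in a
            y = next(it_b, None)
        else:
            both.append(x)            # pair one copy from each side
            x, y = next(it_a, None), next(it_b, None)
    # Whatever remains in either input has no partner in the other.
    while x is not None:
        only_a.append(x)
        x = next(it_a, None)
    while y is not None:
        only_b.append(y)
        y = next(it_b, None)
    return only_a, only_b, both
```

For example, comm_columns(["a", "b", "c"], ["b", "c", "d"]) gives (["a"], ["d"], ["b", "c"]), i.e. the same split as comm -23, comm -13 and comm -12.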

Of course, checksums would not work on the whole graph, but on a sub-graph
defined by a DESCRIBE query, e.g. a single subject (an owl:Thing), they could
be perfectly feasible, especially because you are only comparing graph
digests and do not need to load the data.
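To make the digest idea concrete, here is a minimal sketch, assuming each side can produce the DESCRIBE result for a subject as N-Triples lines, with no blank nodes and identical term serialization on both sides (same IRIs, literal escaping and datatypes). Sorting and deduplicating the statements before hashing makes the digest independent of the order in which the store returns them. The name graph_digest is hypothetical.

```python
import hashlib
from typing import Iterable


def graph_digest(ntriples_lines: Iterable[str]) -> str:
    """Order-independent SHA-256 digest of a set of N-Triples statements.

    Assumes no blank nodes and identical serialization on both sides;
    otherwise two equal graphs may still produce different digests."""
    # An RDF graph is a set: strip surrounding whitespace, drop empty lines,
    # deduplicate, then sort so that statement order does not matter.
    statements = sorted({line.strip() for line in ntriples_lines if line.strip()})
    h = hashlib.sha256()
    for stmt in statements:
        h.update(stmt.encode("utf-8"))
        h.update(b"\n")  # separator so adjacent statements cannot run together
    return h.hexdigest()
```

Only when the two digests for a subject differ would its sub-graph need to be fetched and compared triple by triple.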



On Fri, Nov 24, 2017 at 10:02 AM, Osma Suominen <[email protected]>
wrote:

> Dan Davis wrote on 24.11.2017 at 16:53:
>
>> Rdflib has a graph_diff function that returns the triples in common, the
>> triples only in the left graph, and the triples only in the right graph.
>> It is in the same module as the IsomorphicGraph class (rdflib.compare),
>> so it should handle blank nodes.
>>
>
> Good luck running that on something like Wikidata though. It's far too big
> to fit in memory.
>
> I'd take N-Triples files (old and new), sort them with the Unix sort
> command, then use diff to determine the added and removed triples, and
> finally turn those into INSERT DATA and DELETE DATA update operations,
> assuming there are no blank nodes.
>
> -Osma
>
> (speaking as the author of the current rdflib in-memory store, IOMemory)
>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi
>
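The last step of the sort/diff approach in the quoted message could look roughly like the Python sketch below, which turns removed and added N-Triples lines into SPARQL DELETE DATA and INSERT DATA operations. It assumes the inputs are, for example, the output of comm -23 old.nt new.nt and comm -13 old.nt new.nt on files sorted with LC_ALL=C sort (or the -/+ lines of diff with the markers stripped), and that each N-Triples line can be pasted into a SPARQL data block unchanged, which holds for ordinary IRIs and literals. The names sparql_updates and batch_size are hypothetical, and 1000 is an arbitrary batch size. SPARQL 1.1 Update does not permit blank nodes in DELETE DATA, which is one more reason the approach assumes there are none.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def _batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` lines without loading all of them."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def sparql_updates(removed: Iterable[str], added: Iterable[str],
                   batch_size: int = 1000) -> Iterator[str]:
    """Yield DELETE DATA operations for removed triples, then INSERT DATA
    operations for added ones, each holding at most batch_size triples.
    Assumes N-Triples input without blank nodes."""
    for keyword, lines in (("DELETE DATA", removed), ("INSERT DATA", added)):
        cleaned = (line.strip() for line in lines if line.strip())
        for chunk in _batches(cleaned, batch_size):
            body = "\n".join("  " + triple for triple in chunk)
            yield f"{keyword} {{\n{body}\n}}"
```

Because both inputs are consumed as streams, the full dumps never need to fit in memory; only one batch of triples is held at a time.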
