On 09/02/2022 16:09, Shaw, Ryan wrote:
Thank you, Andy.

I agree that working on the triple level is the correct way to approach this. I 
was looking for something quick and dirty that would work with textual diffing 
by a VCS, hence my focus on the blank node labels.

Are there any examples of how to use the isomorphism utilities in Jena?

See the code - the isomorphism code takes two groups of triples, in various grouping forms, and returns true or false. You'll probably want to look at how it does that and build something similar for your use case to get a diff of triples.


On Feb 5, 2022, at 12:48 PM, Andy Seaborne <a...@apache.org> wrote:

On 04/02/2022 19:09, Shaw, Ryan wrote:
I am trying to experiment with generating diffable N-Triples or flat Turtle 

Info: There is work on a charter for

"RDF Dataset Canonicalization and Hash Working Group"


The end of section 1 has some links to related work.

Given RDF is inherently unordered, canonicalization and "diff of triples" are 

For diff-able files, what counts as "different" between two files?

Instead of changing the bnode algorithm, have you considered making use of 
bnode-isomorphism? That is, during a diff, maintain a growing mapping from 
bnodes in one list of triples to bnodes in the other list?

(The list being the triples in encounter order during parsing.) This works not so much 
on the syntax as on the abstraction of triples, e.g. a Turtle file and an N-Triples file 
produced by parsing that Turtle file can be defined to be "the same".
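
A minimal sketch of the growing-mapping idea, in plain Java with no Jena dependency. Everything here is an assumption made for illustration: the class name BnodeMappingDiff, triples held as String[3] in encounter order, and the convention that any term starting with "_:" is a blank node. It is not how Jena's isomorphism code works; it only compares the two lists position by position and extends a one-to-one bnode mapping as it goes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Jena class): order-sensitive diff of two triple lists. */
final class BnodeMappingDiff {

    /** Assumption: blank nodes are written with the N-Triples "_:" prefix. */
    static boolean isBnode(String term) {
        return term.startsWith("_:");
    }

    /** Returns the positions at which the two lists differ, up to blank node relabelling. */
    static List<Integer> diff(List<String[]> left, List<String[]> right) {
        Map<String, String> l2r = new HashMap<>(); // bnode in left  -> bnode in right
        Map<String, String> r2l = new HashMap<>(); // bnode in right -> bnode in left
        List<Integer> mismatches = new ArrayList<>();
        int n = Math.max(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            boolean missing = i >= left.size() || i >= right.size();
            if (missing || !matches(left.get(i), right.get(i), l2r, r2l)) {
                mismatches.add(i);
            }
        }
        return mismatches;
    }

    /** Compares one pair of triples; the mapping grows only if the whole triple matches. */
    static boolean matches(String[] a, String[] b,
                           Map<String, String> l2r, Map<String, String> r2l) {
        Map<String, String> newL2r = new HashMap<>();
        Map<String, String> newR2l = new HashMap<>();
        for (int k = 0; k < 3; k++) {
            String x = a[k];
            String y = b[k];
            if (isBnode(x) != isBnode(y)) {
                return false;
            }
            if (!isBnode(x)) {
                if (!x.equals(y)) {
                    return false; // IRIs and literals must be identical
                }
                continue;
            }
            String mappedY = l2r.containsKey(x) ? l2r.get(x) : newL2r.get(x);
            String mappedX = r2l.containsKey(y) ? r2l.get(y) : newR2l.get(y);
            if (mappedY == null && mappedX == null) {
                newL2r.put(x, y); // first sighting of both labels: pair them
                newR2l.put(y, x);
            } else if (!y.equals(mappedY) || !x.equals(mappedX)) {
                return false; // conflicts with a pairing made earlier
            }
        }
        l2r.putAll(newL2r);
        r2l.putAll(newR2l);
        return true;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
            new String[] {"_:b0", "<http://example/p>", "\"1\""},
            new String[] {"_:b0", "<http://example/q>", "_:b1"});
        List<String[]> right = Arrays.asList(
            new String[] {"_:x", "<http://example/p>", "\"1\""},
            new String[] {"_:x", "<http://example/q>", "_:y"});
        // Labels differ but are used consistently, so no positions are reported.
        System.out.println(diff(left, right)); // prints []
    }
}
```

Because the comparison is positional, it only makes sense when both files were produced in the same triple order; an inserted or deleted triple shifts every later position, so a real tool would need a proper sequence alignment on top of this.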

It's fairly portable across files generated by other systems as well, except for 
Turtle lists - Jena has a fixed order for triple generation for a list but it 
isn't necessarily the same for all systems.

Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order, 
with rdf:first, then rdf:rest; the triple referencing the list appears 
after the list. It happens to be the way the spec explains it, 
but that is defining the outcome and isn't a requirement.
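
To make that order concrete, here is a small sketch in plain Java that emits N-Triples-style lines for a list in the order described above: rdf:first then rdf:rest for each cell, with the triple referencing the list last. The class name ListTripleOrder, the method emit, and the "_:l0", "_:l1", ... labels are made up for illustration; this is not the LangTurtleBase code and the labels are not what Jena generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not Jena code) of one possible triple order for a list. */
final class ListTripleOrder {

    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Emits the triples for "subject predicate ( items... )" as N-Triples-style lines. */
    static List<String> emit(String subject, String predicate, List<String> items) {
        List<String> out = new ArrayList<>();
        String nil = "<" + RDF + "nil>";
        if (items.isEmpty()) {
            // The empty list is just rdf:nil; there are no cells to emit.
            out.add(subject + " " + predicate + " " + nil + " .");
            return out;
        }
        for (int i = 0; i < items.size(); i++) {
            String cell = "_:l" + i; // made-up blank node label for cell i
            String rest = (i + 1 < items.size()) ? "_:l" + (i + 1) : nil;
            out.add(cell + " <" + RDF + "first> " + items.get(i) + " .");
            out.add(cell + " <" + RDF + "rest> " + rest + " .");
        }
        // The triple that references the list comes after the list itself.
        out.add(subject + " " + predicate + " _:l0 .");
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = emit("<http://example/s>", "<http://example/p>",
                                  Arrays.asList("\"a\"", "\"b\""));
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

A system that emits the referencing triple first, or all the rdf:first triples before the rdf:rest ones, produces the same graph, which is why a purely textual diff is fragile for lists.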

