Thank you, Andy. I agree that working on the triple level is the correct way to approach this. I was looking for something quick and dirty that would work with textual diffing by a VCS, hence my focus on the blank node labels.
Are there any examples of how to use the isomorphism utilities in Jena?

> On Feb 5, 2022, at 12:48 PM, Andy Seaborne <[email protected]> wrote:
>
> On 04/02/2022 19:09, Shaw, Ryan wrote:
>> Hello,
>> I am trying to experiment with generating diffable N-Triples or flat Turtle
>> files.
> ...
>> Thanks,
>> Ryan
>
> Info: There is work on a charter for
>
> "RDF Dataset Canonicalization and Hash Working Group"
>
> https://w3c.github.io/rch-wg-charter/
>
> The end of section 1 has some links to related work.
>
> Given RDF is inherently unordered, canonicalization and "diff of triples" are
> related.
>
> For diff-able files, what counts as "different" between two files?
>
> Instead of changing the bnode algorithm, have you considered making use of
> bnode-isomorphism? That is, during a diff, maintain a growing mapping from
> bnodes in one list of triples to bnodes in the other list?
>
> Iso.isomorphicTriples
>
> (The list being the triples in encounter order during parsing.) It works
> not so much on the syntax as on the abstraction of triples, e.g. a Turtle file
> and an NT file produced by parsing the TTL file can be defined to be "the
> same".
>
> It's fairly portable across files generated by other systems as well, except
> for Turtle lists - Jena has a fixed order for triple generation for a list, but
> it isn't necessarily the same for all systems.
>
> Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order,
> with rdf:first, then rdf:rest; the triple referencing the list appears
> after the list. It happens to be the way the spec explains it:
> https://www.w3.org/TR/turtle/#sec-parsing-triples
> but that is defining the outcome and isn't a requirement.
>
> Andy
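
To check that I understand the "growing mapping" suggestion, here is a minimal sketch of the idea in plain Python rather than Jena. It is not Jena's implementation and does not use the Jena API. The function name, the tuple-of-strings representation of a triple, and the `_:` prefix test for blank nodes are all my own assumptions for illustration. It compares two lists of triples position by position in encounter order, so it is stricter than full graph isomorphism: the same triples in a different order would be reported as different.

```python
def is_bnode(term):
    # Assumption: terms are N-Triples-style strings and blank nodes start with "_:".
    return term.startswith("_:")


def same_in_encounter_order(left, right):
    """Compare two triple lists pairwise, growing a bnode-to-bnode mapping.

    Returns True if every pair of triples matches once blank node labels
    are related by a consistent one-to-one mapping.
    """
    if len(left) != len(right):
        return False
    forward, backward = {}, {}  # left label -> right label, and the reverse
    for t1, t2 in zip(left, right):
        for a, b in zip(t1, t2):
            if is_bnode(a) and is_bnode(b):
                # Record the pairing on first sight; afterwards it must agree
                # in both directions so the mapping stays one-to-one.
                if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                    return False
            elif a != b:
                # IRIs and literals (or a bnode against a non-bnode) must match exactly.
                return False
    return True


if __name__ == "__main__":
    p, q = "<http://example.org/p>", "<http://example.org/q>"
    first = [("_:b0", p, "_:b1"), ("_:b1", q, '"x"')]
    second = [("_:x", p, "_:y"), ("_:y", q, '"x"')]
    print(same_in_encounter_order(first, second))
```

If that is roughly what the utilities do, then two serializations of the same parse would compare as "the same" even though their blank node labels differ as text.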
