On 04/02/2022 19:09, Shaw, Ryan wrote:
Hello,

I am trying to experiment with generating diffable N-Triples or flat Turtle 
files.
...
Thanks,
Ryan


Info: There is work on a charter for

"RDF Dataset Canonicalization and Hash Working Group"

https://w3c.github.io/rch-wg-charter/

The end of section 1 has some links to related work.

Given that RDF is inherently unordered, canonicalization and "diff of triples" are related.


For diff-able files, what counts as "different" between two files?

Instead of changing the bnode algorithm, have you considered making use of bnode-isomorphism? That is, during a diff, maintain a growing mapping from bnodes in one list of triples to bnodes in the other list.
See Iso.isomorphicTriples.

(The lists being the triples in encounter order during parsing.) This works not so much on the syntax as on the abstraction of triples, e.g. a Turtle file and the N-Triples file produced by parsing that Turtle file can be defined to be "the same".
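
To make the idea concrete, here is a rough Python sketch of walking two triple lists in parallel while growing a bnode mapping. It is not Jena code and not how Iso.isomorphicTriples is implemented; the function name bnode_mapping and the representation (triples as tuples of N-Triples-style strings, bnodes being any term starting with "_:") are assumptions made for the illustration.

```python
def is_bnode(term):
    # Assumption: terms are N-Triples-style strings, so only bnodes start with "_:".
    return term.startswith("_:")


def bnode_mapping(left, right):
    """Compare two triple lists position by position.

    Returns the bnode mapping (left label -> right label) if every triple
    matches up to a consistent, one-to-one renaming of bnodes, else None.
    """
    if len(left) != len(right):
        return None
    forward, backward = {}, {}
    for t1, t2 in zip(left, right):
        for a, b in zip(t1, t2):
            if is_bnode(a) and is_bnode(b):
                # Record the pairing the first time; afterwards it must not change.
                if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                    return None
            elif a != b:
                # IRIs and literals must be identical; a bnode never matches a non-bnode.
                return None
    return forward


left = [("_:a", "<http://example/p>", '"x"'), ("<http://example/s>", "<http://example/q>", "_:a")]
right = [("_:b0", "<http://example/p>", '"x"'), ("<http://example/s>", "<http://example/q>", "_:b0")]
changed = [("_:b0", "<http://example/p>", '"x"'), ("<http://example/s>", "<http://example/q>", "_:b1")]

print(bnode_mapping(left, right))
print(bnode_mapping(left, changed))
```

The first pair differs only in bnode labels, so a mapping comes back; in the second, "_:a" would have to map to two different bnodes, so the lists count as different. This only compares triples at the same position, so it relies on both files producing triples in the same encounter order.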

It's fairly portable across files generated by other systems as well, except for Turtle lists: Jena has a fixed order for triple generation for a list, but it isn't necessarily the same for all systems.

Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order, with rdf:first, then rdf:rest; the triple referencing the list appears after the list. It happens to be the way the spec explains it:
   https://www.w3.org/TR/turtle/#sec-parsing-triples
but that is defining the outcome and isn't a requirement on the order in which triples are generated.
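
As an illustration of that ordering only (a Python sketch, not the LangTurtleBase code), this is the sequence of triples described above for a list such as ( "1" "2" ) used as an object. The function name list_triples and the bnode labels _:b0, _:b1 are made up for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_triples(subject, predicate, items):
    """Emit triples for `subject predicate ( items... )` in the order described:
    each cell's rdf:first, then its rdf:rest, and the referencing triple last."""
    nil = f"<{RDF}nil>"
    if not items:
        # An empty list is just rdf:nil; there are no cells to emit.
        return [(subject, predicate, nil)]
    cells = [f"_:b{i}" for i in range(len(items))]  # hypothetical labels
    triples = []
    for i, item in enumerate(items):
        rest = cells[i + 1] if i + 1 < len(items) else nil
        triples.append((cells[i], f"<{RDF}first>", item))
        triples.append((cells[i], f"<{RDF}rest>", rest))
    triples.append((subject, predicate, cells[0]))
    return triples


for triple in list_triples("<http://example/s>", "<http://example/p>", ['"1"', '"2"']):
    print(" ".join(triple), ".")
```

A system that emits the referencing triple first, or the rdf:rest before the rdf:first, yields the same graph but a different encounter order, which is why lists are the awkward case for an ordered comparison.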

    Andy
