On 04/02/2022 19:09, Shaw, Ryan wrote:
Hello,
I am trying to experiment with generating diffable N-Triples or flat Turtle
files.
...
Thanks,
Ryan
Info: There is work on a charter for
"RDF Dataset Canonicalization and Hash Working Group"
https://w3c.github.io/rch-wg-charter/
The end of section 1 has some links to related work.
Given RDF is inherently unordered, canonicalization and "diff of
triples" are related.
For diff-able files, what counts as "different" between two files?
Instead of changing the bnode algorithm, have you considered making use
of bnode-isomorphism? That is, during a diff, maintain a growing
mapping from bnodes in one list of triples to bnodes in the other list?
Iso.isomorphicTriples
(The list being the triples in encounter order during parsing.) This
works not so much on the syntax as on the abstraction of triples, e.g. a
Turtle file and an NT file produced by parsing the TTL file can be
defined to be "the same".
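
A rough sketch of the growing-mapping idea, in Python for brevity. This
is not Jena's Iso code. It assumes triples are plain 3-tuples of strings
in encounter order, that blank nodes are the terms starting with "_:",
and the function names are made up for illustration.

```python
def is_bnode(term):
    # Assumption: blank nodes are written with the "_:" label prefix.
    return term.startswith("_:")


def first_difference(left, right):
    """Compare two triple lists in encounter order.

    Returns the index of the first pair of triples that cannot be matched
    under a one-to-one blank node mapping, or None if the lists match.
    """
    fwd, rev = {}, {}  # growing bnode mapping, tracked in both directions

    def term_matches(a, b):
        if is_bnode(a) != is_bnode(b):
            return False
        if not is_bnode(a):
            return a == b  # IRIs and literals must be identical
        if a in fwd or b in rev:
            # At least one side is already bound: the binding must agree.
            return fwd.get(a) == b and rev.get(b) == a
        fwd[a] = b  # first sighting on both sides: extend the mapping
        rev[b] = a
        return True

    for i, (t1, t2) in enumerate(zip(left, right)):
        if not all(term_matches(a, b) for a, b in zip(t1, t2)):
            return i
    if len(left) != len(right):
        return min(len(left), len(right))  # one list has extra triples
    return None


left = [("_:b0", "<p>", '"x"'), ("<s>", "<q>", "_:b0")]
right = [("_:g1", "<p>", '"x"'), ("<s>", "<q>", "_:g1")]
print(first_difference(left, right))
```

The two lists above only differ in blank node labels, so they compare as
"the same". Because the comparison is positional, it relies on both
files producing triples in the same order.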
It's fairly portable across files generated by other systems as well,
except for Turtle lists - Jena has a fixed order for triple generation
for a list but it isn't necessarily the same for all systems.
Jena's Turtle algorithm, which is in LangTurtleBase, generates in list
order, with rdf:first, then rdf:rest; the triple referencing the
list appears after the list. It happens to be the way the spec explains it:
https://www.w3.org/TR/turtle/#sec-parsing-triples
but that is defining the outcome and isn't a requirement.
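
To make the ordering concrete, here is a small Python sketch of the
emission order described above. It is an illustration of the
description, not code taken from LangTurtleBase. The blank node labels
(_:l0, _:l1, ...) and the function name are made up.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST = f"<{RDF}first>"
REST = f"<{RDF}rest>"
NIL = f"<{RDF}nil>"


def list_triples(subject, predicate, items):
    """Emit triples for `subject predicate ( items... )` in list order.

    For each cell: rdf:first, then rdf:rest. The triple that references
    the list comes last. Cell labels are hypothetical.
    """
    if not items:
        # An empty list is just rdf:nil in the object position.
        return [(subject, predicate, NIL)]
    cells = [f"_:l{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        nxt = cells[i + 1] if i + 1 < len(cells) else NIL
        triples.append((cells[i], FIRST, item))
        triples.append((cells[i], REST, nxt))
    triples.append((subject, predicate, cells[0]))
    return triples


for t in list_triples("<s>", "<p>", ['"a"', '"b"']):
    print(*t, ".")
```

A system that emits the referencing triple first, or rdf:rest before
rdf:first, produces the same graph but a different encounter order, so a
positional diff would report a difference.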
Andy