Re: Stable turtle serialization

Andy Seaborne Thu, 14 Jul 2022 02:34:29 -0700

Hi Sebastian,

On 14/07/2022 05:27, Sebastian Trueg wrote:

Hello,
Trying to consistently get nice git diffs for my Turtle files, I want to normalize pull requests. To that end I simply re-write the Turtle files via RDFDataMgr. Sadly the result is not stable; at least the order of some objects changes from run to run.
Is there a way to ensure that serializing the same set of triples (parsed from different formats) always results in the exact same output?

Not currently. It would be nice to have, and there are a few around, but no contributions have been made.
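As a rough illustration of what "stable" could mean in the simplest case, here is a minimal sketch that sorts line-based output. It assumes the data has already been written one triple per line (N-Triples style), that every term is always written the same way, and that there are no blank nodes. The class and method names are hypothetical; this is plain Java and not part of Jena.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not a Jena class.
class SortedLines {
    /**
     * Returns the distinct triple lines in lexicographic order.
     * Empty lines and '#' comment lines are dropped.
     * Assumption: one triple per line, no blank nodes.
     */
    static List<String> stableOrder(Collection<String> ntriplesLines) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String line : ntriplesLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                sorted.add(trimmed);
            }
        }
        return new ArrayList<>(sorted);
    }
}
```

The result depends only on the set of lines, not on the order they arrive in. It gives up Turtle's prefixes and grouping, and it does not help once blank nodes are involved.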


The W3C "RDF Dataset Canonicalization and Hash Working Group"
  https://w3c.github.io/rch-wg-charter/
is about to start.


A derivative of the Turtle blocks format would be a good starting point.

Or do you also want all the "pretty" forms, like nested [ ] in the object position? Lists "(....)" - "usually same output" using the core of the pretty printer class ShellGraph (specifically listSubjects()).


Contributions welcome.

Having nested [ ] means a small change in the graph can lead to a big change in the output.

The fun begins with blank nodes - reparsing the same file is a different graph. Changes to blank node labels change hash tables. So changes to blank nodes change the iteration order of everything in an index.
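One way around the label problem, sketched here under strong assumptions, is to order blank nodes by what they say rather than by what they are called. The sketch assumes each blank node is described only by ground "predicate object" pairs (no nested blank nodes) and that no two blank nodes have identical descriptions. All names are hypothetical and this is plain Java, not Jena code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Jena class.
class BlankNodeOrder {
    /**
     * Input: blank node label -> its "predicate object" pairs.
     * Builds one signature per blank node from its sorted pairs and
     * returns the signatures in sorted order. The labels themselves
     * are never looked at, so renaming them cannot change the result.
     */
    static List<String> signaturesInOrder(Map<String, List<String>> descriptions) {
        List<String> signatures = new ArrayList<>();
        for (List<String> pairs : descriptions.values()) {
            List<String> sortedPairs = new ArrayList<>(pairs);
            Collections.sort(sortedPairs);
            signatures.add(String.join(" ; ", sortedPairs));
        }
        Collections.sort(signatures);
        return signatures;
    }
}
```

Two parses of the same file that assign different labels produce the same ordered list. Nested blank nodes and blank nodes with identical descriptions need a real canonicalization algorithm, which this does not attempt.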

There is work-in-progress on a new memory graph implementation, focused on speed and memory efficiency. That doesn't mean we can't also have another graph implementation that has a consistent return order for Graph.find().
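To make the idea of a consistent return order concrete, here is a small sketch of a store that keeps triples in a sorted set, so find() always returns matches in subject/predicate/object order regardless of insertion order. It is a stand-in using plain strings for terms; OrderedTripleStore and its methods are hypothetical and do not mirror Jena's Graph interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in, not Jena's Graph.
class OrderedTripleStore {
    /** A triple of plain strings standing in for RDF terms. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    // Compare by subject, then predicate, then object.
    private static final Comparator<Triple> SPO =
        Comparator.comparing((Triple t) -> t.s)
                  .thenComparing(t -> t.p)
                  .thenComparing(t -> t.o);

    private final TreeSet<Triple> triples = new TreeSet<>(SPO);

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    /** null is a wildcard; matches come back in SPO order. */
    List<Triple> find(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The trade-off is the one hinted at above: a sorted structure costs more per insert than a hash table, and this sketch scans every triple on each find(). Blank nodes would still need label-independent ordering for the result to be stable across parses.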


    Andy

Output from a TDB database is consistent until the next update occurs.

I found https://github.com/buda-base/jena-stable-turtle but that seems to not be compatible with 4.5.0.
Thanks,
Sebastian
