Re: Stable turtle serialization

Sebastian Trueg Fri, 15 Jul 2022 02:33:48 -0700

Hi Andy,

thank you for the explanations. I do indeed want the pretty formatting. But I actually hacked together a solution for my needs in the meantime.


I simply use a wrapper Graph which sorts all triples.

To include stable blank node sorting I sort blank nodes by comparing the list of triples which have them as a subject. This is not performant, uses too much memory, might be prone to endless looping (if one chooses to use bnodes in a weird way) and does not solve the problem of bnode renaming.
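For readers who want to picture the idea, here is a minimal sketch in plain Python. It is not the actual wrapper Graph and does not use the Jena API. The `Triple` tuple, the `_:` prefix test and the function names are all assumptions made for illustration. A blank node is ordered by the sorted list of triples it is the subject of, and a `seen` set guards against the endless looping mentioned above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    # Hypothetical stand-in for an RDF triple; terms are plain strings.
    s: str
    p: str
    o: str


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")


def structural_key(term, triples, seen=frozenset()):
    """Sort key that ignores blank node labels."""
    if not is_bnode(term):
        return (0, term)
    if term in seen:
        # Cycle guard: stop descending when a bnode refers back to itself.
        return (2,)
    seen = seen | {term}
    outgoing = sorted(
        (t.p, structural_key(t.o, triples, seen))
        for t in triples
        if t.s == term
    )
    return (1, tuple(outgoing))


def sorted_triples(triples):
    """Return the triples in an order that does not depend on bnode labels."""
    triples = list(triples)
    return sorted(
        triples,
        key=lambda t: (
            structural_key(t.s, triples),
            t.p,
            structural_key(t.o, triples),
        ),
    )


def mask(triples):
    """Replace every bnode label with "[]" so two orderings can be compared."""
    return [tuple("[]" if is_bnode(x) else x for x in t) for t in triples]
```

Two blank nodes with identical outgoing triples still compare as equal here, so their relative order falls back to the input order. The key is also recomputed for every triple, which is part of why this approach is slow on larger graphs.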

But it is good enough for me for now.

Sadly it is certainly not good enough for contribution. :/

Regards,
Sebastian

On 14.07.22 11:34, Andy Seaborne wrote:

Hi Sebastian,

On 14/07/2022 05:27, Sebastian Trueg wrote:
Hello,
trying to consistently get nice git diffs for my turtle files I want to normalize pull requests. To that end I simply re-write the turtle files via RDFDataMgr. Sadly the result is not stable, at least the order of some objects changes from run to run.
Is there a way to ensure that serializing the same set of triples (parsed from different formats) always results in the exact same output?
Not currently. It would be nice to have and there are a few around but no contributions made.
The W3C "RDF Dataset Canonicalization and Hash Working Group"
   https://w3c.github.io/rch-wg-charter/
is about to start.


A derivative of the Turtle blocks format would be a good starting point.
Or do you also want all the "pretty" forms, like nested [ ] in the object position? Lists "(....)" - "usually same output" using the core of the pretty printer class ShellGraph (specifically listSubjects()).
Contributions welcome.
Having nested [ ] means a small change in the graph can lead to a big change in the output.
The fun begins with blank nodes - reparsing the same file is a different graph. Changes to blank node labels change hash tables. So changes to blank nodes change the iteration order of everything in an index.
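A small illustration of the label problem, in plain Python rather than Jena. Ordering by label stands in here for hash-table iteration order, and the triples, labels and helper names are made up for the example. The same data parsed twice with different generated blank node labels comes out in a different order as soon as the order depends on those labels.

```python
def relabel(triples, mapping):
    """Rename terms according to mapping, as a re-parse would rename bnodes."""
    return [tuple(mapping.get(term, term) for term in t) for t in triples]


def objects_in_label_order(triples):
    """List objects in an order that depends on the (arbitrary) bnode labels."""
    return [o for _s, _p, o in sorted(triples)]


# The same two statements, as two separate parses might label them.
first_parse = [
    ("_:b0", "ex:name", "Alice"),
    ("_:b1", "ex:name", "Bob"),
]
second_parse = relabel(first_parse, {"_:b0": "_:b7", "_:b1": "_:b3"})

print(objects_in_label_order(first_parse))
print(objects_in_label_order(second_parse))
```

The two printed lists contain the same names in opposite order, although the graphs differ only in their blank node labels.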
There is work-in-progress on a new memory graph implementation, focused on speed and memory efficiency. That doesn't mean we can also have another graph implementation that has a consistent return order for Graph.find().
     Andy

Output from a TDB database is consistent until the next update occurs.
I found https://github.com/buda-base/jena-stable-turtle but that seems to not be compatible with 4.5.0.
Thanks,
Sebastian
