Hi Andy,

thank you for the explanations. I do indeed want the pretty formatting. But I actually hacked together a solution for my needs in the meantime.

I simply use a wrapper Graph which sorts all triples.
To get a stable blank node order, I sort blank nodes by comparing the lists of triples that have them as a subject. This is not performant, uses too much memory, might loop endlessly (if one chooses to use bnodes in a weird way), and does not solve the problem of bnode renaming.
But it is good enough for me for now.
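
For readers of the archive, the idea can be sketched roughly as follows. This is not the actual wrapper Graph (which is written against Jena); it is a minimal, library-free illustration in Python. Triples are assumed to be plain (subject, predicate, object) string tuples, blank nodes are assumed to be the strings starting with "_:", and the names is_bnode and sorted_triples are made up for this sketch. A blank node is ordered by the sorted list of triples that have it as a subject, not by its label, and a "seen" guard stops the recursion on bnode cycles.

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption for this sketch: blank nodes are written as "_:label".
    return term.startswith("_:")


def sorted_triples(triples):
    """Return the triples in an order that does not depend on bnode labels."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    def term_key(term, seen=()):
        if not is_bnode(term):
            return (0, term)
        if term in seen:
            # Cycle guard: stop here instead of recursing forever.
            return (2, ())
        # Describe the bnode by the sorted triples it is the subject of.
        inner = sorted(
            (p, term_key(o, seen + (term,)))
            for p, o in by_subject.get(term, [])
        )
        return (1, tuple(inner))

    # No memoization: keys are recomputed for every triple, which is slow.
    return sorted(
        triples,
        key=lambda t: (term_key(t[0]), t[1], term_key(t[2])),
    )
```

Two blank nodes with identical descriptions get equal keys, so their relative order still falls back to the input order, and the labels themselves are written out unchanged. That matches the limitations listed above: slow, memory hungry, and no answer to bnode renaming.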

Sadly it is certainly not good enough for contribution. :/

Regards,
Sebastian

On 14.07.22 11:34, Andy Seaborne wrote:
Hi Sebastian,

On 14/07/2022 05:27, Sebastian Trueg wrote:
Hello,

To get consistently nice git diffs for my Turtle files, I want to normalize pull requests. To that end I simply rewrite the Turtle files via RDFDataMgr. Sadly the result is not stable: at least the order of some objects changes from run to run.

Is there a way to ensure that serializing the same set of triples (parsed from different formats) always results in the exact same output?

Not currently. It would be nice to have, and there are a few around, but none has been contributed.

The W3C "RDF Dataset Canonicalization and Hash Working Group"
   https://w3c.github.io/rch-wg-charter/
is about to start.


A derivative of the Turtle blocks format would be a good starting point.
Or do you also want all the "pretty" forms, like nested [ ] in the object position? Lists "(....)" - "usually same output" using the core of the pretty printer class ShellGraph (specifically listSubjects()).

Contributions welcome.


Having nested [ ] means a small change in the graph can lead to a big change in the output.

The fun begins with blank nodes - reparsing the same file is a different graph. Changes to blank node labels change hash tables. So changes to blank nodes change the iteration order of everything in an index.

There is work in progress on a new memory graph implementation, focused on speed and memory efficiency. That doesn't mean we can't also have another graph implementation that has a consistent return order for Graph.find().

     Andy

Output from a TDB database is consistent until the next update occurs.


I found https://github.com/buda-base/jena-stable-turtle but that seems to not be compatible with 4.5.0.

Thanks,
Sebastian


