See also JENA-1262

1/
If you want predictable output, you may be better off starting from TurtleWriterBlocks, not the full pretty writer. The blocks writer writes out chunks of same-subject triples. Another option is the flat writer, TurtleWriterFlat, which is essentially N-Triples plus prefixes. Which is easier depends on how extensive the changes are and how big the data is.

In practice, all triples with the same subject come in one chunk because of indexing. Also, if the data does not change, I think all writers are deterministic and produce the same output from run to run.

It's a balance of reducing the effort needed, prettiness, and stability. They are not independent choices!

2/
There is no need to change RIOTLib -- create your own writer and register it. ExRIOT_out3 has an example of adding a writer.

3/
The pretty writer is in ShellGraph. accTriples is only used at a few points and does not really drive the pretty writer output. In particular, any ordering applied there is lost because the collected triples are processed further, including being sorted by predicate. See writePredicateObjectList.

Subclassing ShellGraph and overriding methods like writePredicateObjectList would be my approach, but too much is private to really subclass, so you'll need to copy the class at the moment. Combined with registration, you can at least have both writers in the same JVM, which helps testing.
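Roughly, the sorting step in such a copied writer could look like the sketch below. It is not Jena code: the class T is a hypothetical string-based stand-in for Jena's Triple, the name StablePredicateObjectList is made up, and a real writer would also need to handle prefixes, blank nodes, lists and indentation. It only shows a caller-supplied Comparator fixing the order of predicates and objects for one subject.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StablePredicateObjectList {
    /** Hypothetical stand-in for a triple; a real writer would use Jena's Triple and Node. */
    public static final class T {
        final String s, p, o;
        public T(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Render one subject's triples with predicates and objects in the order given by the comparator. */
    public static String write(String subject, List<T> triples, Comparator<String> order) {
        // TreeMap keeps the predicates sorted by the supplied comparator.
        Map<String, List<String>> byPredicate = new TreeMap<>(order);
        for (T t : triples) {
            if (t.s.equals(subject)) {
                byPredicate.computeIfAbsent(t.p, k -> new ArrayList<>()).add(t.o);
            }
        }
        StringBuilder sb = new StringBuilder(subject);
        String sep = " ";
        for (Map.Entry<String, List<String>> e : byPredicate.entrySet()) {
            List<String> objects = e.getValue();
            objects.sort(order); // the object list no longer depends on input order
            sb.append(sep).append(e.getKey()).append(' ').append(String.join(" , ", objects));
            sep = " ;\n    ";
        }
        return sb.append(" .").toString();
    }
}
```

With Comparator.naturalOrder(), the same set of triples gives the same text whatever order they arrive in, which is the property needed for quiet git diffs. A different comparator (for example one that puts rdf:type first) can be swapped in without touching the rest.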

4/
I have seen a writer (not open source) that applies a form of the Floyd-Warshall algorithm to sort subjects and get some stability: connected nodes tend to come out together, which localises git diffs. Quite space hungry, quite complicated.
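The details of that writer are not public, so the following is only a guess at the general idea, not its actual method. It assumes subjects have already been sorted by label and numbered 0..n-1, and that a "link" exists between two subjects when one appears as the object of the other. Floyd-Warshall gives all-pairs hop counts; a greedy pass then emits, at each step, the unwritten subject closest to the one just written, with ties going to the lower index. The class name SubjectOrder and the greedy step are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubjectOrder {
    // Large enough to mean "unreachable", small enough that INF + INF does not overflow.
    static final int INF = Integer.MAX_VALUE / 2;

    /** All-pairs shortest hop counts (Floyd-Warshall) over an undirected link graph. */
    public static int[][] distances(int n, int[][] links) {
        int[][] d = new int[n][n];
        for (int[] row : d) Arrays.fill(row, INF);
        for (int i = 0; i < n; i++) d[i][i] = 0;
        for (int[] l : links) {
            if (l[0] != l[1]) { d[l[0]][l[1]] = 1; d[l[1]][l[0]] = 1; }
        }
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j]) d[i][j] = d[i][k] + d[k][j];
        return d;
    }

    /** Greedy output order: start at subject 0, then always take the nearest unwritten subject. */
    public static List<Integer> order(int n, int[][] links) {
        int[][] d = distances(n, links);
        boolean[] done = new boolean[n];
        List<Integer> out = new ArrayList<>();
        int last = -1;
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int c = 0; c < n; c++) {
                if (done[c]) continue;
                // Strict '<' keeps the lowest index on ties, so the result is deterministic.
                if (best == -1 || (last >= 0 && d[last][c] < d[last][best])) best = c;
            }
            done[best] = true;
            out.add(best);
            last = best;
        }
        return out;
    }
}
```

The n-by-n distance matrix is where the space goes, and the triple loop is cubic in the number of subjects, so this only suits modest datasets.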

5/
The JSON-LD output comes from an external library.

    Andy

On 02/08/17 12:43, Élie Roux wrote:
Hello,

I'm currently trying to solve a problem I have in Turtle: I would like
my output to stay stable, so that it can live in a git repository without
generating too much diff noise every time the data is regenerated. One
example would be something like:

bdr:G844  a              :Place ;
         :placeContains   bdr:G1183 , bdr:G229 , bdr:G2CN10883 , bdr:G3478 ,
bdr:G3JT12502 , bdr:G4885 .

for which I have no guarantee that the list will stay in the same order
if the same model is serialized again. I could turn it into a list:

bdr:G844  a              :Place ;
         :placeContains   ( bdr:G1183 bdr:G229 bdr:G2CN10883 bdr:G3478
bdr:G3JT12502 bdr:G4885 ) .

but that changes my data model, and I don't really need that, as I care
about the order only in serialized documents, not in the dataset itself.

I can hack the output of JSON-LD to do this kind of thing, but with
Turtle this looks impossible.

I realize that Turtle doesn't guarantee order and I have no problem with
that. I'm also aware that introducing this kind of sorting will always
have caveats.

But I still think it would be a tremendous help for some users if this
kind of sorting was possible. The way I propose to do so is by
introducing the possibility for the user to provide a Comparator<Triple>
and optionally pass it to org.apache.jena.riot.system.RIOTLib, that
would change the behavior of accTriples() accordingly. That would allow
the current behavior not to change at all, and the new behavior to be
used only by users who would implement a Comparator<Triple> and thus
know what they're doing and what the limitations of this exercise are.

I'm ready to write the code if the idea is considered a good one, but
would like some opinion first. So what do you think?

Thank you,
