Re: more predictable Turtle output

Andy Seaborne Wed, 02 Aug 2017 06:44:58 -0700

See also JENA-1262

1/

If you want predictable output, you may be better off starting from TurtleWriterBlocks, not the full pretty writer. Or the flat writer TurtleWriterFlat, which is N-triples+prefixes. Whether that's easier depends on how extensive the changes are (and how big the data is). The blocks writer passes out chunks of same-subject triples.

In practice, all triples with the same subject come in one chunk because of indexing. Also, if the data does not change, I think all writers are deterministic and produce the same output from run to run.

It's a balance of reducing the effort needed, prettiness, and stability. They are not independent choices!

2/

There is no need to change RIOTLib -- create your own writer and register it. ExRIOT_out3 has an example of adding a writer.

3/

The pretty writer is in ShellGraph - accTriples is only used at a few points and does not really drive the pretty writer output. In particular, the order is lost because the collection of triples is further worked on, including sorting by predicate. See writePredicateObjectList.

Subclassing ShellGraph and overriding methods like writePredicateObjectList would be my approach. Too much is private to really subclass - you'll need to copy the class at the moment. Along with registration, then at least you can have both in the same JVM and it helps testing.
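
As an illustration of the kind of ordering a copied-and-modified writePredicateObjectList could apply, here is a minimal sketch in plain Java, so it runs without Jena on the classpath. SimpleTriple and StablePredicateObjectList are hypothetical stand-ins invented for this example, not Jena classes, and the real method works on Jena nodes rather than strings. The only point is that grouping by predicate in a sorted map and sorting each object list makes the text independent of the order in which the triples arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a triple; NOT org.apache.jena.graph.Triple.
final class SimpleTriple {
    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical helper; shows the sorting step only, not a full Turtle writer.
class StablePredicateObjectList {

    /** Renders the triples of one subject with predicates and objects sorted. */
    static String write(String subject, List<SimpleTriple> triples) {
        // A TreeMap iterates predicates in sorted order whatever the input order.
        Map<String, List<String>> byPredicate = new TreeMap<>();
        for (SimpleTriple t : triples) {
            if (!t.subject.equals(subject)) {
                continue; // ignore triples that belong to another subject
            }
            byPredicate.computeIfAbsent(t.predicate, k -> new ArrayList<>()).add(t.object);
        }

        StringBuilder out = new StringBuilder(subject);
        String separator = " ";
        for (Map.Entry<String, List<String>> entry : byPredicate.entrySet()) {
            List<String> objects = entry.getValue();
            Collections.sort(objects); // stable order inside one object list
            out.append(separator)
               .append(entry.getKey())
               .append(' ')
               .append(String.join(" , ", objects));
            separator = " ;\n    ";
        }
        return out.append(" .").toString();
    }
}
```

Calling StablePredicateObjectList.write("bdr:G844", triples) with the same triples in any order gives the same string. Plain string order is an assumption made here for brevity; a real writer would probably want a comparator over nodes, and might keep rdf:type first as the pretty writer does.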

4/

I have seen a writer (not open source) that applies a form of the Floyd-Warshall algorithm to sort subjects to get some stability - connected nodes tend to come out together, so localising git diffs. Quite space hungry, quite complicated.
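
That writer is not public, so the following is only a guess at the general idea, not a reconstruction of it. It is a plain-Java sketch in which ConnectedSubjectOrder, its method and the greedy walk are all assumptions made for illustration: Floyd-Warshall gives all-pairs hop counts between subjects, and subjects are then emitted by repeatedly taking the nearest one not yet written, with ties broken by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: orders subjects so that connected ones come out together.
class ConnectedSubjectOrder {
    private static final int INF = Integer.MAX_VALUE / 2; // "unreachable", safe to add twice

    /** edges are {from, to} pairs; pairs naming an unknown subject are ignored. */
    static List<String> order(Collection<String> subjects, List<String[]> edges) {
        // Sorted names give every subject a stable index, whatever the input order.
        List<String> nodes = new ArrayList<>(new TreeSet<>(subjects));
        int n = nodes.size();
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            index.put(nodes.get(i), i);
        }

        // n x n distance matrix: this is the space-hungry part.
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            Arrays.fill(dist[i], INF);
            dist[i][i] = 0;
        }
        for (String[] e : edges) {
            Integer a = index.get(e[0]);
            Integer b = index.get(e[1]);
            if (a != null && b != null && !a.equals(b)) {
                dist[a][b] = 1; // links are treated as undirected
                dist[b][a] = 1;
            }
        }

        // Floyd-Warshall: allow each node k in turn as an intermediate hop.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                    }
                }
            }
        }

        // Greedy walk: next is the nearest unwritten subject, lowest name on ties.
        List<String> out = new ArrayList<>();
        boolean[] done = new boolean[n];
        int current = -1;
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int j = 0; j < n; j++) {
                if (done[j]) {
                    continue;
                }
                if (best == -1 || (current != -1 && dist[current][j] < dist[current][best])) {
                    best = j;
                }
            }
            done[best] = true;
            out.add(nodes.get(best));
            current = best;
        }
        return out;
    }
}
```

The matrix is quadratic in the number of subjects and the triple loop is cubic, which fits the "space hungry" remark above; for large data this sketch would not be practical as written.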

5/

The JSON-LD output comes from an external library.

    Andy

On 02/08/17 12:43, Élie Roux wrote:

Hello,

I'm currently trying to solve a problem I have in Turtle: I would like
my output to stay stable, so that it can live in a git repository without
generating too much diff noise every time the data is regenerated. One
example would be something like:

bdr:G844  a              :Place ;
         :placeContains   bdr:G1183 , bdr:G229 , bdr:G2CN10883 , bdr:G3478 ,
bdr:G3JT12502 , bdr:G4885 .

for which I have no guarantee that the list will stay in the same order
if the same model is serialized again. I could turn it into a list:

bdr:G844  a              :Place ;
         :placeContains   ( bdr:G1183 bdr:G229 bdr:G2CN10883 bdr:G3478
bdr:G3JT12502 bdr:G4885 ) .

but that changes my data model, and I don't really need that, as I care
about the order only in serialized documents, not in the dataset itself.

I can hack the output of JSON-LD to do this kind of thing, but with
Turtle this looks impossible.

I realize that Turtle doesn't guarantee order and I have no problem with
that. I'm also aware that introducing this kind of sorting will always
have caveats.

But I still think it would be a tremendous help for some users if this
kind of sorting was possible. The way I propose to do so is by
introducing the possibility for the user to provide a Comparator<Triple>
and optionally pass it to org.apache.jena.riot.system.RIOTLib, that
would change the behavior of accTriples() accordingly. That would allow
the current behavior not to change at all, and the new behavior to be
used only by users who would implement a Comparator<Triple> and thus
know what they're doing and what the limitations of this exercise are.

I'm ready to write the code if the idea is considered a good one, but
would like some opinion first. So what do you think?

Thank you,
