On 02/08/17 13:31, Élie Roux wrote:
On 02/08/2017 at 14:13, Jean-Marc Vanel wrote:
Élie,
I would use N-Triples format, sorted in alphanumerical order.
Thank you very much for your answer! I thought about this approach but I
see two problems:
- N-Triples is hardly readable, and I would prefer to store my data as
Turtle for readability
- more importantly, this will still output a lot of diff noise because
blank node IDs will change randomly (and will not keep the same order)
Only if you reload the file ... in which case it is a different blank node.
The NT writer uses the internal label for the blank node, so if the blank
node label is changing, that suggests the file is being reloaded.
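
To illustrate the point, here is a minimal sketch in plain Python (it does not use Jena; the two dumps and the example.org IRIs are made up). It sorts N-Triples lines alphabetically and diffs two dumps of the same graph: with stable labels the diff is empty, while a relabelled blank node touches every line that mentions it.

```python
import difflib

def sorted_ntriples(text):
    """Return the non-empty lines of an N-Triples document, sorted."""
    return sorted(line.strip() for line in text.splitlines() if line.strip())

# Hypothetical dumps of the same graph; only the blank node label differs,
# as it would if the file had been reloaded between the two dumps.
dump_a = """\
_:b0 <http://example.org/name> "Alice" .
<http://example.org/doc> <http://example.org/author> _:b0 .
"""
dump_b = """\
_:b7 <http://example.org/name> "Alice" .
<http://example.org/doc> <http://example.org/author> _:b7 .
"""

def diff_lines(old, new):
    """Added/removed lines between two sorted dumps (diff headers dropped)."""
    diff = difflib.unified_diff(
        sorted_ntriples(old), sorted_ntriples(new), lineterm="", n=0
    )
    return [
        d for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    ]

stable_diff = diff_lines(dump_a, dump_a)    # same labels: nothing to report
reloaded_diff = diff_lines(dump_a, dump_b)  # new labels: both lines change
```

So the sorted N-Triples approach should be quiet as long as the labels come from the same in-memory or stored graph.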
This is most serious for subjects because they will be wildly far apart
whereas (block writer) triples are locally grouped. Sorting by subject
would need to define the comparison based on something - maybe a primary
key value?
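
A rough sketch of what such a comparison could look like, again in plain Python rather than Jena. The predicate `http://example.org/id` is a hypothetical "primary key" property, and the line splitting is naive (it assumes one well-formed triple per line). Blank node subjects are ordered by their key value instead of their label, so the grouping stays put even when labels change. The labels themselves still appear in the output, so this only addresses the ordering half of the problem.

```python
from collections import defaultdict

# Hypothetical predicate whose value acts as the "primary key" of a node.
KEY_PREDICATE = "<http://example.org/id>"

def split_triple(line):
    """Naive split of one N-Triples line into (subject, predicate, rest)."""
    subject, predicate, rest = line.strip().split(None, 2)
    return subject, predicate, rest

def sort_by_subject(lines):
    """Group triples by subject; order blank nodes by key value, not label."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            groups[split_triple(line)[0]].append(line.strip())

    def subject_key(subject):
        if subject.startswith("_:"):
            for line in groups[subject]:
                _, predicate, rest = split_triple(line)
                if predicate == KEY_PREDICATE:
                    return (1, rest)   # blank node with a key: use the key
            return (2, subject)        # no key: fall back to the label
        return (0, subject)            # IRIs sort first, by their own text

    ordered = []
    for subject in sorted(groups, key=subject_key):
        # Inside a group, compare predicate and object only.
        ordered.extend(
            sorted(groups[subject], key=lambda l: split_triple(l)[1:])
        )
    return ordered
```

Calling `sort_by_subject` on two dumps whose blank nodes carry different labels but the same key values should yield the triples in the same order.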
Dumping a TDB database (which is N-Quads) shows the label is stable if
the source is stable.
Andy
Thank you,