On 25/09/2023 15:35, Arne Bernhardt wrote:
Hello,
in order to use the GraphMem2 graphs in Jena 4.9, we are planning to switch
to "literal term equality" in our projects.
Currently we are discussing the following two approaches:
1. simple RDF standard compatibility.
We treat object literal nodes like any other node. The term representation
is always preserved, and users of our API only need to know the RDF
standards.
Anyone inserting "true"^^boolean needs to know that this is not the same
(term) as "1"^^boolean.
2. uniform value representations
All incoming data is canonicalised / normalised.
Users of our API just need to know that if they enter "1"^^boolean, they
will get back "true"^^boolean.
Users don't often realise "1"^^xsd:boolean is legal. Ditto for canonical
integers.
From what I have seen, it is unusual for users to write these
non-canonical forms, even for integers.
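For the general reader, the difference between the two approaches can be sketched in plain Java. This is only an illustration, not Jena code: the class name LiteralCanonSketch and its methods are made up for this example, and it covers only xsd:boolean and xsd:integer. Under approach 1 two literals are the same only if lexical form and datatype match exactly; under approach 2 the lexical form is rewritten to its canonical form before it is stored.

```java
import java.math.BigInteger;

public class LiteralCanonSketch {
    public static final String XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean";
    public static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    /** Approach 1: term equality. Same lexical form and same datatype IRI. */
    public static boolean sameTerm(String lex1, String dt1, String lex2, String dt2) {
        return lex1.equals(lex2) && dt1.equals(dt2);
    }

    /**
     * Approach 2: rewrite the lexical form to the XSD canonical form.
     * Unknown datatypes and ill-formed lexical forms are returned unchanged.
     */
    public static String canonicalLexical(String lexical, String datatype) {
        // xsd:boolean and xsd:integer both collapse surrounding whitespace.
        String lex = lexical.trim();
        if (XSD_BOOLEAN.equals(datatype)) {
            if (lex.equals("1") || lex.equals("true")) return "true";
            if (lex.equals("0") || lex.equals("false")) return "false";
            return lexical; // not a legal boolean, leave as given
        }
        if (XSD_INTEGER.equals(datatype)) {
            try {
                // BigInteger drops a leading '+' and leading zeros.
                return new BigInteger(lex).toString();
            } catch (NumberFormatException e) {
                return lexical; // not a legal integer, leave as given
            }
        }
        return lexical;
    }

    public static void main(String[] args) {
        // Approach 1: "1" and "true" are different terms.
        System.out.println(sameTerm("1", XSD_BOOLEAN, "true", XSD_BOOLEAN)); // false

        // Approach 2: both end up as the same term after canonicalisation.
        String a = canonicalLexical("1", XSD_BOOLEAN);
        String b = canonicalLexical("true", XSD_BOOLEAN);
        System.out.println(sameTerm(a, XSD_BOOLEAN, b, XSD_BOOLEAN)); // true

        System.out.println(canonicalLexical("+01", XSD_INTEGER)); // 1
    }
}
```

With approach 2 the user who inserts "1"^^xsd:boolean reads back "true"^^xsd:boolean, which is the behaviour described in option 2 above.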
Should or can we use some of the classes in the jena project for this
purpose?
(like org.apache.jena.riot.process.normalize.CanonicalizeLiteral,
*.NormalizeValue and/or *.NormalizeValue2)
That would work. There are StreamRDF ways to apply the transformation.
The parser framework has RDFParserBuilder.canonicalLiterals(true).
Do you have any opinion on the two approaches?
Just information for the general reader:
TDB, for other reasons, canonicalises XSD number, date/time and boolean
literals.
Andy
Regards
Arne