Ivan Herman wrote:
> Sigh. This is indeed a slightly muddy area where the RDF concept
> document should be written differently. But, well, this is not something
> either of these two working groups can do...
> I think the issue is that the RDF concept spec describes the abstract
> concepts for abstract RDF graphs, and not a serialization thereof. [...]
As I understand it, rdf-concepts explicitly describes the lexical space
of XMLLiterals, i.e. the set of Unicode strings to which the lexical
form of any value of type XMLLiteral must belong.
I'm happy to agree that serialisations like RDF/XML and RDFa specify
their own transformations/mappings from the input document onto that
abstract RDF lexical space, and there's no need for the input document
to care about C14N at all - the input can be anything, and the mapping
can be arbitrarily complicated, as long as the resultant triples contain
values from the appropriate lexical space.
But serialisations of RDF like N3/Turtle/N-Triples represent XMLLiterals
as typed strings. I'm making the (hopefully reasonable) assumption that
those strings correspond directly (after appropriate charset decoding)
to the lexical space defined by rdf-concepts - there is no non-trivial
mapping there. (In particular, no automatic canonicalisation occurs.)
(If that assumption is wrong, and there is a non-trivial mapping between
N3/Turtle/N-Triples serialised strings and the XMLLiteral lexical space,
then I can't find any definition of that mapping at all, which is a
bigger problem (unless I'm just missing it).)
The RDFa spec examples and test cases represent triples using
Turtle/N-Triples as the serialisation format. Their strings therefore
map directly onto the restricted lexical space, and I believe those
particular cases need to use canonicalised form for their serialisations
of XMLLiteral strings.
The RDFa spec also refers to abstract triples (as the result of
processing a document), at which point there is no serialisation
involved at all, and so a value of type XMLLiteral must be a member of
the lexical space of XMLLiteral, i.e. must be a canonical-form string.
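To make "must be a canonical-form string" concrete, here is a rough
sketch in Python of a membership check. It rests on assumptions: it uses
the standard library's xml.etree.ElementTree.canonicalize, which
implements C14N 2.0 rather than the exclusive canonicalisation that
rdf-concepts refers to, so it is only a stand-in; and the helper names
and the <wrapper> element are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical dummy root, needed because an XML literal may be a
# fragment while canonicalize() expects a whole document.
_OPEN, _CLOSE = "<wrapper>", "</wrapper>"


def c14n_fragment(fragment: str) -> str:
    """Canonicalise an XML fragment (C14N 2.0, used here as a stand-in)."""
    canonical = canonicalize(_OPEN + fragment + _CLOSE)
    # Strip the dummy root again to get back to the fragment.
    return canonical[len(_OPEN):-len(_CLOSE)]


def in_lexical_space(fragment: str) -> bool:
    """True if the string is already in canonical form, i.e. unchanged by C14N."""
    return c14n_fragment(fragment) == fragment


print(in_lexical_space('<b a="1"></b>'))  # already canonical
print(in_lexical_space("<b a='1'/>"))     # valid XML, but not canonical
```

Under those assumptions the first string passes and the second does not,
even though both are perfectly good XML and denote the same content.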
So I think I agree with everything you are saying (that RDF/XML and RDFa
don't require c14n of their input) and I think that's all good, but I
don't think that's addressing the problems I see (which are with the
abstract triple output of RDFa, and with specific examples of
Turtle/N-Triples serialised triples).
> (On a practical level, all RDF environments and serializations I know
> about behave similarly: they would take any (valid) XML as XML Literal,
> and the C14N comes into the picture when two XML literals are checked,
> eg, for equality.)
(If equality is always checked in terms of C14N-equivalence, why does
http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql
say that the output must equal either one of two strings that are
C14N-equivalent? If it's equal to one, it would also be equal to the
other. So I presume at least some implementations just do simple string
equality instead of dealing with C14N when checking equality. The C14N
should instead be dealt with at an earlier point (when generating the
triples), to avoid making equality comparisons hopelessly inefficient.)
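A small Python sketch of what I mean by dealing with C14N when
generating the triples, so that later comparisons can be plain string
equality. Again this is under assumptions: the standard library's
canonicalize (C14N 2.0) stands in for the canonicalisation that
rdf-concepts actually specifies, and make_xml_literal and the <wrapper>
element are hypothetical names, not anything from the RDFa spec.

```python
from xml.etree.ElementTree import canonicalize

_OPEN, _CLOSE = "<wrapper>", "</wrapper>"  # hypothetical dummy root


def make_xml_literal(raw: str) -> str:
    """Hypothetical triple-generation step: canonicalise once, up front."""
    canonical = canonicalize(_OPEN + raw + _CLOSE)
    return canonical[len(_OPEN):-len(_CLOSE)]


# Two C14N-equivalent spellings of the same markup.
raw_a = "<span class='x' id='y'/>"
raw_b = '<span id="y" class="x"></span>'

# Simple string equality on the raw input says they differ.
print(raw_a == raw_b)

# After canonicalising at generation time, the stored strings are
# identical, so equality needs no XML processing at all.
stored_a = make_xml_literal(raw_a)
stored_b = make_xml_literal(raw_b)
print(stored_a == stored_b)
```

With the canonicalisation done once per literal, each later equality
check (or hash lookup) is an ordinary string operation.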
> Ivan
--
Philip Taylor
pj...@cam.ac.uk