On Sep 22, 2009, at 3:42 PM, Mark Birbeck wrote:
Hi Jonas,
It certainly matters. If, for example, method 1 or 2 were used, then
no prefix mappings would be found at all in the DOM output from an
HTML parser. So it really *does* matter how you do prefix mapping. And
as far as DOM 2 goes, I think 1 or 2 are the intuitive solutions, so if
we're not using those then I *really* think it's important to say
so.
In any case, I think I've spent enough time on this issue. I can't
really articulate the problem any more than I have. I hope this issue
is solved by the time last call rolls around.
I see that you are frustrated, but you seem to think that the issue is
that no-one understands your position.
We *do* understand your position, and are trying to explain to you
that, with all due respect, it is based on a misunderstanding.
You are looking at implementation specifics, and as many people have
explained, implementation is not the issue. This is because the spec
is defining an algorithm, which entitles people to implement things
however they see fit, on whatever platform they want to write for, using
whatever language they want to use.
What Jonas is saying is that the spec algorithms as stated don't tell
you how to choose between implementation strategies that at first
glance seem equally valid but in fact give different results. He gave
some specific examples: how to get prefix mappings in a DOM, how to
extract triples from an HTML document that would result in
reparenting, and whether prefix mappings should be assigned to
elements at parse time or extraction time if the DOM can be mutated
after parsing.
It seems like people reject his arguments for what superficially
appear to be mutually contradictory reasons: (a) that RDFa doesn't
really use Namespaces in XML, it just uses a syntax that looks the
same but could have been anything; (b) that RDFa normatively
references Namespaces in XML for implementation requirements; (c) that
RDFa is defined purely at the raw source text level (even though the
spec's processing rules speak of an abstract tree model); (d) that
RDFa can be applied directly to situations where original source text
is not available or may not even exist.
I'm pretty puzzled by the argument that RDFa is defined in terms of
raw source text. The start of Section 5 of XHTML+RDFa says:
"Processing need not follow the DOM traversal technique outlined here,
although the effect of following some other manner of processing must
be the same as if the processing outlined here were followed. The
processing model is explained using the idea of DOM traversal which
makes it easier to describe (particularly in relation to the
[evaluation context])."
And indeed Section 5 describes processing in terms of DOM concepts
such as "document object", "child element", "document order" and so
forth. Later Section 5.5 describes its algorithm as "the DOM traversal
technique defined here".
It seems to me that it would be much more fruitful to go with this
DOM-like formalism instead of pretending that things are actually
defined at the textual level. They are not: nowhere does RDFa describe
how to get from source characters to its tree model for processing.
That is all left up to other specs (with the understanding that
implementations may do things without a tree, as long as they give
equivalent results).
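As a rough illustration of that "as if" rule, here is a minimal Python sketch of a toy extractor. It is not RDFa: its only rule, assumed purely for illustration, is that an element with a "property" attribute yields a (subject, property, text) triple whose subject is inherited from the nearest "about" attribute, a much simplified stand-in for the evaluation context. All function and class names are hypothetical. One version walks a minidom tree; the other never builds a tree and keeps only a stack of open elements while consuming SAX events. They produce the same output, which is all the spec's wording asks for.

```python
import xml.sax
from xml.dom import minidom

SOURCE = (
    '<div about="#doc">'
    '<span property="title">Hello</span>'
    '<div about="#author"><span property="name">Alice</span></div>'
    '</div>'
)


def extract_dom(el, subject=None):
    """Tree version: recursive traversal in document order; the subject is
    passed down like a (much simplified) evaluation context."""
    subject = el.getAttribute("about") or subject
    triples = []
    prop = el.getAttribute("property")
    if prop:
        # Toy simplification: the literal is the element's direct text children.
        text = "".join(n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE)
        triples.append((subject, prop, text.strip()))
    for child in el.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            triples.extend(extract_dom(child, subject))
    return triples


class StreamingExtractor(xml.sax.ContentHandler):
    """Treeless version: only a stack of currently open elements is kept."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (subject, text buffer or None) frame per open element
        self.found = []  # (subject, property, text buffer) in document order

    def startElement(self, name, attrs):
        inherited = self.stack[-1][0] if self.stack else None
        subject = attrs.get("about") or inherited
        buf = None
        prop = attrs.get("property")
        if prop:
            buf = []
            self.found.append((subject, prop, buf))
        self.stack.append((subject, buf))

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # Text goes to the innermost open element, matching "direct text children".
        if self.stack and self.stack[-1][1] is not None:
            self.stack[-1][1].append(content)


def extract_stream(source):
    handler = StreamingExtractor()
    xml.sax.parseString(source.encode("utf-8"), handler)
    return [(s, p, "".join(buf).strip()) for s, p, buf in handler.found]


dom_result = extract_dom(minidom.parseString(SOURCE).documentElement)
stream_result = extract_stream(SOURCE)
print(dom_result == stream_result)
```

The streaming version is only "correct" because its results match the tree version; the tree model remains the definition.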
Buying into the DOM-based model that XHTML+RDFa already uses for its
processing rules would immediately answer many of Jonas's questions:
- HTML5+RDFa should be processed by taking the DOM that results from
the HTML5 parsing algorithm. As with XHTML+RDFa, you don't have to
literally create a DOM, but your output must be equivalent to the
processing defined in DOM terms.
- DOM mutations that happen before RDFa extraction *do* potentially
affect the extracted triples.
- HTML source documents that are parsed in a way that reparents nodes
yield triples based on the reparented tree, not on the nesting in the
source text.
- There is no need to first serialize a DOM in order to process it
according to RDFa.
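To show why mutations and reparenting matter under a DOM-based model, here is a minimal Python sketch using xml.dom.minidom. It reuses a toy rule (not real RDFa, and assumed only for illustration): a "property" attribute yields (subject, property, text), with the subject inherited from the nearest "about". Moving a node changes the result, and from the extractor's point of view it makes no difference whether the move was done by a script or by the parser itself (for example, the HTML5 parser's foster parenting of content misnested in tables). The function name is hypothetical.

```python
from xml.dom import minidom


def extract(el, subject=None):
    """Toy rule (not real RDFa): 'property' yields (subject, property, text),
    with the subject inherited from the nearest ancestor-or-self 'about'."""
    subject = el.getAttribute("about") or subject
    out = []
    prop = el.getAttribute("property")
    if prop:
        text = "".join(n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE)
        out.append((subject, prop, text.strip()))
    for child in el.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            out.extend(extract(child, subject))
    return out


doc = minidom.parseString(
    '<body><div about="#a"><span property="name">X</span></div>'
    '<div about="#b"></div></body>'
)
before = extract(doc.documentElement)

# Simulate a mutation (or a reparenting parser): move the span into the second div.
span = doc.getElementsByTagName("span")[0]
second_div = doc.getElementsByTagName("div")[1]
span.parentNode.removeChild(span)
second_div.appendChild(span)
after = extract(doc.documentElement)

print(before)
print(after)
```

Extraction simply runs over whatever tree exists at the time it runs, so no serialization step is needed either.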
The only detail that would have to be filled in, if we accept the DOM-
based model that the spec already uses, is how to find the prefix
mappings. Either an XHTML+RDFa erratum or HTML5+RDFa could specify
that any attribute whose qualified name (the attribute node's name) starts with
"xmlns:" creates a prefix mapping.
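That rule could be sketched roughly like this in Python, with xml.dom.minidom as a stand-in DOM. The function names are hypothetical, and the hand-built tree only approximates what an HTML5 parser produces for an HTML element: an attribute literally named "xmlns:dc" that is not in any namespace. A namespace-based check finds no mappings in that tree. The qualified-name rule finds the same mappings in both the HTML-like tree and a namespace-aware XML parse. Mappings are inherited by descendants, as the evaluation context would carry them.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"
DC = "http://purl.org/dc/terms/"


def mappings_by_name(el, inherited=None):
    """Proposed rule: any attribute whose qualified name starts with
    "xmlns:" adds a prefix mapping, regardless of its namespace."""
    scope = dict(inherited or {})
    attrs = el.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name.startswith("xmlns:"):
            scope[attr.name[len("xmlns:"):]] = attr.value
    return scope


def mappings_by_namespace(el, inherited=None):
    """Namespace-aware alternative: only attributes in the XMLNS namespace count."""
    scope = dict(inherited or {})
    attrs = el.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NS:
            scope[attr.localName] = attr.value
    return scope


def walk(el, rule, inherited=None, out=None):
    """Depth-first walk in document order, recording in-scope mappings per element."""
    out = [] if out is None else out
    scope = rule(el, inherited)
    out.append((el.tagName, scope))
    for child in el.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            walk(child, rule, scope, out)
    return out


# Approximation of HTML parser output: name "xmlns:dc", no namespace.
html_like = minidom.Document()
html = html_like.createElement("html")
html.setAttribute("xmlns:dc", DC)
html.appendChild(html_like.createElement("body"))
html_like.appendChild(html)

# Namespace-aware XML parse of equivalent markup.
xml_doc = minidom.parseString('<html xmlns:dc="%s"><body/></html>' % DC)

print(walk(html_like.documentElement, mappings_by_namespace))
print(walk(html_like.documentElement, mappings_by_name))
print(walk(xml_doc.documentElement, mappings_by_name))
```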
Buying into the DOM approach would also address Henri's objection
about bad spec layering.
Regards,
Maciej