Philip Taylor wrote:
Seeing as people are implementing RDFa parsers for text/html, I guess
it would be good to have a specification that says how they should work.
http://www3.aptest.com/standards/rdfa-html/ doesn't answer the
questions I'd want answered (e.g. in
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0102.html),
and HTML 4 seems to make it impossible to express an answer. Some
existing RDFa-in-text/html parsers are based on document models that
closely match the DOM-like model used by HTML 5 (e.g. browser-based JS
implementations, and some Python ones using an html5lib DOM, and maybe
others), and the model used by HTML 5 can be implemented in a variety
of other ways (e.g. unbuffered SAX) so it's not too restrictive, and
so it seems like the most useful way to define RDFa-in-text/html
processing.
I've not seen anyone else working on this, so I started writing a
rough draft at <http://philip.html5.org/docs/rdfa/>. Some of it is
copied from the RDFa-in-XHTML specification, and just tweaked to use
some new definitions and to share concepts (like base and lang) with
HTML 5 and to cope with text/html parsing (for xmlns:* attributes).
The CURIE definitions are new, since I didn't see any existing
document that defined them in an appropriate way.
There are several unresolved design issues (e.g. handling of
case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer
problems, etc) - I haven't intended to make any decisions on such
issues, I've just attempted to define the behaviour with sufficient
detail that it should make those issues visible.
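To make the case-sensitivity issue concrete, here is a minimal sketch. It assumes Python's standard html.parser as a stand-in for a text/html parser (it is not an HTML5 parser, but like one it lowercases attribute names). The helper names collect_prefixes and expand_curie and the fold_case option are hypothetical and are not taken from the draft.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collects xmlns:* prefix mappings as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive already lowercased by the parser.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value


def collect_prefixes(markup):
    parser = PrefixCollector()
    parser.feed(markup)
    parser.close()
    return parser.prefixes


def expand_curie(curie, prefixes, fold_case=False):
    """Expand prefix:reference to a full URI, or return None if unmapped.

    fold_case is a hypothetical switch for one possible design choice:
    lowercase the CURIE prefix before looking it up.
    """
    prefix, _, reference = curie.partition(":")
    if fold_case:
        prefix = prefix.lower()
    base = prefixes.get(prefix)
    return None if base is None else base + reference


markup = '<div xmlns:FOAF="http://xmlns.com/foaf/0.1/" property="FOAF:name">Philip</div>'
prefixes = collect_prefixes(markup)

print(prefixes)                                            # prefix is stored as 'foaf'
print(expand_curie("FOAF:name", prefixes))                 # case-sensitive lookup fails
print(expand_curie("FOAF:name", prefixes, fold_case=True)) # folded lookup succeeds
```

The same markup parsed as XML would keep the prefix as FOAF, so a specification has to say which of these behaviours a text/html processor should follow.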
The current draft is far from complete or correct, but it shows
roughly the way I'd like to have things defined (and I hope it's
roughly the way that HTML5/WHATWG people would like it to be defined,
in order to support implementers and to be testable), and maybe it
could end up being useful for something, so I'm just throwing it out
here for discussion.
Philip and I started an email exchange because of some postings on
Twitter. I wanted to replicate the discussion here, with Philip's
permission. Some is unimportant, but I wanted to preserve context. Note
that these are from my perspective, so quoted material is from Philip,
and unquoted material is mine.
First email from Philip and my reply:
Philip Taylor wrote:
I saw some discussion on Twitter, so just to clarify what the
situation is (as far as I'm aware of it):
I wrote the draft without having talked about it to anybody at all,
because I thought (and still think) it might lead to something useful,
and it seemed easier to just write something concrete rather than
discuss it first. I posted about it to public-html and
public-rdf-in-xhtml-tf, since that seems the easiest way to contact
people who might be interested. A few people from the RDF side replied
privately, including Manu (expressing a desire to discuss things
further). Sam replied in public. That's about all there is.
Re "My input was not sought"/"This wasn't a party I was invited to" -
I haven't sought input from anybody (except the public-* lists). If
this triggered some internal conversation in the RDFa world that you
were excluded from, I know nothing about it. If I continue working on
this, I'd be happy to hear technical comments about the content from
anywhere.
Re "a better chance of getting RDFa into HTML5" - that's not my aim at
all; I'm not currently convinced that RDFa is a good solution that
ought to be part of the language. But that's largely irrelevant - if
people are going to use it anyway (which it looks like they are, at
least to some extent) then I'd prefer it to be specified based on
HTML5 rather than on XHTML1.1/HTML4, so that it's easier to implement
correctly and so that it doesn't conflict with HTML5's requirements,
and I'm not aware that anyone else is planning to specify it that way
(but I'd be happy if someone else did so).
I don't care much about the politics of where the text ends up - it
just seems easier to do it as a separate document, effectively
defining a new "HTML5+RDFa" language rather than modifying the
original HTML5 language definition, which achieves the goal of making
sure the precise behaviour of RDFa-in-text/html is actually specified
somewhere (regardless of whether it's a part of HTML5 or not).
Sam specifically mentioned me working with you. I checked with the RDFa
folks, and they'd already initiated discussions with you.
Sam asked about Manu, Ben et al, and my answer was for him to ask. My
further response was that discussions are, or will be, underway, but I
am not part of the effort, and I'm the wrong person to ask.
I agree with you in a way that this shouldn't be 'part' of HTML5.
Neither should any of the predefined vocabularies, or microdata, either.
The only reason they are, is because HTML5 is not extensible.
The confused concept of "validation" associated with HTML5, though,
makes it important to at least reference RDFa in such a way that a)
attributes are not redefined and b) people know how to use RDFa in a
"conforming" manner with HTML5 -- based on the condition that people
can't use one version of annotation for RDFa for XHTML 1.1, and another
for HTML5. The whole @prefix thing was foolish. Sorry, but that's my
opinion.
So a document as an addendum, or complementary proposal issued by some
organization that describes how RDFa works with HTML5 (without impacting
on how it works with HTML4, or XHTML), is good. It allows people to use
RDFa with HTML5, without adverse impact on the underlying RDF model, and
without requiring changes in behavior or syntax from what currently
works with XHTML (including XHTML5). And it sounds like you're going to
be working with the RDFa folks moving forward on this. That's what I
meant by "RDFa into HTML5". And I hope you all succeed.
I don't have a part in this, and that's cool. I'll continue to do my own
thing, which is primarily writing in my own space.
You know, the biggest problem with all of this is that you have
processing people and you have data people, but you don't necessarily
have a lot of people who understand both worlds.
Anyway, good luck with your efforts.
---
A second email I sent based on Philip's original email:
PS I will say one thing, and I'm parroting Henri in this regard: to me, a
conforming implementation of RDFa in HTML5 is not necessarily one that
only meets what's required for HTML5 -- it has to meet a conformance
requirement for RDF, too. How would we know if the document is
conforming? Because the same annotation in a document served up as
XHTML5, should generate the exact same RDF graph, as would be generated
if the document is served up as HTML5. To ensure this, how the
annotation is interpreted from a data perspective must be defined in a
single document, such as RDFa-in-XHTML.
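The "same annotation, same graph" test described above can be sketched as a set comparison. This is only an illustration: the triples below are invented, no real extractor produced them, and the comparison assumes graphs without blank nodes (with blank nodes a graph-isomorphism check would be needed instead).

```python
# Each triple is (subject, predicate, object). All data here is hypothetical.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

triples_from_xhtml = [
    ("http://example.org/#me", FOAF_NAME, "Philip"),
]
triples_from_html = [
    ("http://example.org/#me", FOAF_NAME, "Philip"),
]
# What a parser following a different rule set might emit, e.g. one that
# left the prefixed name unexpanded.
triples_from_divergent_parser = [
    ("http://example.org/#me", "foaf:name", "Philip"),
]


def same_graph(a, b):
    """Compare two blank-node-free graphs as unordered sets of triples."""
    return set(a) == set(b)


print(same_graph(triples_from_xhtml, triples_from_html))
print(same_graph(triples_from_xhtml, triples_from_divergent_parser))
```

The first comparison holds and the second does not, which is the kind of divergence two separate rule sets could introduce.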
If you have two separate documents providing rules about how triples are
to be formed based on the same annotation, you have a failed system. You
would be better off just ignoring RDFa and let folks generate
"non-conforming HTML5" documents, with foreign annotation. At least
then, RDFa extractors would have only one set of rules to apply when it
comes to building the underlying RDF graph.
The reason why Shane's document is "sparse" on parsing (processing)
information (according to the WhatWG IRC entries) is that Shane was
deferring the RDFa processor conformance to the RDFa-XHTML syntax and
processing document. This was right and proper. He was using good
technique.
If you cross over the boundaries that define the markup specification
from other specifications, you leave the potential for conflicting
conformance requirements. An example is the color section in the HTML5
document. What if how colors are defined is changed in CSS? Well, then,
you'd have two sets of differing conformance requirements. I still
can't figure out why there's a section on processing color values in
HTML, when there shouldn't even be color values within the HTML markup,
directly. Legacy, I suppose.
Philip, you specify the attributes, which is good, because that ensures
they're reserved, and Ian doesn't do something like @property again.
Working through issues of existing shared attributes is also a goodness.
Then you copy the RDFaSyntax document bits, and redefine them into HTML5
speak, which opens the door for conflicting conformance requirements,
and worse, differing underlying RDF graphs. I can understand noting
where specific terms in the RDFaSyntax document map to other terms in
the HTML5 document, but providing a separate processing model...
I have to assume this was to generate a dialog, not based on actually
delivering the document in this way -- with a "separate" processing
model section.
Those are my initial notes. I'd put it into the email lists, but frankly,
I'm tired of everything I write or say being joked over on the WhatWG IRC.
---
Some of the correspondence was irrelevant to this group. I'm only
duplicating it to be consistently public. Philip's follow up reply and
mine are much more relevant to a larger discussion, in my opinion at least:
First, clarification: when I respond, I'm responding only for myself,
not the RDF/RDFa folks.
The problem in that document is it doesn't define how to map from the
syntax onto the RDFa-in-XHTML processing model, which leaves a gap
where the behaviour is undefined. E.g. I can write <div xmlns:="...">
in HTML, and I don't know whether that attribute should be ignored or
should redefine the default prefix mapping, because it's impossible in
XHTML and so the RDFa-in-XHTML specification doesn't explain how to
handle it.
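The gap can be shown with a small sketch. It assumes Python's standard html.parser as a rough stand-in for a text/html parser, and the empty_prefix parameter with its two policies ("ignore" and "default") is hypothetical, standing for the two readings mentioned above. Neither policy comes from any specification.

```python
from html.parser import HTMLParser


class XmlnsAttributes(HTMLParser):
    """Records every attribute whose name starts with 'xmlns:'."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.extend((n, v) for n, v in attrs if n.startswith("xmlns:"))


def prefix_mappings(markup, empty_prefix="ignore"):
    """Build a prefix map, applying a hypothetical policy to 'xmlns:'.

    empty_prefix="ignore"  drops an attribute named exactly 'xmlns:'
    empty_prefix="default" treats it as redefining the default prefix
    """
    parser = XmlnsAttributes()
    parser.feed(markup)
    parser.close()

    mappings = {}
    for name, value in parser.seen:
        prefix = name[len("xmlns:"):]
        if prefix == "" and empty_prefix == "ignore":
            continue
        mappings[prefix] = value
    return mappings


markup = '<div xmlns:="http://example.org/vocab#" property=":title">Example</div>'

ignored = prefix_mappings(markup, empty_prefix="ignore")
redefined = prefix_mappings(markup, empty_prefix="default")
print(ignored)    # no mapping recorded
print(redefined)  # empty prefix mapped to the given URI
```

The HTML parser hands over the attribute without complaint, so the two policies yield different prefix maps and therefore different triples for ":title".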
But you don't have to re-specify a section to explain gaps. Or you don't
have to re-state those sections with which you're in agreement.
The RDFa document, itself, falls back on certain processing rules --
defined both in XHTML, and indirectly, in XML. I don't think there's any
conflict by specifying in the RDFa in HTML5 document that where such
rules exist implicitly in the RDFa in XHTML document, they're explicitly
given in the HTML5 document.
One idea for fixing the gap is to produce a more detailed mapping from
text/html onto the RDFa-in-XHTML processing model. But that seems like
an unpleasantly difficult solution, since RDFa-in-XHTML wasn't really
designed to be used like that and there are lots of small mismatches and
edge cases that make it tricky.
But if you create a _new_ processing model, there will eventually be two
sets of rules to follow, which introduces corruption in the underlying
data models (RDF graphs).
You keep talking about processing the data _within_ the document using
JS, and I'm trying to make a point that the majority of RDF ends up
merged with other RDF from other documents in much larger pools of data.
Personally I don't give a damn about processing RDF in my pages with JS.
And I don't think I'm necessarily an exception. I can tell that most of
the work being done with Drupal 7 is based on the data being consumed
outside the pages, rather than within.
So from a mindset perspective, we have to get away from this JS/Ajax,
in-page view of the data and look at it from a broader perspective. It
would be better not to have any data, than to have "bad" data.
I'm assuming you've worked with databases created by other entities
where you've not had control over the creation of the data model
underlying the database, or the validation of the data going into the
database. If you've participated in any kind of a data clean up
operation, you must know that no data at all is actually easier to
manage, than not being able to tell what is good data, from "bad". Once
that's happened, good and bad mixed, with no clear clue as to which is
which, the database is completely corrupted, and has to be discarded.
Since HTML 5 already defines how to handle text/html and
application/xhtml+xml in a common processing model, ...
Has it, though? I've looked through the document, and if you are talking
about processing, how do we handle xmlns in HTML5 land? How do we deal
with <svg:svg in HTML5 land?
I really don't think the current HTML5 document really has dealt with a
"common processing model" for both HTML5 and XHTML5. That's just my
opinion, though.
I think redefining the RDFa processing model on top of the HTML 5
processing model is possibly the best way to get well-defined,
consistent behaviour between HTML and XHTML. So it would entirely
replace the current RDFa-in-XHTML spec, ensuring there's only a single
document telling people how to parse RDFa in both HTML and XHTML.
Maybe it should be thought of as a new edition of the existing spec,
rather than a totally new spec.
Again, I cannot agree. The microdata model generates RDF triples that
don't map to what the supposed equivalent RDFa annotation would provide.
Even with the new additions of rdf:type and about. I don't feel sanguine
that things would improve if the HTML5 document actually replaces the
RDFa-in-XHTML spec -- in fact I think you better have a heart to heart
with Manu et al about that one, right away.
I admire the confidence of the WhatWG group, but I don't think that the
way into the future of the web is to have every specification washed
through the HTML5 group, just because that's the only way to _ensure_
that it's "processed properly". Sometimes I come away from reading the
WhatWG IRC absolutely astonished that the web we have today actually
exists, because all of it is so darn crappy.
Regardless of what Manu, Ben, et al say, I feel confident in saying that
the RDFa-in-XHTML spec is not going to be replaced by the HTML5 working
group. I believe that compromise and cooperation, rather than
replacement, are a better way forward.
I guess there are lots of political/process issues with doing that,
but it'd be nice to have a technically sound solution before getting
blocked by those issues.
Well, I think you have more than political issues going now. Google just
took RDFa and exploded it all over the place. This in addition to the
other uses of RDFa that will be introduced in Drupal 7, and elsewhere.
Uses that will probably incorporate more sophisticated uses of RDFa than
Google's use. RDFa, as documented in the RDFaSyntax document will
continue to exist, regardless of what happens with HTML5. I believe it
would be in everyone's best interest to assume this is so.
Either we all come to some kind of agreement (with supporting
documentation) to live and let live, or we just ignore each other, and
go on like we are now. Amicably, hopefully. One subsuming the other is
not going to happen.
But then, that's just my opinion. I'm not a member of the RDFa group,
and can't speak for their opinions.
---
Sorry for the length of posting, typos, asides and so on. Hopefully
there might be something of interest to folks in the exchange.
Shelley