Re: RDFa in HTML 5

Shelley Powers Fri, 22 May 2009 06:48:04 -0700

Philip Taylor wrote:

> Seeing as people are implementing RDFa parsers for text/html, I guess it would be good to have a specification that says how they should work.
>
> http://www3.aptest.com/standards/rdfa-html/ doesn't answer the questions I'd want answered (e.g. in http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0102.html), and HTML 4 seems to make it impossible to express an answer. Some existing RDFa-in-text/html parsers are based on document models that closely match the DOM-like model used by HTML 5 (e.g. browser-based JS implementations, and some Python ones using an html5lib DOM, and maybe others), and the model used by HTML 5 can be implemented in a variety of other ways (e.g. unbuffered SAX) so it's not too restrictive, and so it seems like the most useful way to define RDFa-in-text/html processing.
>
> I've not seen anyone else working on this, so I started writing a rough draft at <http://philip.html5.org/docs/rdfa/>. Some of it is copied from the RDFa-in-XHTML specification, and just tweaked to use some new definitions and to share concepts (like base and lang) with HTML 5 and to cope with text/html parsing (for xmlns:* attributes). The CURIE definitions are new, since I didn't see any existing document that defined them in an appropriate way.
>
> There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) - I haven't intended to make any decisions on such issues, I've just attempted to define the behaviour with sufficient detail that it should make those issues visible.
>
> The current draft is far from complete or correct, but it shows roughly the way I'd like to have things defined (and I hope it's roughly the way that HTML5/WHATWG people would like it to be defined, in order to support implementers and to be testable), and maybe it could end up being useful for something, so I'm just throwing it out here for discussion.
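The following is a rough illustration of what processing xmlns:* attributes and CURIEs in a streaming ("unbuffered SAX") fashion over text/html might involve. It is a minimal sketch using Python's standard html.parser, which reports tags as events without building a DOM, and it is not code from Philip's draft. The class name CurieResolver is hypothetical, only @property is expanded, and every element is assumed to have an explicit end tag (html.parser does not apply HTML5's rules for void elements or implied end tags).

```python
from html.parser import HTMLParser


class CurieResolver(HTMLParser):
    """Hypothetical streaming sketch: expands @property CURIEs using the
    xmlns:* prefix mappings in scope at each element.

    Assumes every element has an explicit end tag; real text/html parsing
    also needs HTML5's rules for void elements and implied end tags.
    """

    def __init__(self):
        super().__init__()
        self.scopes = [{}]      # one prefix map per open element
        self.properties = []    # expanded property IRIs, in document order

    def handle_starttag(self, tag, attrs):
        prefixes = dict(self.scopes[-1])  # inherit the parent's mappings
        for name, value in attrs:
            if name.startswith("xmlns:") and len(name) > len("xmlns:"):
                prefixes[name[len("xmlns:"):]] = value
        self.scopes.append(prefixes)
        for name, value in attrs:
            if name == "property" and value:
                prefix, sep, reference = value.partition(":")
                if sep and prefix in prefixes:
                    self.properties.append(prefixes[prefix] + reference)

    def handle_endtag(self, tag):
        if len(self.scopes) > 1:
            self.scopes.pop()  # mappings go out of scope with their element


doc = ('<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
       '<span property="dc:title">RDFa in HTML 5</span>'
       '</div>'
       '<span property="dc:creator">no dc: mapping in scope here</span>')
p = CurieResolver()
p.feed(doc)
p.close()
print(p.properties)  # ['http://purl.org/dc/elements/1.1/title']
```

The stack of prefix maps is what allows an unbuffered processor to scope each mapping to the element that declares it. A DOM-based implementation would instead walk up the tree from the element being processed.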

Philip and I started an email exchange because of some postings on Twitter. I wanted to replicate the discussion here, with Philip's permission. Some of it is unimportant, but I wanted to preserve the context. Note that these are from my perspective, so quoted material is from Philip, and unquoted material is mine.


First email from Philip and my reply:

Philip Taylor wrote:

> I saw some discussion on Twitter, so just to clarify what the situation is (as far as I'm aware of it):
>
> I wrote the draft without having talked about it to anybody at all, because I thought (and still think) it might lead to something useful, and it seemed easier to just write something concrete rather than discuss it first. I posted about it to public-html and public-rdf-in-xhtml-tf, since that seems the easiest way to contact people who might be interested. A few people from the RDF side replied privately, including Manu (expressing a desire to discuss things further). Sam replied in public. That's about all there is.
>
> Re "My input was not sought"/"This wasn't a party I was invited to" - I haven't sought input from anybody (except the public-* lists). If this triggered some internal conversation in the RDFa world that you were excluded from, I know nothing about it. If I continue working on this, I'd be happy to hear technical comments about the content from anywhere.
>
> Re "a better chance of getting RDFa into HTML5" - that's not my aim at all; I'm not currently convinced that RDFa is a good solution that ought to be part of the language. But that's largely irrelevant - if people are going to use it anyway (which it looks like they are, at least to some extent) then I'd prefer it to be specified based on HTML5 rather than on XHTML1.1/HTML4, so that it's easier to implement correctly and so that it doesn't conflict with HTML5's requirements, and I'm not aware that anyone else is planning to specify it that way (but I'd be happy if someone else did so).
>
> I don't care much about the politics of where the text ends up - it just seems easier to do it as a separate document, effectively defining a new "HTML5+RDFa" language rather than modifying the original HTML5 language definition, which achieves the goal of making sure the precise behaviour of RDFa-in-text/html is actually specified somewhere (regardless of whether it's a part of HTML5 or not).

Sam specifically mentioned me working with you. I checked with the RDFa folks, and they'd already initiated discussions with you.

Sam asked about Manu, Ben et al., and my answer was that he should ask them. My further response was that discussions are, or will be, underway, but I am not part of the effort, and I'm the wrong person to ask.

In a way, I agree with you that this shouldn't be 'part' of HTML5. Neither should any of the predefined vocabularies, nor microdata. The only reason they are is that HTML5 is not extensible.

The confused concept of "validation" associated with HTML5, though, makes it important to at least reference RDFa in such a way that a) attributes are not redefined and b) people know how to use RDFa in a "conforming" manner with HTML5, given that people can't use one version of RDFa annotation for XHTML 1.1 and another for HTML5. The whole @prefix thing was foolish. Sorry, but that's my opinion.

So a document, as an addendum or complementary proposal issued by some organization, that describes how RDFa works with HTML5 (without affecting how it works with HTML4 or XHTML) is good. It allows people to use RDFa with HTML5 without adverse impact on the underlying RDF model, and without requiring changes in behavior or syntax from what currently works with XHTML (including XHTML5). And it sounds like you're going to be working with the RDFa folks moving forward on this. That's what I meant by "RDFa into HTML5". And I hope you all succeed.

I don't have a part in this, and that's cool. I'll continue to do my own thing, which is primarily writing in my own space.

You know, the biggest problem with all of this is that you have processing people and you have data people, but you don't necessarily have a lot of people who understand both worlds.


Anyway, good luck with your efforts.

---

A second email I sent based on Philip's original email:

PS: I will say one thing, and I'm parroting Henri in this regard: to me, a conforming implementation of RDFa in HTML5 is not necessarily one that only meets what's required for HTML5; it has to meet a conformance requirement for RDF, too. How would we know if the document is conforming? The same annotation in a document served up as XHTML5 should generate exactly the same RDF graph as would be generated if the document is served up as HTML5. To ensure this, how the annotation is interpreted from a data perspective must be defined in a single document, such as RDFa-in-XHTML.

If you have two separate documents providing rules about how triples are to be formed from the same annotation, you have a failed system. You would be better off just ignoring RDFa and letting folks generate "non-conforming HTML5" documents with foreign annotation. At least then, RDFa extractors would have only one set of rules to apply when it comes to building the underlying RDF graph.
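The "same annotation, same graph" criterion in this PS can be phrased as a test. What follows is a hedged sketch, not a real RDFa processor. It applies one shared toy rule set (the hypothetical apply_rules: @about sets the subject, and @property plus @content yields a literal triple, with CURIEs left unexpanded) to two parses of the same markup. Python's html.parser stands in for the text/html side, and expat, with namespace processing off, stands in for the XML side. Neither parser implements the HTML5 parsing algorithm. They only show where divergence can creep in.

```python
from html.parser import HTMLParser
import xml.parsers.expat


def apply_rules(attrs, subject):
    """The single, shared rule set (a toy subset of RDFa, for illustration):
    @about sets the subject for the element and its descendants, and
    @property plus @content yields one (subject, property, literal) triple."""
    subject = attrs.get("about", subject)
    triple = None
    if "property" in attrs and "content" in attrs:
        triple = (subject, attrs["property"], attrs["content"])
    return subject, triple


class _HtmlGraph(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack, self.triples = [""], set()  # inherited subject per open element

    def handle_starttag(self, tag, attrs):
        subject, triple = apply_rules(dict(attrs), self.stack[-1])
        if triple:
            self.triples.add(triple)
        self.stack.append(subject)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()


def graph_as_html(text):
    parser = _HtmlGraph()
    parser.feed(text)
    parser.close()
    return parser.triples


def graph_as_xhtml(text):
    # No namespace processing, so attribute names arrive exactly as written.
    parser = xml.parsers.expat.ParserCreate()
    stack, triples = [""], set()

    def start(tag, attrs):
        subject, triple = apply_rules(attrs, stack[-1])
        if triple:
            triples.add(triple)
        stack.append(subject)

    def end(tag):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)
    return triples


def same_graph(text):
    """Conformance check: both serializations must yield identical triples."""
    return graph_as_html(text) == graph_as_xhtml(text)


ok = '<div about="#me"><span property="dc:title" content="Shelley"></span></div>'
mixed_case = '<div about="#me"><span Property="dc:title" content="Shelley"></span></div>'
print(same_graph(ok))          # True
print(same_graph(mixed_case))  # False
```

The mixed-case example fails because html.parser lowercases attribute names, while the XML parser keeps them as written. This is the same kind of case-sensitivity question Philip lists as unresolved. Even with a single rule set, the two graphs diverge unless the mapping from each syntax onto those rules is pinned down.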

The reason why Shane's document is "sparse" on parsing (processing) information (according to the WhatWG IRC entries) is that Shane was deferring the RDFa processor conformance to the RDFa-XHTML syntax and processing document. This was right and proper. He was using good technique.

If you cross over the boundaries that separate the markup specification from other specifications, you leave the potential for conflicting conformance requirements. An example is the color section in the HTML5 document. What if how colors are defined is changed in CSS? Well, then, you'd have two sets of differing conformance requirements. I still can't figure out why there's a section on processing color values in HTML, when there shouldn't even be color values within the HTML markup directly. Legacy, I suppose.

Philip, you specify the attributes, which is good, because that ensures they're reserved, and Ian doesn't do something like @property again. Working through issues with existing shared attributes is also good.

Then you copy bits of the RDFaSyntax document and restate them in HTML5-speak, which opens the door to conflicting conformance requirements and, worse, differing underlying RDF graphs. I can understand noting where specific terms in the RDFaSyntax document map to other terms in the HTML5 document, but providing a separate processing model...

I have to assume this was meant to generate a dialog, and that you don't actually intend to deliver the document this way, with a "separate" processing model section.

Those are my initial notes. I'd put them on the email lists, but frankly, I'm tired of everything I write or say being joked about on the WhatWG IRC.

---

Some of the correspondence was irrelevant to this group. I'm only duplicating it so that everything is consistently public. Philip's follow-up reply and mine are much more relevant to a larger discussion, in my opinion at least:

> First, clarification: when I respond, I'm responding only for myself, not the RDF/RDFa folks.

> The problem in that document is it doesn't define how to map from the syntax onto the RDFa-in-XHTML processing model, which leaves a gap where the behaviour is undefined. E.g. I can write <div xmlns:="..."> in HTML, and I don't know whether that attribute should be ignored or should redefine the default prefix mapping, because it's impossible in XHTML and so the RDFa-in-XHTML specification doesn't explain how to handle it.

But you don't have to re-specify a section to explain gaps. Nor do you have to restate the sections you're in agreement with.

The RDFa document itself falls back on certain processing rules, defined both in XHTML and, indirectly, in XML. I don't think there's any conflict in stating in the RDFa in HTML5 document that where such rules exist implicitly in the RDFa in XHTML document, they're given explicitly in the HTML5 document.
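The xmlns: gap in Philip's example is easy to reproduce. A text/html parser simply reports an attribute literally named xmlns: with an empty prefix, and nothing says what a processor should do with it. The sketch below uses Python's html.parser. The two policies, "ignore" and "default" (treat it as the default prefix mapping), and every name in the code are hypothetical choices made for illustration, not behaviour taken from any specification.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collects xmlns:* mappings from text/html under a chosen policy for the
    empty-prefix attribute "xmlns:", which XML namespaces would reject."""

    def __init__(self, empty_prefix_policy):
        super().__init__()
        self.policy = empty_prefix_policy  # "ignore" or "default" (hypothetical)
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not name.startswith("xmlns:"):
                continue
            prefix = name[len("xmlns:"):]
            if prefix == "" and self.policy == "ignore":
                continue                   # treat xmlns: as meaningless
            self.prefixes[prefix] = value  # "" becomes the default mapping


def collect(markup, policy):
    parser = PrefixCollector(policy)
    parser.feed(markup)
    parser.close()
    return parser.prefixes


markup = '<div xmlns:="http://example.org/vocab#"></div>'
print(collect(markup, "ignore"))   # {}
print(collect(markup, "default"))  # {'': 'http://example.org/vocab#'}
```

Whichever policy a processor picks, any later lookup of the empty prefix succeeds under one policy and fails under the other. That is why the behaviour has to be written down in one place rather than left to each implementer.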

> One idea for fixing the gap is to produce a more detailed mapping from text/html onto the RDFa-in-XHTML processing model. But that seems like an unpleasantly difficult solution, since RDFa-in-XHTML wasn't really designed to be used like that and there are lots of small mismatches and edge cases that make it tricky.

But if you create a _new_ processing model, there will eventually be two sets of rules to follow, which introduces corruption into the underlying data models (RDF graphs).

You keep talking about processing the data _within_ the document using JS, and I'm trying to make the point that the majority of RDF ends up merged with other RDF from other documents in much larger pools of data. Personally, I don't give a damn about processing RDF in my pages with JS. And I don't think I'm necessarily an exception. I can tell that most of the work being done with Drupal 7 is based on the data being consumed outside the pages, rather than within.

So from a mindset perspective, we have to get away from this JS/Ajax, in-page view of the data and look at it from a broader perspective. It would be better not to have any data than to have "bad" data.

I'm assuming you've worked with databases created by other entities, where you've had no control over the creation of the data model underlying the database or over the validation of the data going into it. If you've participated in any kind of data clean-up operation, you must know that no data at all is actually easier to manage than being unable to tell good data from "bad". Once that has happened, with good and bad mixed and no clear clue as to which is which, the database is completely corrupted and has to be discarded.

> Since HTML 5 already defines how to handle text/html and application/xhtml+xml in a common processing model, ...

Has it, though? I've looked through the document, and if you are talking about processing, how do we handle xmlns in HTML5 land? How do we deal with <svg:svg in HTML5 land?

I really don't think the current HTML5 document has dealt with a "common processing model" for both HTML5 and XHTML5. That's just my opinion, though.

> I think redefining the RDFa processing model on top of the HTML 5 processing model is possibly the best way to get well-defined, consistent behaviour between HTML and XHTML. So it would entirely replace the current RDFa-in-XHTML spec, ensuring there's only a single document telling people how to parse RDFa in both HTML and XHTML. Maybe it should be thought of as a new edition of the existing spec, rather than a totally new spec.

Again, I cannot agree. The microdata model generates RDF triples that don't map to what the supposedly equivalent RDFa annotation would provide, even with the new additions of rdf:type and about. I don't feel sanguine that things would improve if the HTML5 document actually replaced the RDFa-in-XHTML spec. In fact, I think you had better have a heart-to-heart with Manu et al. about that one, right away.

I admire the confidence of the WhatWG group, but I don't think that the way into the future of the web is to have every specification washed through the HTML5 group, just because that's the only way to _ensure_ that it's "processed properly". Sometimes I come away from reading the WhatWG IRC absolutely astonished that the web we have today actually exists, because all of it is so darn crappy.

Regardless of what Manu, Ben, et al. say, I feel confident in saying that the RDFa-in-XHTML spec is not going to be replaced by the HTML5 working group. I believe that compromising and cooperating, rather than replacing, is a better way forward.

> I guess there are lots of political/process issues with doing that, but it'd be nice to have a technically sound solution before getting blocked by those issues.

Well, I think you have more than political issues going on now. Google just took RDFa and exploded it all over the place. This is in addition to the other uses of RDFa that will be introduced in Drupal 7 and elsewhere, uses that will probably be more sophisticated than Google's. RDFa, as documented in the RDFaSyntax document, will continue to exist, regardless of what happens with HTML5. I believe it would be in everyone's best interest to assume this is so.

Either we all come to some kind of agreement (with supporting documentation) to live and let live, or we just ignore each other and go on as we are now. Amicably, hopefully. One subsuming the other is not going to happen.

But then, that's just my opinion. I'm not a member of the RDFa group,and can't speak for their opinions.

---

Sorry for the length of the posting, the typos, asides and so on. Hopefully there's something of interest to folks in the exchange.


Shelley
