Re: HTML 4 Profile for RDFa

Julian Reschke Sat, 23 May 2009 04:18:53 -0700

Philip Taylor wrote:

...
Indeed, it would be good to have this defined with the level of precision that HTML 5 has, so we can be sure implementations will be able to agree on how to extract RDFa from text/html content.
A few significant issues that I see in the current version:
What is "the @xml:lang attribute"? Is it the attribute with local name "xml:lang" in no namespace (as would be produced by an HTML 5 parser (and by current HTML browser parser implementations))? Or the attribute with local name "lang" in the namespace "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML parser, and could be inserted in an HTML document via DOM APIs)?

It's unambiguous as long as we talk about a stream of characters, right?

Or both (in which case both could be specified on one element, in addition to "lang" in no namespace)?

Both can only be specified in the DOM, but not in a serialization (or am I missing something?).
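To make the three readings concrete, here is a minimal JavaScript sketch that models attributes as plain `{namespace, localName, value}` records instead of real DOM nodes. The precedence in `effectiveLang` is a hypothetical choice for illustration only. Neither the RDFa drafts nor HTML 5 is claimed to define it.

```javascript
// Attributes are modelled as plain records; namespace null means "no namespace".
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// An HTML parser keeps xml:lang="de" as the local name "xml:lang" in no namespace.
const fromHtmlParser = [{ namespace: null, localName: "xml:lang", value: "de" }];

// An XML parser yields local name "lang" in the XML namespace.
const fromXmlParser = [{ namespace: XML_NS, localName: "lang", value: "de" }];

// Via DOM APIs, all three variants can coexist on one element.
const viaDomApis = [
  { namespace: null, localName: "lang", value: "en" },
  { namespace: null, localName: "xml:lang", value: "de" },
  { namespace: XML_NS, localName: "lang", value: "fr" },
];

// Hypothetical precedence (an assumption, not taken from any spec):
// XML-namespace lang, then "xml:lang" in no namespace, then plain "lang".
function effectiveLang(attrs) {
  const find = (ns, name) =>
    attrs.find((a) => a.namespace === ns && a.localName === name);
  const hit =
    find(XML_NS, "lang") || find(null, "xml:lang") || find(null, "lang");
  return hit ? hit.value : null;
}

console.log(effectiveLang(fromHtmlParser)); // de
console.log(effectiveLang(fromXmlParser)); // de
console.log(effectiveLang(viaDomApis)); // fr
```

A processor that only ever sees markup meets the first two shapes; the third arises only through DOM manipulation.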

That being said, it wouldn't hurt to have a section that defines special aspects of processing RDFa from a DOM instead of an HTML document (as a series of bytes/characters).

"If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]": I don't understand what that means in an HTML context. Is it meant to mean something like "the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)"? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.
Even without scripting, there isn't always a contiguous sequence of bytes corresponding to the content of an element. E.g. if the HTML input is:
  <table>
    <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
      <td> This text goes inside the table </td>
      This text gets parsed to *outside* the table
      <td> This text goes inside the table </td>
    </tr>
  </table>
then (according to the HTML 5 parsing algorithm, and as implemented in (at least) Firefox) the content of the <tr> element includes the first and third lines of text, but not the second. How would you decide whether the content is well-formed XML?
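The mismatch between the byte layer and the tree layer can be shown with a small JavaScript sketch. The tree for the `<tr>` is written out by hand to follow the description above (the stray text ends up outside the table); it is not produced by a real HTML 5 parser. The attribute `data-xmlliteral` is a hypothetical stand-in for "some-attributes-to-say-this-element-outputs-an-XMLLiteral", and whitespace-only text nodes and namespace declarations are left out for brevity.

```javascript
// Source bytes for the example; data-xmlliteral is a hypothetical placeholder attribute.
const source = `<table>
  <tr data-xmlliteral>
    <td> This text goes inside the table </td>
    This text gets parsed to *outside* the table
    <td> This text goes inside the table </td>
  </tr>
</table>`;

// Byte layer: everything between the end of the <tr ...> start tag and </tr>.
const contentStart = source.indexOf(">", source.indexOf("<tr")) + 1;
const byteContent = source.slice(contentStart, source.indexOf("</tr>"));

// Tree layer: the <tr> children after parsing, written by hand
// (whitespace-only text nodes omitted). Strings are text nodes.
const tr = {
  name: "tr",
  children: [
    { name: "td", children: [" This text goes inside the table "] },
    { name: "td", children: [" This text goes inside the table "] },
  ],
};

// Escape the characters that would break well-formedness in text content.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// In this simple model (no attributes, valid element names), serializing
// from the tree always yields well-formed XML.
function serialize(node) {
  if (typeof node === "string") return escapeText(node);
  return `<${node.name}>${node.children.map(serialize).join("")}</${node.name}>`;
}

const treeContent = tr.children.map(serialize).join("");

console.log(byteContent.includes("outside")); // true
console.log(treeContent.includes("outside")); // false
```

In this sketch the byte slice contains text that is not part of the element's content at all, while the tree-based literal is well-formed by construction.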


Is it still underspecified once we require a valid HTML5 document as input?

For this to make sense in real HTML implementations, the definition should be in terms of the document layer rather than the byte layer.

Disagreed. Many implementations never build a DOM. We're not only talking about browsers here.

...

How are xmlns:* attributes meant to be processed? E.g. what is the expected output in the following cases:


<div xmlns:T="test:">
  <span typeof="t:x" property="t:y">Test</span>
</div>

<div XMLNS:t="test:">
  <span typeof="t:x" property="t:y">Test</span>
</div>

<div xmlns:T="test:">
  <span typeof="T:x" property="T:y">Test</span>
</div>

<div xmlns:t="test:">
  <div xmlns:t="">
    <span typeof="t:x" property="t:y">Test</span>
  </div>
</div>


I would expect the results to be the same for XHTML and HTML serializations.
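One concrete difference between the two serializations is that an HTML parser lowercases attribute names, while an XML parser preserves them. The JavaScript sketch below runs the four cases through both behaviours under two hypothetical assumptions that are not taken from the RDFa spec: prefix lookup is case-sensitive, and `xmlns:t=""` removes the binding for `t` (as Namespaces in XML 1.1 allows). It keeps only a stack of per-element bindings, so the same logic could run from parser events without a DOM.

```javascript
// Assumptions (hypothetical): case-sensitive prefix lookup;
// an empty xmlns:prefix value removes the binding.

// Collect prefix bindings declared on one element. An HTML parser lowercases
// attribute names; an XML parser keeps them, so "XMLNS:t" declares nothing there.
function declaredPrefixes(rawAttrs, htmlParser) {
  const bindings = new Map();
  for (const [raw, value] of Object.entries(rawAttrs)) {
    const name = htmlParser ? raw.toLowerCase() : raw;
    if (name.startsWith("xmlns:")) bindings.set(name.slice("xmlns:".length), value);
  }
  return bindings;
}

// Resolve "prefix:reference" against a stack of bindings, innermost last.
// Returns null when the prefix is unbound or has been undeclared.
function resolveCurie(curie, scopes) {
  const colon = curie.indexOf(":");
  if (colon < 0) return null;
  const prefix = curie.slice(0, colon);
  for (let i = scopes.length - 1; i >= 0; i--) {
    if (scopes[i].has(prefix)) {
      const iri = scopes[i].get(prefix);
      return iri === "" ? null : iri + curie.slice(colon + 1);
    }
  }
  return null;
}

// The four cases above: ancestor attributes (outermost first) and the typeof value.
const cases = [
  { ancestors: [{ "xmlns:T": "test:" }], curie: "t:x" },
  { ancestors: [{ "XMLNS:t": "test:" }], curie: "t:x" },
  { ancestors: [{ "xmlns:T": "test:" }], curie: "T:x" },
  { ancestors: [{ "xmlns:t": "test:" }, { "xmlns:t": "" }], curie: "t:x" },
];

// Resolve each case once as an HTML parser would see it and once as XML.
const results = cases.map((c) => {
  const resolve = (htmlParser) =>
    resolveCurie(c.curie, c.ancestors.map((a) => declaredPrefixes(a, htmlParser)));
  return { html: resolve(true), xml: resolve(false) };
});

console.log(results);
```

Under these assumptions the HTML and XML readings disagree on cases 1 to 3 and agree (no binding) on case 4. A namespace-aware XML parser would be expected to reject case 2 outright, since the prefix `XMLNS` is never declared.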

<div xmlns:t="test1:" id="d">
  <span typeof="t:x" property="t:y">Test</span>
</div>
<script>
  document.getElementById('d').setAttributeNS(
    'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
    /* (now the element has two distinct attributes,
       each in different namespaces) */
</script>

That example illustrates why it's dangerous to focus too much on processing in the DOM. Many RDFa processors will never execute the script. So I think considerations like the one above should be treated as a distinct problem (potentially in an appendix of the spec).
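The state the script leaves behind can be modelled without a browser. In the JavaScript sketch below the two attributes are plain records (no real DOM is involved), and `bindingsFor` is a hypothetical helper that lists every declaration of a prefix that a processor could see on one element. A processor that reads the markup only ever meets the parsed attribute. One that reads the live DOM meets both and would need a rule for which one wins.

```javascript
const XMLNS_NS = "http://www.w3.org/2000/xmlns/";

// Created by the HTML parser from xmlns:t="test1:": local name "xmlns:t", no namespace.
const parsed = { namespace: null, localName: "xmlns:t", value: "test1:" };
// Added by setAttributeNS(XMLNS_NS, "xmlns:t", "test2:"): local name "t" in the xmlns namespace.
const scripted = { namespace: XMLNS_NS, localName: "t", value: "test2:" };

// Hypothetical helper: every value that could bind `prefix` on one element.
function bindingsFor(prefix, attrs) {
  return attrs
    .filter(
      (a) =>
        (a.namespace === null && a.localName === "xmlns:" + prefix) ||
        (a.namespace === XMLNS_NS && a.localName === prefix)
    )
    .map((a) => a.value);
}

// A processor reading the bytes never runs the script.
const fromMarkup = bindingsFor("t", [parsed]);
// A processor reading the DOM after the script sees two candidates.
const fromLiveDom = bindingsFor("t", [parsed, scripted]);

console.log(fromMarkup); // [ 'test1:' ]
console.log(fromLiveDom); // [ 'test1:', 'test2:' ]
```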

...


BR, Julian
