Re: [whatwg] Annotating structured data that HTML has no semantics for
On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:

> Page 3:
>
>   <h2>My Cats</h2>
>   <dl>
>    <dt>Schr&ouml;dinger
>    <dd item="com.damowmow.cat">
>     <meta property="com.damowmow.name" content="Schr&ouml;dinger">
>     <meta property="com.damowmow.age" content="9">
>     <p property="com.damowmow.desc">Orange male.
>    <dt>Erwin
>    <dd item="com.damowmow.cat">
>     <meta property="com.damowmow.name" content="Lord Erwin">
>     <meta property="com.damowmow.age" content="3">
>     <p property="com.damowmow.desc">Siamese color-point.
>     <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
>   </dl>

Given the microdata solution and this example, there is now a reason other than styling to introduce <di>, since here you duplicate the <dt> information in <meta>.

  <dl>
   <di item="com.damowmow.cat">
    <dt property="com.damowmow.name">Schr&ouml;dinger
    <dd>
     <meta property="com.damowmow.age" content="9">
     <p property="com.damowmow.desc">Orange male.
   </di>
   ...

The styling problem is discussed at http://forums.whatwg.org/viewtopic.php?t=47

--
Simon Pieters
Opera Software
Re: [whatwg] innerStaticHTML
On Wed, May 6, 2009 at 9:40 AM, João Eiras jo...@opera.com wrote:

> The suggestion of marking content as non-executable doesn't solve anything, because after setting innerStaticHTML another script might serialize a piece of the affected DOM to string and back to a tree, and the code could then execute, which would not be wanted.

Yes, we can't make it impossible for web developers to shoot themselves in the foot. We also can't stop them from calling eval on a query string argument. However, innerStaticHTML does make it easier to display untrusted HTML to the user.

> The only viable solution, from my point of view, would be for the UA to parse the string, and remove all untrusted content from the result tree before appending to the document.

This is what I meant to suggest.

> That would mean removing all onevent attributes, all script elements, all plugins, etc. Basically, letting the UA implement all the filtering.

Exactly. As you say, the UA is in a much better position to do this correctly than an individual web site.

On Thu, May 7, 2009 at 3:24 AM, Kristof Zelechovski giecr...@stegny.2a.pl wrote:

> If toStaticHTML prunes everything it is not sure of, the danger of a known language construct suddenly introducing active content is negligible. I am sure HTML5 specification editors bear that aspect in mind and so shall they in the future.

Even if you believe that we've already committed to not introducing active content that breaks toStaticHTML (which I'm not convinced we have, especially because I don't know what algorithm it uses), that still leaves the performance and correctness issues of parsing the untrusted content twice. Parsing the content once is more efficient and more predictable.

Adam
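The filtering step discussed in this message (parse once, then drop script elements, plugins and onevent attributes from the resulting tree) could look roughly like the sketch below. It is only an illustration under stated assumptions: the tree is a plain object of the hypothetical shape { tag, attrs, children } rather than a real DOM, pruneActiveContent and ACTIVE_ELEMENTS are made-up names, and the list of active elements is not exhaustive (for instance, javascript: URLs are not handled).

```javascript
// Hypothetical tree shape: { tag, attrs, children }; text nodes are plain strings.
// The element list is an assumption for illustration, not a complete blacklist.
const ACTIVE_ELEMENTS = new Set(["script", "object", "embed", "applet"]);

// Returns a pruned copy of the tree, or null if the node itself is active content.
function pruneActiveContent(node) {
  if (typeof node === "string") return node; // text is inert
  if (ACTIVE_ELEMENTS.has(node.tag.toLowerCase())) return null;

  // Copy every attribute except event handlers (onclick, onload, ...).
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (!/^on/i.test(name)) attrs[name] = value;
  }

  // Recurse into children and drop the ones that were pruned away.
  const children = (node.children || [])
    .map(pruneActiveContent)
    .filter((child) => child !== null);

  return { tag: node.tag, attrs, children };
}

const untrusted = {
  tag: "div",
  attrs: { class: "comment", onclick: "steal()" },
  children: [
    "Hello ",
    { tag: "script", attrs: {}, children: ["steal()"] },
    { tag: "b", attrs: { onmouseover: "steal()" }, children: ["world"] },
  ],
};

const safe = pruneActiveContent(untrusted);
console.log(JSON.stringify(safe));
```

Because the pruning works on the already-parsed tree, the untrusted string never has to be serialized and parsed a second time, which is the property argued for above.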
Re: [whatwg] Annotating structured data that HTML has no semantics for
Ian Hickson:

> USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community. (..)
> SCENARIOS:

Among the scenarios, this case should also be considered:

* A user (or group of users) wants to annotate items present on a generic web page with additional properties in a certain vocabulary. For example, Joe wants to gather in a blog a series of personal annotations on movies (or other types of items) present on imdb.com.

Other examples of external annotation could be derived from this document [1].

This option requires that @subject accept:

1) the ID of an element with an item attribute, in the same Document, or
2) a valid URL of an element with an item attribute elsewhere on the web, or
3) a valid URL (the item is the referenced document or fragment)

This raises two other questions:

a) For properties specified on an element that has no ancestor with an item attribute, should the corresponding item be the document (the body element with an implicit item attribute)?
b) Do we need to require UAs to offer a standard way to visualize (at least as an option left to the user) the structured information carried in microdata? And copy/paste?

See also this email [2].

[1] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01
[2] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html

--
Giovanni Gentili
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Mon, May 11, 2009 at 6:15 PM, Giovanni Gentili giovanni.gent...@gmail.com wrote:

> * a user (or groups of users) wants to annotate items present on a generic web page with additional properties in a certain vocabulary. for example Joe wants to gather in a blog a series of personal annotation to movies (or other type of items) present in imdb.com.
> [...]
> this option requires that @subject accept:
> 1) ID of an element with an item attribute, in the same Document or
> 2) valid URL of an element with an item attribute elsewhere in the web or
> 3) a valid URL (the item is the referred document or fragment)

For the RDF output, you can use <link property="about" href="http://subject/"> to create triples whose subject is a URL. (I believe in general you can also do:

  <meta item id="n0">
  <link subject="n0" property="about" href="http://subject/">
  <link subject="n0" property="http://predicate1/" href="http://object1/">
  <meta subject="n0" property="http://predicate2/" content="object2">

to represent arbitrary RDF triples.)

I don't think it would make sense for @subject to be a URL when generating JSON output, because there wouldn't be anywhere to represent that URL in the output structure. But there could be a convention that properties called "about" indicate the URLs that the item applies to, and then it would work with exactly the same markup as the RDF case.

--
Philip Taylor
exc...@gmail.com
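One possible reading of the "about" convention described in this message can be sketched as follows. This is an assumption-laden illustration, not the draft's extraction algorithm: the elements are represented as plain objects, itemToTriples and itemToJSON are hypothetical names, and it is assumed that an "about" property supplies the subject URL for the item's other properties in the RDF case while staying an ordinary property in the JSON case.

```javascript
// Plain-object stand-ins for the four elements in the markup example (assumed shape).
const elements = [
  { name: "meta", item: true, id: "n0" },
  { name: "link", subject: "n0", property: "about", href: "http://subject/" },
  { name: "link", subject: "n0", property: "http://predicate1/", href: "http://object1/" },
  { name: "meta", subject: "n0", property: "http://predicate2/", content: "object2" },
];

// RDF-style output: "about" names the subject, every other property is a triple.
function itemToTriples(allElements, itemId) {
  const props = allElements.filter((el) => el.subject === itemId);
  const about = props.find((el) => el.property === "about");
  // Without an "about" property, fall back to a blank-node label.
  const subject = about ? about.href : "_:" + itemId;
  return props
    .filter((el) => el.property !== "about")
    .map((el) => ({
      subject,
      predicate: el.property,
      object: el.href !== undefined ? { uri: el.href } : { literal: el.content },
    }));
}

// JSON-style output: "about" is kept as a normal property of the item.
// Repeated properties overwrite each other here; a fuller version would collect arrays.
function itemToJSON(allElements, itemId) {
  const out = {};
  for (const el of allElements) {
    if (el.subject !== itemId) continue;
    out[el.property] = el.href !== undefined ? el.href : el.content;
  }
  return out;
}

console.log(itemToTriples(elements, "n0"));
console.log(itemToJSON(elements, "n0"));
```

The same markup feeds both functions, which is the point made above: the URL ends up as the triple subject in one output and as an "about" entry in the other.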
Re: [whatwg] innerStaticHTML
On Tue, May 12, 2009 at 4:16 AM, Adam Barth wha...@adambarth.com wrote:

> On Thu, May 7, 2009 at 3:24 AM, Kristof Zelechovski giecr...@stegny.2a.pl wrote:
>
>> If toStaticHTML prunes everything it is not sure of, the danger of a known language construct suddenly introducing active content is negligible. I am sure HTML5 specification editors bear that aspect in mind and so shall they in the future.
>
> Even if you believe that we've already committed to not introducing active content that breaks toStaticHTML (which I'm not convinced we have, especially because I don't know what algorithm it uses)

I would be shocked if we have committed to not introducing active content that breaks IE8's toStaticHTML. That would be terribly limiting. (Does it prune the video and audio event attributes?)

When you call innerStaticHTML it should prune everything that's unsafe for *this UA*. Authors should not send that content to other UAs and expect it to be safe for those UAs.

Rob
--
He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: [whatwg] Custom microdata handling added to HTML5 spec
On Sun, 10 May 2009, Manu Sporny wrote:

> Shelley Powers wrote:
>
>> Since a new section detailing HTML5's handling of custom microdata has been added to the HTML5 spec
>> http://dev.w3.org/html5/spec/Overview.html#microdata
>
> I've only had a brief chance to look over the HTML5 Microdata spec, but there is one big problem that overrides all of the other issues: The HTML5 Microdata spec is in direct conflict with planned RDFa extensions and will almost surely result in spurious triples being generated in RDFa processors in the future.

I've renamed property="" to itemprop="".

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] innerStaticHTML
On 06.05.2009, at 17:31, Adam Barth wrote:

> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use case by translating an untrusted string to another string that lacks active HTML content. This API has two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow requires the browser to parse the string twice, introducing a performance penalty and a security issue if the two parsings aren't identical.

That is based on the assumptions that:

1. parsing is expensive enough to warrant an API optimized for this particular case
2. browsers cannot optimize it otherwise
3. returned code will be ambiguous

In client-side scripts untrusted content comes from the network, which means that parsing time is going to be minuscule compared to the time required to fetch the content (and to render it). My guess is that parsing itself is not a bottleneck.

Second, it _is_ possible to avoid reparsing without a special API for this. toStaticHTML() may return a subclass of String that contains a reference to the parsed DOM. Roughly something like this:

  function toStaticHTML(html) {
    // parse(), clean() and unparse() stand for the browser's internals
    var cleanDOM = clean(parse(html));
    return {
      toString: function () { return unparse(cleanDOM); },
      node: cleanDOM
    };
  }

which should make the common case:

  innerHTML = toStaticHTML(html)

just as fast as

  innerStaticHTML = html;

toStaticHTML() enables other optimisations, e.g. filtered HTML can be saved for future use (in local storage), or a string filtered once can be used in multiple places.

Alternatively there could be a toStaticDOM() method that returns a DOMDocumentFragment, avoiding the reparsing issue entirely.

> 2) The API is difficult to future-proof because future versions of HTML are likely to add new tags with active content (e.g., like the video tag's event handlers).

When support for a new tag is added to a browser, it would also be added to its toStaticHTML()/innerStaticHTML, so evolution of HTML shouldn't be a problem either way. A browser doesn't need to worry about dangerous constructs it does not support.
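The "return an object that carries the parsed tree" idea from earlier in this message can be made runnable with toy stand-ins. Everything below is an assumption for illustration: parse, clean and unparse are deliberately tiny (they only understand flat <tag>text</tag> runs), setInnerHTML is a hypothetical consumer standing in for an innerHTML setter, and parseCount exists only to show that the common case parses once.

```javascript
let parseCount = 0;

// Toy parser: recognises only flat <tag>text</tag> runs, nothing nested.
function parse(html) {
  parseCount += 1;
  const nodes = [];
  const pattern = /<(\w+)>([^<]*)<\/\1>/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    nodes.push({ tag: match[1].toLowerCase(), text: match[2] });
  }
  return nodes;
}

// Toy filter: drops a few elements assumed to be "active".
function clean(nodes) {
  const active = new Set(["script", "object", "embed"]);
  return nodes.filter((node) => !active.has(node.tag));
}

// Toy serializer, the inverse of parse() for the supported subset.
function unparse(nodes) {
  return nodes.map((n) => "<" + n.tag + ">" + n.text + "</" + n.tag + ">").join("");
}

function toStaticHTML(html) {
  const cleanDOM = clean(parse(html));
  return {
    toString() { return unparse(cleanDOM); }, // still usable as a string
    node: cleanDOM,                           // already-parsed, filtered tree
  };
}

// Hypothetical setter: reuses the carried tree when there is one, reparses otherwise.
function setInnerHTML(target, value) {
  if (value !== null && typeof value === "object" && Array.isArray(value.node)) {
    target.children = value.node;
  } else {
    target.children = parse(String(value));
  }
}

const target = { children: [] };
const filtered = toStaticHTML("<b>hi</b><script>alert(1)</script>");
setInnerHTML(target, filtered);

console.log(String(filtered)); // <b>hi</b>
console.log(parseCount);       // 1
```

Passing a plain string to setInnerHTML still works but costs a second parse, which is the trade-off the two API shapes are being compared on.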
Methods are easier to patch than properties in JavaScript, so if the implementation of an existing toStaticHTML() turned out to be insecure, the method could easily be replaced/patched on the client side, or applications could post-process the output of toStaticHTML(). It's not that easy with a property.

I dislike APIs based on magic properties. Properties cannot take arguments, and we'd have to create a new property for every combination of arguments. If innerHTML were a method, instead of creating a new property we could extend it to be innerHTML(html, static=true).

If more sophisticated filtering becomes needed in the future, we could have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}), but it would be silly to create another innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.

--
regards, Kornel
[whatwg] Expandos and Prototyping
Are expando / prototype functions at all included in the HTML 5 specs?

While we may all know what Object.prototype does, I'd like to see its use added to Section 6: Web browsers. The prototype expando is not necessarily a JavaScript-only construct, and neither is HTML 5.

While I'm not championing full prototype inheritance, I do wonder (out loud) whether some small section of HTML 5 might describe the most basic of prototyping and expandos. Many projects use ellipse or other shapes, for example, but this is easier:

  CanvasRenderingContext2D.prototype.funcName = function () {
    alert("Fill" + this.fillStyle);
  };
  document.createElement('canvas').getContext('2d').funcName();

I've never seen any developers attempt to use multiple inheritance within the CanvasRenderingContext2D object, nor have I tested myself to see if Firefox (the champion of such schemes) supports it. Which is why I'd be more than satisfied simply requiring single inheritance. It's already available in all implementations, and we spent a good deal of time making it available in our own.

Expando Prototype would need descriptions of: expandos, prototyped objects, for(... in ...)

All modern browsers support prototype, and so do many languages (without writing libraries). We've confirmed that expandos and prototypes work just fine in ActiveX; MS long ago created IDispatchEx. Any host language with getter / setter availability can implement prototyping and expandos on an object, at least to one depth.

I'd like to see .prototype described in the scripting section. That said, I'm more hesitant to champion .constructor and .__proto__.

-Charles
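The three behaviours listed in this message (expandos, prototyped objects, and for (... in ...) enumeration) can be shown without a browser. In this minimal sketch Context2D is a hypothetical stand-in for CanvasRenderingContext2D, and describeFill and label are made-up names; it only illustrates ordinary JavaScript semantics, not anything a specification mandates for host objects.

```javascript
// Hypothetical stand-in for a host interface such as CanvasRenderingContext2D.
function Context2D() {
  this.fillStyle = "#000000";
}

// Prototyped method: visible on every instance, existing or future.
Context2D.prototype.describeFill = function () {
  return "Fill " + this.fillStyle;
};

const ctx = new Context2D();

// Expando: an ad-hoc property added to one instance only.
ctx.label = "background layer";

// for...in visits own enumerable properties and inherited enumerable ones.
const seen = [];
for (const key in ctx) {
  seen.push(key);
}

console.log(ctx.describeFill());
console.log(seen);
console.log(Object.prototype.hasOwnProperty.call(ctx, "describeFill")); // false
```

The last line is the distinction a description of "one depth" of prototyping would need to cover: the method is reachable through the instance but lives on the prototype.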
Re: [whatwg] Expandos and Prototyping
On Mon, 11 May 2009, Charles Pritchard wrote:

> Are expando / prototype functions at all included in the HTML 5 specs?

Yes. Specifically, HTML5 uses WebIDL for all its definitions of interfaces, objects, etc, and the WebIDL spec defines how prototypes, custom properties, etc, work:

   http://dev.w3.org/2006/webapi/WebIDL/

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Annotating structured data that HTML has no semantics for
A cursory glance at the new section 5 raises two questions on indirection:

> (Note the metas in the last example -- since sometimes the information isn't visible, rather than requiring that people put it in and hide it with display:none, which has a rather poor accessibility story, I figured we could just allow <meta> anywhere, if it has a property="" attribute.)

That seems to be a solution optimised for completely invisible metadata, but not for metadata which differs from the human-visible data. Imagine as an example the simple act of marking up a number (ignoring what the number denotes). For human consumption a thousands separator is often used, and the type of separator differs by language, locale and context. Just in my little world I regularly see the point, the comma, the space, the thin space and sometimes the apostrophe. Parsing different representations of numbers would be a chore. The textContent of the element

  <span itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span>

is clearly unusable, demanding an additional invisible

  <meta property="com.example.price" content="1000000">

My irritation lies in the element proliferation, requiring one element/attribute combination for machines and one element/text content combination for humans. Of course, any sane author would arrange both elements in a close relation, as parent/child or siblings, but there would still be two different elements to maintain, leading to a higher cognitive load. Not just for authors but also for programmers: a fluctuating price would have to be updated on two different elements, and tree-walking DOM scripts would have to take meta elements into account.

Furthermore it clashes with the familiar habit of other elements in HTML. A hyperlink is one element with a machine-readable attribute and human-readable text content. A citation is one element with a machine-readable reference and human-readable text content. The same model is used in <meter>, <progress>, <time>, <abbr> ... but not in user-defined objects.

I'd prefer an additional @content-like attribute which supersedes the text content, and maybe even the default values of the other value-bearing elements, reducing two different elements to maintain or change to just one.

> Instead, let us try using the regular IDREF functionality that HTML uses in a variety of other places, like <label for="">. For this we'll need a new attribute, but unfortunately we can't use about="" (which would be the obvious name to use), because that would conflict with RDFa, so instead we'll use subject="":

I'm slightly irritated by the implied change from an active, possessive formulation ("The cat has the name Hedral.") to something more passive ("Hedral is a name owned by that cat."). My mental model for property relationships orients itself more on the former wording; link relationships are similar in that regard. @about/@subject are like @rev; a @resource alias @rel would feel more natural.

There are practical problems caused by the missing @resource, I think. Imagine a document describing a household, and a household vocabulary which allows triples of humans who are in an owner relationship to a cat. Given a household of two humans and one cat, how does one mark up the assumption that the cat has two owners?
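The earlier proposal in this message, a @content-like attribute that supersedes the text content, can be sketched as a value-resolution rule. This is purely illustrative: the element is a plain object rather than a DOM node, propertyValue is a hypothetical name, and the machine-readable value 1000000 is assumed to be what the displayed price stands for.

```javascript
// Plain-object stand-in for a single element carrying both representations.
// \u00a0 = no-break space, \u2009 = thin space, \u2014 = em dash.
const price = {
  itemprop: "com.example.price",
  content: "1000000",
  textContent: "\u20ac\u00a01\u2009000\u2009000,\u2014",
};

// Assumed rule: an explicit content value wins, text content is the fallback.
function propertyValue(element) {
  if (element.content !== undefined) return element.content;
  return element.textContent;
}

console.log(Number(propertyValue(price))); // 1000000
console.log(Number(price.textContent));    // NaN, the display form is not parseable
```

With such a rule a script updating a fluctuating price touches one element: it sets the content value and rewrites the visible text in the same place.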