Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Simon Pieters

On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:


>    Page 3:
>    <h2>My Cats</h2>
>    <dl>
>     <dt>Schr&ouml;dinger
>     <dd item="com.damowmow.cat">
>      <meta property="com.damowmow.name" content="Schr&ouml;dinger">
>      <meta property="com.damowmow.age" content="9">
>      <p property="com.damowmow.desc">Orange male.
>     <dt>Erwin
>     <dd item="com.damowmow.cat">
>      <meta property="com.damowmow.name" content="Lord Erwin">
>      <meta property="com.damowmow.age" content="3">
>      <p property="com.damowmow.desc">Siamese color-point.
>      <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
>    </dl>


Given the microdata solution and this example, there is now a reason other than styling to
introduce <di>, since here you duplicate the <dt> information in <meta>.

  <dl>
   <di item="com.damowmow.cat">
    <dt property="com.damowmow.name">Schr&ouml;dinger
    <dd>
     <meta property="com.damowmow.age" content="9">
     <p property="com.damowmow.desc">Orange male.
   </di>
   ...


The styling problem is discussed at http://forums.whatwg.org/viewtopic.php?t=47

--
Simon Pieters
Opera Software




Re: [whatwg] innerStaticHTML

2009-05-11 Thread Adam Barth
On Wed, May 6, 2009 at 9:40 AM, João Eiras jo...@opera.com wrote:
> The suggestion of marking content as non-executable doesn't solve anything,
> because after setting innerStaticHTML another script might serialize a piece
> of the affected DOM to string and back to a tree, and the code could then
> execute, which would not be wanted.

Yes, we can't make it impossible for web developers to shoot
themselves in the foot.  We also can't stop them from calling eval on
a query string argument.  However, innerStaticHTML does make it easier
to display untrusted HTML to the user.

> The only viable solution, from my point of view, would be for the UA to parse
> the string, and remove all untrusted content from the result tree before
> appending to the document.

This is what I meant to suggest.

> That would mean removing all onevent attributes, all script elements, all
> plugins, etc. Basically, letting the UA implement all the filtering.

Exactly.  As you say, the UA is in a much better position to do this
correctly than an individual web site.
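
A minimal sketch of the "parse once, prune the tree, then append" idea
discussed above. The tree shape, the name pruneActiveContent, and the
blocked-tag list are assumptions made purely for illustration; a real UA
would work on its own DOM with a far more complete list (for instance,
javascript: URLs are not handled here).

```javascript
// Hypothetical tree shape: { tag, attrs, children }, with plain strings as
// text nodes. The blocked-tag list is illustrative, not complete or vetted.
const BLOCKED_TAGS = new Set(["script", "object", "embed", "applet", "iframe"]);

function pruneActiveContent(node) {
  if (typeof node === "string") return node; // text nodes are kept as-is
  if (BLOCKED_TAGS.has(node.tag.toLowerCase())) return null;

  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    // Drop every event-handler attribute (onclick, onload, ...).
    if (!/^on/i.test(name)) attrs[name] = value;
  }

  // Recurse into children and discard the ones that were pruned.
  const children = (node.children || [])
    .map(pruneActiveContent)
    .filter((child) => child !== null);

  return { tag: node.tag, attrs, children };
}

const untrusted = {
  tag: "div",
  attrs: { class: "comment", onclick: "steal()" },
  children: [
    "Hello ",
    { tag: "script", attrs: {}, children: ["steal()"] },
    { tag: "b", attrs: { onmouseover: "steal()" }, children: ["world"] },
  ],
};

// A new tree is built; the untrusted input is left untouched.
const safe = pruneActiveContent(untrusted);
console.log(JSON.stringify(safe));
```

Because the filter runs on the already-parsed tree, the result can be
appended directly without a second parse.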

On Thu, May 7, 2009 at 3:24 AM, Kristof Zelechovski
giecr...@stegny.2a.pl wrote:
> If toStaticHTML prunes everything it is not sure of, the danger of a known
> language construct suddenly introducing active content is negligible.  I am
> sure HTML5 specification editors bear that aspect in mind and so shall they
> in the future.

Even if you believe that we've already committed to not introducing
active content that breaks toStaticHTML (which I'm not convinced we
have, especially because I don't know what algorithm it uses), that
still leaves the performance and correctness issues of parsing the
untrusted content twice.  Parsing the content once is more efficient
and more predictable.

Adam


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Giovanni Gentili
Ian Hickson:
>   USE CASE: Annotate structured data that HTML has no semantics for, and
>   which nobody has annotated before, and may never again, for private use or
>   use in a small self-contained community.
> (..)
>   SCENARIOS:

The following case should also be considered among the scenarios:

* a user (or groups of users) wants to annotate
items present on a generic web page with
additional properties in a certain vocabulary.
for example Joe wants to gather in a blog
a series of personal annotation to movies
(or other type of items) present in imdb.com.

other examples of external annotation could
be derived from this document [1].

This option requires that @subject accept:

1) ID of an element with an item attribute, in the same Document
or
2) valid URL of an element with an item attribute elsewhere in the web
or
3) a valid URL (i.e. the item is the referred document or fragment)

This raises two other questions:

a) When properties are specified on an element that has
no ancestor with an item attribute, should the
corresponding item be the document?
(That is, the body element with an implicit item attribute.)

b) Do we need to require UAs to offer a standard
way to visualize (at least as an option left to the user)
the structured information carried in microdata?
And copy-paste? See also this email [2].

[1] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01
[2] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html

-- 
Giovanni Gentili


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Philip Taylor
On Mon, May 11, 2009 at 6:15 PM, Giovanni Gentili
giovanni.gent...@gmail.com wrote:
> * a user (or groups of users) wants to annotate
> items present on a generic web page with
> additional properties in a certain vocabulary.
> for example Joe wants to gather in a blog
> a series of personal annotation to movies
> (or other type of items) present in imdb.com.
>
> [...]
>
> This option requires that @subject accept:
>
> 1) ID of an element with an item attribute, in the same Document
> or
> 2) valid URL of an element with an item attribute elsewhere in the web
> or
> 3) a valid URL (i.e. the item is the referred document or fragment)

For the RDF output, you can use <link property="about"
href="http://subject/"> to create triples whose subject is a URL. (I
believe in general you can also do:
  <meta item id="n0">
  <link subject="n0" property="about" href="http://subject/">
  <link subject="n0" property="http://predicate1/" href="http://object1/">
  <meta subject="n0" property="http://predicate2/" content="object2">
to represent arbitrary RDF triples.)

I don't think it would make sense for @subject to be a URL when
generating JSON output, because there wouldn't be anywhere to
represent that URL in the output structure. But there could be a
convention that properties called "about" indicate the URLs that the
item applies to, and then it would work with exactly the same markup
as the RDF case.
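
As a rough illustration of that convention, the sketch below turns one
item's properties into both RDF-style triples and a JSON-style object.
The { name, value } record shape and the function names are assumptions
for this example only; they are not taken from the microdata draft.

```javascript
// Properties of one item, as they might be collected from the four-line
// markup example in this message. The record shape is hypothetical.
const properties = [
  { name: "about", value: "http://subject/" },
  { name: "http://predicate1/", value: "http://object1/" },
  { name: "http://predicate2/", value: "object2" },
];

// RDF view: "about" names the subject; every other property is one triple.
function toTriples(props) {
  const about = props.find((p) => p.name === "about");
  const subject = about ? about.value : "_:n0"; // blank node when no "about"
  return props
    .filter((p) => p.name !== "about")
    .map((p) => [subject, p.name, p.value]);
}

// JSON view: there is no slot for a subject, so "about" simply remains an
// ordinary property that consumers agree to interpret as the subject URL.
function toJson(props) {
  const out = {};
  for (const p of props) out[p.name] = p.value;
  return out;
}

console.log(toTriples(properties));
console.log(toJson(properties));
```

The same input feeds both views, which is the point of treating "about"
as a naming convention rather than as a special @subject value.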

-- 
Philip Taylor
exc...@gmail.com


Re: [whatwg] innerStaticHTML

2009-05-11 Thread Robert O'Callahan
On Tue, May 12, 2009 at 4:16 AM, Adam Barth wha...@adambarth.com wrote:

> On Thu, May 7, 2009 at 3:24 AM, Kristof Zelechovski
> giecr...@stegny.2a.pl wrote:
> > If toStaticHTML prunes everything it is not sure of, the danger of a known
> > language construct suddenly introducing active content is negligible.  I am
> > sure HTML5 specification editors bear that aspect in mind and so shall they
> > in the future.
>
> Even if you believe that we've already committed to not introducing
> active content that breaks toStaticHTML (which I'm not convinced we
> have, especially because I don't know what algorithm it uses)


I would be shocked if we have committed to not introducing active content
that breaks IE8's toStaticHTML. That would be terribly limiting. (Does it
prune the video and audio event attributes?)

When you call innerStaticHTML it should prune everything that's unsafe for
*this UA*. Authors should not send that content to other UAs and expect it
to be safe for those UAs.

Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Custom microdata handling added to HTML5 spec

2009-05-11 Thread Ian Hickson
On Sun, 10 May 2009, Manu Sporny wrote:
> Shelley Powers wrote:
> > Since a new section detailing HTML5's handling of custom microdata has
> > been added to the HTML5 spec
> >
> > http://dev.w3.org/html5/spec/Overview.html#microdata
>
> I've only had a brief chance to look over the HTML5 Microdata spec, but
> there is one big problem that overrides all of the other issues: The
> HTML5 Microdata spec is in direct conflict with planned RDFa extensions
> and will almost surely result in spurious triples being generated in
> RDFa processors in the future.

I've renamed property= to itemprop=.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] innerStaticHTML

2009-05-11 Thread Kornel Lesiński

On 06.05.2009, at 17:31, Adam Barth wrote:


> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use case by translating an untrusted
> string to another string that lacks active HTML content.  This API has
> two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow
> requires the browser to parse the string twice, introducing a
> performance penalty and a security issue if the two parses aren't
> identical.


That is based on assumptions that:
1. parsing is expensive enough to warrant an API optimized for this
   particular case
2. browsers cannot optimize it otherwise
3. returned code will be ambiguous

In client-side scripts untrusted content comes from the network, which
means that parsing time is going to be minuscule compared to the time
required to fetch the content (and to render it). My guess is that
parsing itself is not a bottleneck.


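Second, it _is_ possible to avoid reparsing without a special API for
this. toStaticHTML() may return a subclass of String that contains a
reference to the parsed DOM. Roughly something like this:


function toStaticHTML(html)
{
    // parse(), clean() and unparse() are placeholders for the UA's internals
    var cleanDOM = clean(parse(html));
    return {
        // serialized only when a string is actually needed
        toString: function(){ return unparse(cleanDOM); },
        // the already-parsed, filtered tree, reusable without reparsing
        node: cleanDOM
    };
}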

which should make the common case:

innerHTML = toStaticHTML(html) just as fast as innerStaticHTML = html;

toStaticHTML() enables other optimisations, e.g. filtered HTML can be
saved for future use (in local storage), or a string filtered once can
be used in multiple places.
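
A runnable sketch of that wrapper idea follows. The parse, clean and
unparse functions here are trivial stand-ins invented for the example (they
do no real parsing or filtering), and setInner is a hypothetical consumer;
the only thing shown is that a consumer aware of the wrapper can reuse the
cached tree while string contexts still work through toString().

```javascript
let parseCount = 0;

// Stand-ins for the UA internals named in the message. They are hypothetical
// and only track how often "parsing" happens.
function parse(html) { parseCount += 1; return { html }; }
function clean(tree) { return { html: tree.html, cleaned: true }; }
function unparse(tree) { return tree.html; }

function toStaticHTML(html) {
  const cleanDOM = clean(parse(html));
  return {
    toString() { return unparse(cleanDOM); }, // used in string contexts
    node: cleanDOM,                           // cached, already-filtered tree
  };
}

// Hypothetical consumer: takes the cached tree when one is offered and
// falls back to parsing when handed a plain string.
function setInner(target, value) {
  target.tree = value && value.node ? value.node : clean(parse(String(value)));
}

const el = {};
const result = toStaticHTML("<p>hi</p>");
setInner(el, result);             // reuses result.node, no second parse
const asString = String(result);  // string coercion goes through toString()

console.log(parseCount, asString);
```

Whether a real engine could hand such an object to innerHTML without
reparsing is an implementation question this sketch does not answer.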


Alternatively there could be a toStaticDOM() method that returns a
DOMDocumentFragment, avoiding the reparsing issue entirely.



> 2) The API is difficult to future-proof because future versions of
> HTML are likely to add new tags with active content (e.g., like the
> video tag's event handlers).


When support for a new tag is added to a browser, it would also be added
to its toStaticHTML()/innerStaticHTML, so evolution of HTML shouldn't
be a problem either way. A browser doesn't need to worry about dangerous
constructs it does not support.


Methods are easier to patch than properties in JavaScript, so if the
implementation of an existing toStaticHTML() turned out to be insecure,
the method could be easily replaced/patched on the client side, or
applications could post-process the output of toStaticHTML().

It's not that easy with a property.
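
For illustration, patching a method on the page side might look like the
sketch below. The host object is a stand-in invented for this example, its
toStaticHTML does no real filtering, and the string-based marquee removal
is deliberately crude; only the wrap-and-post-process pattern is the point.

```javascript
// Hypothetical host object standing in for the browser-provided API.
const host = {
  toStaticHTML(html) {
    // Pretend the built-in filter lets something through that it should not.
    return html;
  },
};

// Page-level patch: keep a reference to the original, then post-process
// whatever it returns.
const originalToStaticHTML = host.toStaticHTML;
host.toStaticHTML = function (html) {
  const filtered = originalToStaticHTML.call(this, html);
  return filtered.split("<marquee>").join("").split("</marquee>").join("");
};

console.log(host.toStaticHTML("<marquee>news</marquee>"));
```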

I dislike APIs based on magic properties. Properties cannot take
arguments and we'd have to create a new property for every combination
of arguments. If innerHTML was a method, instead of creating a new
property we could extend it to be innerHTML(html, static=true).


If more sophisticated filtering becomes needed in the future, we could
have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}),
but it would be silly to create another
innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.


--
regards, Kornel





[whatwg] Expandos and Prototyping

2009-05-11 Thread Charles Pritchard

Are expando / prototype functions at all included in the HTML 5 specs?
While we may all know what Object.prototype does, I'd like to see its
use added to Section 6: Web browsers.

The Prototype Expando is not necessarily a Javascript-only construct,
and neither is HTML 5.


While I'm not championing full prototype inheritance, I do wonder
(out loud) whether some small section of HTML 5 might describe the most
basic of prototyping and expandos:


Many projects use ellipse or other shapes for example, but this is easier:
CanvasRenderingContext2D.prototype.funcName = function() {
   alert("Fill" + this.fillStyle);
};
document.createElement('canvas').getContext('2d').funcName();

I've never seen any developers attempt to use multiple inheritance
within the CanvasRenderingContext2D object, nor have I tested myself to
see if Firefox (the champion of such schemes) supports it. Which is why
I'd be more than satisfied simply requiring single inheritance. It's
already available in all implementations, and we spent a good deal of
time making it available in our own.

Expando Prototype would need descriptions of:
expandos, prototyped objects, for(... in ...)
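
A small sketch of those three pieces, using a plain constructor so that it
runs outside a browser. Context2D and describeFill are made-up names
standing in for a host interface such as CanvasRenderingContext2D and a
user-added method; nothing here is taken from the HTML 5 draft.

```javascript
// Plain constructor standing in for a host interface (hypothetical).
function Context2D() {
  this.fillStyle = "#000000";
}

// Instance created before the prototype is extended.
const ctx = new Context2D();

// Prototype expando: visible to existing and future instances alike.
Context2D.prototype.describeFill = function () {
  return "Fill " + this.fillStyle;
};

// Instance expando: visible on this one object only.
ctx.label = "main";

// for...in walks own and inherited enumerable properties.
const seen = [];
for (const key in ctx) seen.push(key);

console.log(ctx.describeFill(), seen);
```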

All modern browsers support prototype, and so do many languages (without
writing libraries).
We've confirmed that expandos and prototypes work just fine in ActiveX;
MS long ago created IDispatchEx.


Any host language with getter / setter availability can implement
prototyping and expandos on an object, at least of one depth.


I'd like to see  .prototype described in the scripting section.

That said, I'm more hesitant to champion .constructor and .__proto__.

-Charles





Re: [whatwg] Expandos and Prototyping

2009-05-11 Thread Ian Hickson
On Mon, 11 May 2009, Charles Pritchard wrote:

> Are expando / prototype functions at all included in the HTML 5 specs?

Yes. Specifically, HTML5 uses WebIDL for all its definitions of 
interfaces, objects, etc, and the WebIDL spec defines how prototypes, 
custom properties, etc, work:

   http://dev.w3.org/2006/webapi/WebIDL/

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Tim Tepaße
A cursory glance at the new section 5 raises two questions on
indirection:

> (Note the <meta>s in the last example -- since sometimes the information
> isn't visible, rather than requiring that people put it in and hide it
> with display:none, which has a rather poor accessibility story, I figured
> we could just allow <meta> anywhere, if it has a property="" attribute.)


That seems to be a solution optimised for extremely invisible metadata
but not for metadata which differs from the human-visible data.
Imagine as an example the simple act of marking up a number (and
ignoring what the number denotes).  For human consumption a thousands
separator is often used; the type of separator differs by language,
locale and context. Just in my little world I see on a regular basis the
point, the comma, the space, the thin space and sometimes the
apostrophe. Parsing different representations of numbers would be a
chore. The value of textContent of the element
<span itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span>
is clearly unusable, demanding an additional invisible
<meta property="com.example.price" content="100">.


My irritation lies in the element proliferation, requiring one
element/attribute combination for machines, one element/text content
combination for humans. Of course, any sane author would arrange both
elements in a close relation, as parent/child or sibling, but there
would still be two different elements to maintain, leading to a higher
cognitive load. Not just for authors but also for programmers: a
fluctuating price would have to be updated on two different elements;
tree-walking DOM scripts would have to take <meta> elements into account.
Furthermore it clashes with the familiar habit of other elements in
HTML. A hyperlink is one element with a machine-readable attribute and
human-readable text content. A citation is one element with a
machine-readable reference and human-readable text content. The same
model is used in <meter>, <progress>, <time>, <abbr> ... but not in
user-defined objects. I'd prefer an additional @content-like attribute
which supersedes the text content and maybe even the default values of
the other value-bearing elements, reducing two different elements to
maintain or change to just one.
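
To make the preference concrete, here is a minimal sketch of a lookup in
which a machine-readable attribute supersedes the text content. Plain
objects stand in for elements, and the "content" attribute on an arbitrary
element, the propertyValue name, and the value "1000000" are all
assumptions for the example rather than anything in the draft.

```javascript
// Hypothetical rule: use the machine-readable attribute when present,
// otherwise fall back to the human-readable text content.
function propertyValue(element) {
  return element.content !== undefined ? element.content : element.textContent;
}

// One element carries both forms, so there is only one place to update.
const price = {
  itemprop: "com.example.price",
  content: "1000000",
  // euro sign, no-break space, thin spaces as separators, comma, em dash
  textContent: "\u20AC\u00A01\u2009000\u2009000,\u2014",
};

// An element without the attribute keeps today's behaviour.
const age = { itemprop: "com.example.age", textContent: "9" };

console.log(propertyValue(price), propertyValue(age));
console.log(Number.isNaN(Number(price.textContent))); // display text is not a parseable number
```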



> Instead, let us try using the regular IDREF functionality that HTML uses
> in a variety of other places, like <label for="">. For this we'll need a
> new attribute, but unfortunately we can't use about="" (which would be the
> obvious name to use), because that would conflict with RDFa, so instead
> we'll use subject="":



I'm slightly irritated by the implied change from active, possessive
formulating (“The cat has the name Hedral.”) to something more
passive-y (“Hedral is a name owned by that cat.”). My mental model for
property relationships orients itself more on the former wording; link
relationships are similar in that regard. @about/@subject are like
@rev; a @resource alias @rel would feel more natural. There are
practical implications of the missing @resource, I think. Imagine a
document documenting a household and a household vocabulary which
allows triples of humans which are in an owner relationship to a
cat. Given a household of two humans and one cat; how does one
mark up the assumption that the cat has two owners?