Re: [whatwg] createEvent() in Web Workers?

2009-11-30 Thread Simon Pieters

On Fri, 27 Nov 2009 17:02:00 +0100, Simon Pieters sim...@opera.com wrote:

An idea for creating events is to support [Constructor] on all event  
IDLs, which makes the createEvent method unnecessary.


Maybe we could even make the arguments to the constructor be passed to
initFooEvent() directly, so instead of doing


var e = document.createEvent('MouseEvents');
e.initMouseEvent('click', ...);
foo.dispatchEvent(e);

you could do

foo.dispatchEvent(new MouseEvent('click', ...))
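
For comparison, the DOM specification later adopted event constructors along these lines, but the standardized constructors take an options dictionary (`new Event(type, { bubbles, cancelable })`) rather than the positional initFooEvent() arguments. The sketch below uses the generic `Event` and `EventTarget` constructors, which also exist outside browsers in recent Node.js versions. `foo` is a stand-in target and `'ping'` is an arbitrary event type chosen for illustration.

```javascript
// A plain EventTarget stands in for the `foo` node in the example above.
const foo = new EventTarget();
let seenType = null;

foo.addEventListener('ping', (event) => {
  seenType = event.type;
});

// One expression instead of createEvent() + initEvent() + dispatchEvent().
// dispatchEvent() returns true because no listener called preventDefault().
const notCanceled = foo.dispatchEvent(new Event('ping', { bubbles: true }));
console.log(seenType, notCanceled); // ping true
```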


I've cc-ed www-dom since this is a suggestion for a change to DOM Events.


Another thing we could change is to make all but the first arguments to  
initFooEvent() optional and let them have sensible defaults, so that if  
all you care about is the event type, you can include just the first  
argument.


If the constructor is called with no arguments, then initFooEvent() would  
not be called.
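
A minimal sketch of the two refinements above (optional trailing arguments, and no init call for a bare constructor), written as a plain JavaScript class rather than a real DOM interface. `ProposedEvent` is a hypothetical name, and its `initEvent()` only mirrors the argument order of the DOM Level 2 `initEvent(type, canBubble, cancelable)`, with every argument after the first given a default.

```javascript
// Hypothetical ProposedEvent: not a real DOM interface, just a model of the idea.
class ProposedEvent {
  constructor(...args) {
    this.type = '';
    this.bubbles = false;
    this.cancelable = false;
    this.initialized = false;
    // Forward to the init method only when the caller passed something;
    // `new ProposedEvent()` leaves the event uninitialized.
    if (args.length > 0) {
      this.initEvent(...args);
    }
  }

  // Same argument order as DOM Level 2 initEvent(), but the trailing
  // arguments are optional and default to false.
  initEvent(type, canBubble = false, cancelable = false) {
    this.type = String(type);
    this.bubbles = Boolean(canBubble);
    this.cancelable = Boolean(cancelable);
    this.initialized = true;
  }
}

const typeOnly = new ProposedEvent('click');            // defaults for the rest
const explicit = new ProposedEvent('click', true, true);
const bare = new ProposedEvent();                       // initEvent() not called
console.log(typeOnly.bubbles, explicit.bubbles, bare.initialized); // false true false
```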


--
Simon Pieters
Opera Software


Re: [whatwg] Canvas pixel manipulation and performance

2009-11-30 Thread Philip Taylor
On Mon, Nov 30, 2009 at 4:46 PM, Kenneth Russell k...@google.com wrote:
 CanvasPixelArray specifies that values greater than 255, including
 +inf, are clamped to 255 and values less than 0, including -inf, are
 clamped to zero. WebGLUnsignedByteArray (as people will see in the
 WebGL draft spec this week or next) specifies that the conversion is
 done with a C-style cast. The results are different for out-of-range
 values.

I was going to say: It doesn't include +/-inf, because
http://whatwg.org/html5#dependencies says if a method with an
argument that is a floating point number type (float) is passed an
Infinity or Not-a-Number (NaN) value, a NOT_SUPPORTED_ERR exception
must be raised, and that probably applies to the CanvasPixelArray
setter method.

But it looks like the spec changed since I last looked, and the setter
takes an 'octet' argument, so I think the conversion should happen as
per http://dev.w3.org/2006/webapi/WebIDL/#es-octet and
CanvasPixelArray shouldn't define any conversion. (Filed as
http://www.w3.org/Bugs/Public/show_bug.cgi?id=8405). Hopefully WebIDL
and WebGL either match or can be made to match.
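
The difference between clamping and a wrap-around conversion is easy to see with two typed-array types that JavaScript later gained: `Uint8ClampedArray` (the type `ImageData.data` eventually became) clamps and rounds half to even, while `Uint8Array` truncates and reduces modulo 2^8, the same rule the current WebIDL `octet` conversion uses. A literal C cast of an out-of-range float to an unsigned byte is undefined behaviour in C, so the wrapped line below illustrates the modulo rule only, not any particular WebGL implementation.

```javascript
// Sample inputs covering in-range, fractional, out-of-range and non-finite values.
const inputs = [-1, 0, 127.5, 255, 256, 300, Infinity, -Infinity, NaN];

// Clamping: <0 -> 0, >255 -> 255, NaN -> 0, fractions round half to even.
const clamped = Array.from(Uint8ClampedArray.from(inputs));

// Modulo 2^8: truncate toward zero, wrap, and map NaN/±Infinity to 0.
const wrapped = Array.from(Uint8Array.from(inputs));

console.log(clamped.join(', ')); // 0, 0, 128, 255, 255, 255, 255, 0, 0
console.log(wrapped.join(', ')); // 255, 0, 127, 255, 0, 44, 0, 0, 0
```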

-- 
Philip Taylor
exc...@gmail.com


[whatwg] figureimg* caption

2009-11-30 Thread Philip Jägenstedt

As currently specced, the proper usage of figure is:

<figure>
 <dd><img src="bunny.jpg" alt="A Bunny"></dd>
 <dt>The Cutest Animal</dt>
</figure>

Apart from all that has been said about legacy parsing, leaking style in  
IE, etc I would (perhaps not be the first to) add:


1. It seems quite easy to confuse or mistype dd/dt. Without guessing how  
often authors will get it wrong, I think everyone agrees that (all else  
equal) a syntax which is harder to confuse/mistype is better.


2. Only the caption needs to be marked up, the content is implicitly  
everything else. While some content may need a wrapping element for  
styling, e.g. img usually does not.


3. Aesthetics. (My eyes are bleeding, but I can't speak for anyone else's.)

The main difficulty with coming up with something better seems to have  
been finding a name for an element which isn't already taken. If that's  
the only issue, why not just take some inspiration from <time pubdate> and
use an attribute instead?


<figure>
 <img src="bunny.jpg" alt="A Bunny">
 <p caption>The Cutest Animal</p>
</figure>

At least to me, it looks clean enough and there are no serious parsing
issues (just use document.createElement('figure') for IE).


The caption is easy to style with figure *[caption] or any number of  
easy workarounds for browsers that don't support CSS attribute selectors  
(IE6?).


I haven't been following the discussions on figure closely, so if this  
has already been discussed and rejected please link me in the right  
direction.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Canvas pixel manipulation and performance

2009-11-30 Thread Anne van Kesteren
On Mon, 30 Nov 2009 19:31:53 +0100, Philip Taylor  
excors+wha...@gmail.com wrote:

Hopefully WebIDL and WebGL either match or can be made to match.


It would be nice if they used the same object/interface too... Maybe  
implementations of CanvasPixelArray should hide the interface and other  
details so that we can eventually convert into some kind of octet array if  
we get native support for that.



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] figureimg* caption

2009-11-30 Thread Tab Atkins Jr.
On Mon, Nov 30, 2009 at 12:41 PM, Philip Jägenstedt phil...@opera.com wrote:
 <figure>
  <img src="bunny.jpg" alt="A Bunny">
  <p caption>The Cutest Animal</p>
 </figure>

 At least to me, it looks clean enough and there are no serious parsing
 issues (just use document.createElement(figure) for IE).

 The caption is easy to style with figure *[caption] or any number of easy
 workarounds for browsers that don't support CSS attribute selectors (IE6?).

 I haven't been following the discussions on figure closely, so if this has
 already been discussed and rejected please link me in the right direction.

I've proposed and supported this approach for a long time.  It's never
been rejected, but rather more-or-less ignored.  I agree that it
solves the issues nicely, and has an appropriate level of support in
IE7+.  (IE6 is still doing its gradual decline, and I've been allowed
to ignore it since IE8 came out.)

The only thing you have to answer is what to do if there are multiple
@caption elements in the figure.  I suggest taking either the first
or last; the exact choice is pretty much arbitrary.

Note: I would style it with figure > [caption] instead, to ensure
you don't accidentally grab misplaced captions.

~TJ


Re: [whatwg] figureimg* caption

2009-11-30 Thread Nils Dagsson Moskopp
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009
12:50:42 -0600:

 Note: I would style it with figure > [caption] instead, to ensure
 you don't accidentally grab misplaced captions.

I would like to style captions on top differently from captions
underneath. What now?

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net




Re: [whatwg] figureimg* caption

2009-11-30 Thread Jonas Sicking
On Mon, Nov 30, 2009 at 10:41 AM, Philip Jägenstedt phil...@opera.com wrote:
 <figure>
  <img src="bunny.jpg" alt="A Bunny">
  <p caption>The Cutest Animal</p>
 </figure>

 At least to me, it looks clean enough and there are no serious parsing
 issues (just use document.createElement(figure) for IE).

 The caption is easy to style with figure *[caption] or any number of easy
 workarounds for browsers that don't support CSS attribute selectors (IE6?).

 I haven't been following the discussions on figure closely, so if this has
 already been discussed and rejected please link me in the right direction.

I strongly agree with this. The strongest argument for me is that this
much more closely matches how someone would use figure if they don't
read the specification. The fact that you need to use dd/dt seems
very unintuitive, and I would expect people to forget to use them a
lot, especially since there would likely be no stylistic penalty for
forgetting the dd/dt.

dd/dt are used relatively rarely on the web today, even in
situations where HTML4 says to use them. I think this speaks to their
author-unfriendliness.

/ Jonas


Re: [whatwg] figureimg* caption

2009-11-30 Thread Nikita Popov
Yeah, I think this dd, dt thing isn't really intuitive. (Looks like 
these two elements from definition lists are now used everywhere.)


Your proposed syntax looks nicer. But still, why do we need the
figure wrapper? In my eyes it would be cleaner syntax if you could
easily specify an element that is related as a caption to another
element. It could look like this:

<img src="bunny.jpg" alt="A Bunny" id="bunny">
<p caption="bunny">The Cutest Animal</p>
or
<img src="bunny.jpg" alt="A Bunny" id="bunny">
<p for="bunny">The Cutest Animal</p>

Or used in the code context:
<code id="mygreatscript">echo 0;</code>
<strong for="mygreatscript">Does nothing, but it's still cool!</strong>

I know, I know, for is used for labelled form elements, but I think
it expresses the relation between content and caption very well.
Furthermore, any related content could be marked up this way. For
example, there is this strange hgroup tag, which is used for grouping
title and subtitle:

<hgroup>
   <h1>Something great happened</h1>
   <h2>Now some subtitle in a newspaper article...</h2>
</hgroup>
If I wanted to place an image between title and subtitle of the article, 
it would look something like this:

<hgroup>
   <h1>Something great happened</h1>
   <figure>
   <dd><img src="Aphotoofit" /></dd>
   <dt>Descr. of img.</dt>
   </figure>
   <h2>Now some subtitle in a newspaper article...</h2>
</hgroup>
The img doesn't really belong in the hgroup. Using the for-attr it would 
look like this:

<h1 id="something-great-happened">Something great happened</h1>
<img src="Aphotoofit" id="theimg" />
<p for="theimg">Descr. of img.</p>
<h2 for="something-great-happened">Now some subtitle in a newspaper article...</h2>
Here styling is the problem: the for attributes are all identical and
can't be distinguished. So maybe bring the caption attribute back in?

<h1 id="something-great-happened">Something great happened</h1>
<img src="Aphotoofit" id="theimg" />
<p caption for="theimg">Descr. of img.</p>
<h2 subtitle for="something-great-happened">Now some subtitle in a newspaper article...</h2>

That would not look so nice in XML (caption="caption").
So maybe combine them (which would also solve the problem of reusing
for, which forms already use. [Nice, three fors...]):

<h1 id="something-great-happened">Something great happened</h1>
<img src="Aphotoofit" id="theimg" />
<p caption-for="theimg">Descr. of img.</p>
<h2 subtitle-for="something-great-happened">Now some subtitle in a newspaper article...</h2>



Re: [whatwg] figureimg* caption

2009-11-30 Thread Philip Jägenstedt
On Mon, 30 Nov 2009 19:50:42 +0100, Tab Atkins Jr. jackalm...@gmail.com  
wrote:



The only thing you have to answer is what to do if there are multiple
@caption elements in the figure.  I suggest taking either the first
or last; the exact choice is pretty much arbitrary.


Make it invalid and have any algorithms that extract captions (if there  
are/will be any) use the first element with @caption.
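
Philip's rule (the first @caption wins, and more than one is invalid) is simple enough to state as code. The sketch below uses plain objects in place of DOM elements; `findFigureCaption` and the `{ tag, attributes, text }` shape are hypothetical, and only direct children of the figure are examined, in the spirit of Tab's child-selector suggestion.

```javascript
// Hypothetical element model: { tag, attributes: { name: value }, text }.
function findFigureCaption(figureChildren) {
  // Only direct children of the figure are examined.
  const captions = figureChildren.filter((child) =>
    Object.prototype.hasOwnProperty.call(child.attributes, 'caption'));
  return {
    caption: captions.length > 0 ? captions[0] : null, // first one wins
    valid: captions.length <= 1,                        // extras make it invalid
  };
}

const figureChildren = [
  { tag: 'img', attributes: { src: 'bunny.jpg', alt: 'A Bunny' }, text: '' },
  { tag: 'p', attributes: { caption: '' }, text: 'The Cutest Animal' },
  { tag: 'p', attributes: { caption: '' }, text: 'A second caption' },
];

const result = findFigureCaption(figureChildren);
console.log(result.caption.text, result.valid); // The Cutest Animal false
```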


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] figureimg* caption

2009-11-30 Thread Nils Dagsson Moskopp
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009
13:00:00 -0600:

 On Mon, Nov 30, 2009 at 12:57 PM, Nils Dagsson Moskopp
 nils-dagsson-mosk...@dieweltistgarnichtso.net wrote:
  Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009
  12:50:42 -0600:
 
  Note: I would style it with figure > [caption] instead, to ensure
  you don't accidentally grab misplaced captions.
 
  I would like to style captions on top differently from captions
  underneath. What now ?
 
 figure > [caption]:first-child
 or
 figure > [caption]:last-child

Apparently, you did not comprehend my question and incorrectly assumed
that I would always use multiple captions.

So, to make that clear: Without a clear content wrapper, I cannot style
a preceding caption differently from a following caption.


Cheers,
-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net




Re: [whatwg] figureimg* caption

2009-11-30 Thread Tab Atkins Jr.
On Mon, Nov 30, 2009 at 1:06 PM, Nikita Popov pri...@ni-po.com wrote:
 Your proposed syntax looks more nice. But still, why do we need the
 figure-wrapper? It would be cleaner syntax, in my eyes, if you could easily
 specify an element that is related as a caption to another element. Could
 look like this:
 <img src="bunny.jpg" alt="A Bunny" id="bunny">
 <p caption="bunny">The Cutest Animal</p>
 or
 <img src="bunny.jpg" alt="A Bunny" id="bunny">
 <p for="bunny">The Cutest Animal</p>

People will very commonly use a wrapper in any case, for styling the
figure+caption together.  For example, putting a border and background
on it and positioning it.

As well, using a wrapping element to implicitly scope things is easier
than explicitly using indirection like @for.  I always prefer to do
<label>text <input></label> instead of <label
for=foo>text</label><input id=foo>, for example, because it's just
plain easier to maintain.

~TJ


Re: [whatwg] figureimg* caption

2009-11-30 Thread Nils Dagsson Moskopp
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009
13:34:27 -0600:

 Apologies, but I have no idea what you're talking about and can only
 assume that we're both misunderstanding each other. […]

You were right. Mea culpa, I apparently left my sense of logic at the
door.

-- 
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net




Re: [whatwg] videooverlay for captions/subtitles/etc

2009-11-30 Thread Philip Jägenstedt
On Sun, 29 Nov 2009 12:42:13 +0100, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:



Philip, all,


On Sun, Nov 29, 2009 at 9:37 PM, Philip Jägenstedt phil...@opera.com  
wrote:

On Sun, 29 Nov 2009 06:21:45 +0100, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:

My itext wasn't supposed to stay a JavaScript implementation. In
fact, it had the exact same purpose as your overlay proposal: to
eventually be added into the HTML5 specification and be properly
integrated, such that it didn't have to rely on the timeupdate event.
In fact, the itextlist/itext proposal, which was my second
improvement, see
https://wiki.mozilla.org/Accessibility/HTML5_captions_v2, doesn't look
very different to what you have there.



Yes, that is very clear, I used it only as an example of what needs to be
done to parse SRT with JavaScript. Go ahead and edit the wiki if there's
anything that makes it sound like itext is something it is not.


I guess what I was just missing is mention of what your proposal
provides on top of what I had. You're stating that further down in
your email, so it might be good to mention that. It also shows we are
making progress. :-)


Added a diff statement to the wiki.


I think you've taken the next step with proposing to add a wrapping
div into the DOM - something I wasn't quite sure would be possible
and I'm glad you've taken the step.

Another comment on naming: whether we name the elements itextlist
and itext or alternatively overlay and source, I'm not too
fussed. In fact, I've discussed the renaming/reuse of source for
itext in my recent blog post at

http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/
. I think it may well make a lot of sense since we can reduce the key
required attributes to the ones that already exist for the source
element.



Indeed, my proposal is mainly a remix of itext and cue ranges. The main
selling point, though, is a consistent markup and DOM for in-band, external
and script-created subtitles, and a hook to get content into fullscreen mode.


These are where we are indeed making progress - excellent!


I must admit, I am still a bit dubious about how you are proposing to
deal with in-band captions. Is a UA expected to take them out of the
file and directly render them into overlay? Then you don't get the
kind of control you get as a Web author over external captions, e.g.
to specify a media query.


The UA certainly has to parse and render the in-band captions some way, I  
was just trying to find a way to apply styling to them.



Also, the available tracks are not exposed to the user, so he/she
cannot choose between them interactively. I have been told that such
interactive choice of the to-be-displayed caption track is a
requirement, since people may use the subtitles/captions to learn a
new language or read in their actual native language. YouTube
certainly exposes all the available alternative language tracks - also
because some of these tracks are actually created on the fly by
automated translation. These are some of the reasons I was asked to
provide declarative markup of all of the available subtitle tracks of
video, no matter whether they came out of the media file (in-line) or
not.


Could the people who have given you these requirements possibly join the  
WHATWG and/or W3C HTML a11y TF to explain these use cases? AFAICT, no  
declarative markup is needed to be able to select between caption tracks,  
it can be done either via a native context menu or using script assuming  
that we have an API for exposing the available tracks (which is needed for  
multiple audio and video tracks too).
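
To make the script-based alternative concrete: if the UA exposed in-band, external and generated tracks through one list, choosing a track would need no extra markup. In this sketch the track objects, `selectSubtitleTrack`, and the `'showing'`/`'disabled'` mode strings are assumptions (the mode names happen to match what HTML later standardized for `TextTrack.mode`), and nothing here touches a real media element.

```javascript
// Hypothetical track list: in-band and external tracks look the same to script.
const tracks = [
  { label: 'English', language: 'en', source: 'external', mode: 'disabled' },
  { label: 'Deutsch', language: 'de', source: 'in-band', mode: 'disabled' },
  { label: 'French (machine-translated)', language: 'fr', source: 'generated', mode: 'disabled' },
];

// Show the first track in the requested language and disable all others.
function selectSubtitleTrack(trackList, language) {
  let chosen = null;
  for (const track of trackList) {
    if (chosen === null && track.language === language) {
      track.mode = 'showing';
      chosen = track;
    } else {
      track.mode = 'disabled';
    }
  }
  return chosen;
}

const picked = selectSubtitleTrack(tracks, 'de');
console.log(picked.label, tracks.map((t) => t.mode).join(','));
// Deutsch disabled,showing,disabled
```

A native context menu could drive the same list, which is why an API for enumerating tracks, rather than declarative markup, is the piece that matters here.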



So, maybe we can use source to not just point at further external
subtitle tracks, but also at in-band subtitle tracks and thus really
make in-band identical to out-of-band? We could even use Media
Fragment URI addressing for such an approach, e.g.

<source src="captions-english.srt" lang="en"></source>
<source src="video.ogv?track=subtitle[de]" lang="de"></source>

or alternatively if no file was given in the @src attribute of a
source element, it would be clear that it pointed to a track in the
original media file like so:

<source lang="de"></source>


Using the query string syntax is not possible, as query strings are completely
opaque to the client, but the fragment variant seems OK, if a bit verbose
(part of the URL is repeated). However, what happens if an author does
this:


<video src="video.ogv">
  <source src="captions-english.srt" lang="en"></source>
  <source src="other-video.ogv#track=subtitle[de]" lang="de"></source>
</video>

Authors have no apparent reason to think this would not work, but an  
implementation that supports it is very, very unlikely to happen. UAs  
which don't understand the MF syntax would presumably download  
other-video.ogv and try decoding it as whatever subtitle formats it  
supports (e.g. SRT).
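
One way to see the problem: a UA that treats `#track=...` sources as in-band tracks has to compare the source URL, minus its fragment, with the video's own URL, and anything else points at a different resource. Fragments, unlike query strings, stay on the client, which is why only the fragment form is workable at all. The sketch uses the WHATWG `URL` class; `classifyTrackSource`, the `track=` fragment syntax (taken from the example above, not from a finished Media Fragments spec), and the example.com base URL are assumptions.

```javascript
// Classify a subtitle <source> URL relative to the <video> it belongs to.
function classifyTrackSource(videoSrc, sourceSrc, baseUrl) {
  const video = new URL(videoSrc, baseUrl);
  const source = new URL(sourceSrc, baseUrl);

  // Hypothetical fragment syntax: #track=<name>
  const match = /^#track=(.+)$/.exec(source.hash);
  if (match === null) {
    return { kind: 'external', track: null }; // e.g. a standalone .srt file
  }

  // Compare the two resources with their fragments removed.
  video.hash = '';
  source.hash = '';
  return {
    kind: video.href === source.href ? 'in-band' : 'other-resource',
    track: decodeURIComponent(match[1]),
  };
}

const base = 'https://example.com/media/';
console.log(classifyTrackSource('video.ogv', 'captions-english.srt', base).kind); // external
console.log(classifyTrackSource('video.ogv', 'video.ogv#track=subtitle[de]', base).kind); // in-band
console.log(classifyTrackSource('video.ogv', 'other-video.ogv#track=subtitle[de]', base).kind); // other-resource
```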


Perhaps some CSS selector to style in-band captions/subtitles after all?


About the cue ranges:

If I understand your 

[whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Adam Barth
On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson i...@hixie.ch wrote:
 Defining a spec-blessed whitelist of elements, attributes, and attribute
 values and filtering at the parser level is a significant new feature.
 While I see that it has value, I think on the short term it would be
 better to wait for a future version of HTML before introducing this
 feature; ideally once we have more implementation experience with
 experimental versions of this idea.

 I would encourage browser vendors to introduce APIs similar to that
 discussed below, clearly marked as vendor-specific (e.g. for Firefox,
 something like .mozStaticInnerHTML).

The WebKit community is considering taking up such an experimental
implementation.  Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.

Thanks,
Adam


Re: [whatwg] figureimg* caption

2009-11-30 Thread Kit Grose
On 01/12/2009, at 6:28 AM, Tab Atkins Jr. wrote:

 People will very commonly use a wrapper in any case, for styling the
 figure+caption together.  For example, putting a border and background
 on it and positioning it.


I agree with the inclusion of a wrapper in that in the standard use-case the
entire figure is likely to be floated in a document; I can't think of any
situation where captions would be in a different container than the element
they refer to.

Is there a semantic reason for <p caption> rather than simply repurposing the
<caption> element itself? It seems to me that captions in this context are
conceptually identical to captions for tables.

I would imagine all of these to be legal (with both figure and caption being 
explicitly block-level elements):

<figure>
<img />
<caption>Foo</caption>
</figure>

<figure>
<caption>Foo</caption>
<img />
</figure>

<figure>
<div>
<img />
</div>
<caption>Foo</caption>
</figure>

<figure>
<div>
<img />
</div>
<div>
<caption>Foo</caption>
</div>
</figure>


Cheers,


Kit Grose
User Experience + Tech Director,
iQmultimedia

(02) 4260 7946
k...@iqmultimedia.com.au
iqmultimedia.com.au

Re: [whatwg] figureimg* caption

2009-11-30 Thread Tab Atkins Jr.
On Mon, Nov 30, 2009 at 6:07 PM, Kit Grose k...@iqmultimedia.com.au wrote:
 Is there a semantic reason for p caption rather than simply repurposing the 
 caption element itself? It seems to me that captions in this context are 
 conceptually identical to captions for tables?

Not a semantic reason, just a practical one.  All existing browsers do
something completely wrong when they encounter caption outside of a
table.  It's at least as bad as their handling of legend outside
fieldset.

Otherwise, yes, caption would definitely be the ideal.

~TJ


Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Maciej Stachowiak


On Nov 30, 2009, at 3:55 PM, Adam Barth wrote:


The WebKit community is considering taking up such an experimental
implementation.  Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.


I neglected to give feedback on webkit-dev but here's my comments:

1) It seems like this API is harder to use than a sandboxed iframe. To  
use it correctly, you need to determine a whitelist of safe elements  
and attributes; providing an explicit whitelist at least of tags is  
mandatory. With a sandboxed iframe, as a Web developer you can just  
ask the browser to turn off unsafe things and not worry about  
designing a security policy. Besides ease of use, there is also the  
concern that a server-side filtering whitelist may be buggy, and if  
you apply the same whitelist on the client side as backup instead of  
doing something high level like disable scripting then you are less  
likely to benefit from defense in depth, since you may just replicate  
the bug.


2) It seems like this API loses one of the big benefits of sanitizing  
HTML in the browser implementation. Specifically, in theory it's safe  
to say allow everything except any construct that would result in  
script/code running. You can't do that on the server side -  
blacklisting is not sound because you can't predict the capabilities  
of all browsers. But the browser can predict its own capabilities.  
Sandboxed iframes do allow for this.


I think the benefits of filtering by tag/attribute/scheme for advanced  
experts are outweighed by these two disadvantages for basic use,  
compared to something simple like the original staticInnerHTML idea.  
Another possible alternative is to express how to sanitize at a higher  
level, using something similar to sandboxed iframe feature strings.


Here's a problem that exists with both this API and also  
innerStaticHTML:


3) There is no secure and efficient way to append sanitized contents
to an element that already has children. This may result in authors
appending with innerHTML += (inefficient and insecure!) or
insertAdjacentHTML() (efficient but still insecure!). I'm willing to
concede that use cases other than "replace existing contents" and
"append to existing contents" are fairly exotic.


Regards,
Maciej



Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Adam Barth
On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote:
 1) It seems like this API is harder to use than a sandboxed iframe. To use
 it correctly, you need to determine a whitelist of safe elements and
 attributes; providing an explicit whitelist at least of tags is mandatory.
 With a sandboxed iframe, as a Web developer you can just ask the browser to
 turn off unsafe things and not worry about designing a security policy.
 Besides ease of use, there is also the concern that a server-side filtering
 whitelist may be buggy, and if you apply the same whitelist on the client
 side as backup instead of doing something high level like disable
 scripting then you are less likely to benefit from defense in depth, since
 you may just replicate the bug.

I should follow up with folks in the ruby-on-rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.

I think updateWithSanitizedHTML has different use cases than @sandbox.
 I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).

 2) It seems like this API loses one of the big benefits of sanitizing HTML
 in the browser implementation. Specifically, in theory it's safe to say
 allow everything except any construct that would result in script/code
 running. You can't do that on the server side - blacklisting is not sound
 because you can't predict the capabilities of all browsers. But the browser
 can predict its own capabilities. Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by img/src=javascript: and friends.  Also,
this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a GreaseMonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this GreaseMonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.

 I think the benefits of filtering by tag/attribute/scheme for advanced
 experts are outweighed by these two disadvantages for basic use, compared to
 something simple like the original staticInnerHTML idea. Another possible
 alternative is to express how to sanitize at a higher level, using something
 similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already.  The
feature string Slashdot wants for its comments is (a b strong i em,
href), but another message board might want something different.
For example, 4chan might want (img, src alt).  I don't think these
require particularly advanced experts to understand.
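
A feature string like Slashdot's (a b strong i em, href) boils down to a tag whitelist plus an attribute whitelist. The sketch below applies such a policy to an already-parsed tree. The `{ tag, attrs, children }` node shape, the `sanitize` and `hasAllowedScheme` functions, the choice to unwrap (rather than drop) disallowed elements, and the extra URL-scheme check are all assumptions for illustration, not the API in the proposal document. Parsing URLs with the `URL` class instead of matching strings is what catches variants like " JaVaScRiPt:".

```javascript
// Hypothetical node shape: a string is a text node; otherwise { tag, attrs, children }.
function sanitize(node, policy) {
  if (typeof node === 'string') {
    return [node];
  }
  const children = node.children.flatMap((child) => sanitize(child, policy));
  if (!policy.tags.has(node.tag)) {
    // Disallowed element: drop the tag, keep its sanitized children.
    // Script source, for example, would survive only as inert text.
    return children;
  }
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (!policy.attrs.has(name)) continue;
    if ((name === 'href' || name === 'src') && !hasAllowedScheme(value, policy.schemes)) continue;
    attrs[name] = value;
  }
  return [{ tag: node.tag, attrs, children }];
}

// Parse instead of pattern-matching, so " JaVaScRiPt:alert(1)" is
// normalized to the "javascript:" scheme before the check.
function hasAllowedScheme(value, schemes) {
  try {
    return schemes.has(new URL(value, 'https://example.invalid/').protocol);
  } catch {
    return false;
  }
}

// Roughly the "(a b strong i em, href)" policy, plus an assumed scheme list.
const commentPolicy = {
  tags: new Set(['a', 'b', 'strong', 'i', 'em']),
  attrs: new Set(['href']),
  schemes: new Set(['http:', 'https:']),
};

const comment = {
  tag: 'div',
  attrs: {},
  children: [
    'Nice post! ',
    { tag: 'a', attrs: { href: ' JaVaScRiPt:alert(1)', onclick: 'steal()' }, children: ['click'] },
    { tag: 'b', attrs: {}, children: ['really'] },
  ],
};

console.log(JSON.stringify(sanitize(comment, commentPolicy)));
```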

 Here's a problem that exists with both this API and also innerStaticHTML:

 3) There is no secure and efficient way to append sanitized contents to an
 element that already has children. This may result in authors appending with
 innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient
 but still insecure!). I'm willing to concede that use cases other than
 replace existing contents and append to existing contents are fairly
 exotic.

Maybe we need insertAdjacentSanitizedHTML instead or in addition.  ;)

Adam


Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Maciej Stachowiak


On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:


I should follow up with folks in the ruby-on-rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.


For server-side sanitization, this kind of explicit API is pretty much  
the only thing you can do.




I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).


I can imagine use cases where allowing very open-ended but script-free  
content is desirable. For example, consider a hosted blog service that  
wants to let blog authors write nearly arbitrary HTML, but without  
allowing script. @sandbox would not be a good solution for that use  
case. In general it does not seem sensible to me that the choice of  
tag whitelisting vs high-level feature whitelisting is tied to the  
choice of embedding content directly vs. creating a frame. Is there a  
technical reason these two choices have to be tied?




2) It seems like this API loses one of the big benefits of sanitizing HTML
in the browser implementation. Specifically, in theory it's safe to say
"allow everything except any construct that would result in script/code
running". You can't do that on the server side: blacklisting is not sound
because you can't predict the capabilities of all browsers. But the browser
can predict its own capabilities. Sandboxed iframes do allow for this.


The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by img/src=javascript: and friends.


It's true, this is a benefit. However, it seems like even if you
whitelist tags, being able to say "no script" at a high level


Also, this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a GreaseMonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this GreaseMonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.


If the EXIF metadata is supposed to be text-only, it seems like  
updateWithSanitizedHTML would not be easier to use than innerText, or  
in any way superior. For cases where it is actually desirable to allow  
some markup, it's not clear to me that giving explicit whitelists of  
what is allowed is the simple choice.




I think the benefits of filtering by tag/attribute/scheme for advanced
experts are outweighed by these two disadvantages for basic use, compared to
something simple like the original staticInnerHTML idea. Another possible
alternative is to express how to sanitize at a higher level, using something
similar to sandboxed iframe feature strings.


If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already.  The
feature string Slashdot wants for its comments is (a b strong i em,
href), but another message board might want something different.
For example, 4chan might want (img, src alt).  I don't think these
require particularly advanced experts to understand.


updateWithSanitizedHTML and @sandbox both provide features that the
other does not for reasons that do not seem technically necessary. For
example, updateWithSanitizedHTML could easily have an "allow everything
except script" mode, and @sandbox could easily allow per-tag whitelisting.
Then the choice would be between the resource cost of a frame and the
sandboxing features that it's

Re: [whatwg] Web Workers: SyntaxError exception?

2009-11-30 Thread Ian Hickson
On Tue, 3 Nov 2009, Simon Pieters wrote:

 Web Workers says
 
 "If it failed to parse, then throw a SyntaxError exception and abort all
 these steps."
 
 Shouldn't that be a SYNTAX_ERR exception?

No, it's trying to emulate eval().
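To illustrate the distinction (a small sketch, not taken from the spec): when eval() is handed source it cannot parse, it throws an instance of the ECMAScript SyntaxError constructor, not a DOMException carrying the legacy SYNTAX_ERR code (12). The helper name parseFailureKind is hypothetical.

```javascript
// Hypothetical helper: report what kind of exception eval() raises for
// a given source string. Only pass trusted, side-effect-free strings,
// because source that parses successfully is also executed.
function parseFailureKind(source) {
  try {
    eval(source);
    return "parsed";
  } catch (e) {
    // Parse failures are ECMAScript SyntaxError objects. instanceof is
    // used rather than e.name, since in current DOM specs a DOMException
    // can also carry the name "SyntaxError".
    return e instanceof SyntaxError ? "SyntaxError" : "other";
  }
}

console.log(parseFailureKind("var = ;")); // SyntaxError
console.log(parseFailureKind("1 + 1"));   // parsed
```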

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] [WebWorkers] About the delegation example

2009-11-30 Thread Ian Hickson
On Thu, 5 Nov 2009, David Bruant wrote:
 
 First of all, there is a typo in this example. The main HTML page
 is a copy/paste of the first example ("Worker example: One-core
 computation").

Fixed.


 My point here is to ask for a new attribute for the navigator object 
 that could describe the best number of workers in a delegation use 
 case.

It's not clear to me what the best number of workers is. It's not the 
number of CPUs, cores, or hardware threads, since it depends at least as 
much on the system load as on the system capabilities. And it varies over 
time, since the load is a function of time.


 In the delegation example, the number of workers chosen is an arbitrary
 10. But on a single-core processor, having only one worker will result
 in more or less the same running time, because in the end, each worker
 runs on the only core.

That depends on the algorithm. If the algorithm uses a lot of data, then 
a single hardware thread might be able to run two workers in the same time 
as it runs one, with one worker waiting for data while the other runs 
code, and with the workers trading back and forth.

Personally I would recommend basing the number of workers on the number of 
shards that the input data is split into, and then relying on the UA to 
avoid thrashing. I would expect UAs to notice when a script spawns a 
bazillion workers, and have the UA run them in a staggered fashion, so as 
to not starve the system resources. This is almost certainly needed 
anyway, to prevent pages from DOSing the user's system.
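A sketch of the shard-based approach, assuming the input is an array that can be split into contiguous chunks: the number of workers follows from how the data is divided rather than from the machine's core count, and scheduling is left to the UA. The shard function is hypothetical.

```javascript
// Hypothetical helper: split the input into at most shardCount
// contiguous chunks. One worker per chunk; the UA decides how many
// actually run at once.
function shard(data, shardCount) {
  const shards = [];
  const size = Math.ceil(data.length / shardCount);
  // An empty input yields size 0, so the loop body never runs.
  for (let i = 0; i < data.length; i += size) {
    shards.push(data.slice(i, i + size));
  }
  return shards;
}

console.log(shard([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3));
// → [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

In a page, each chunk would then be handed to its own worker, e.g. new Worker("shard-worker.js") followed by postMessage(chunk), where shard-worker.js is a placeholder script name.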


 On the other hand, on a 16-core processor (which doesn't exist yet, but
 is realistic within the next couple of decades), the task could be
 executed faster with 16 workers.

Well, again, that's not a given. If the algorithm is mostly network-bound 
or disk-bound, then it might well be that running multiple workers doesn't 
really gain you anything, and you might as well just do everything in one 
worker. It's hard to make generalisations about this kind of thing.


 Moreover, for a completely different purpose, this attribute could be used
 to gather statistics on the spread of multicore processors, like the
 statistics already gathered on operating system and screen resolution
 usage.

Do we really want to expose this? That seems like a minor privacy leak.


On Fri, 6 Nov 2009, Drew Wilson wrote:
 
 Exposing information that's not reliable seems worse than not exposing 
 it at all, and would encourage applications to grab all available 
 resources (after all, that's the purpose of the API!). And the problem 
 domains that would benefit from this information (arbitrarily 
 parallelizable algorithms like ray tracing) seem to be few in number.

Indeed.


On Fri, 6 Nov 2009, Rob Ennals wrote:
 
 Maybe what we really want here is some kind of parallel map operation
 where we give the user agent an array and then say "call this function
 on each element, using as many threads as you deem appropriate given
 the resources available". Each function call would logically execute in
 its own worker context, but to keep semantics transparent, we might
 declare that such workers are not allowed to send messages (other
 than a final result) and so could not tell how many parallel workers had
 actually been created.

This is a reasonably good idea. It might make sense to do in v2 of Web 
Workers.
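A rough single-threaded model of the proposed semantics, for illustration only: parallelMap is hypothetical and not part of any Web Workers draft, and its concurrency argument stands in for the UA's choice of how many workers to run. Results are stored by index, so output order matches input order regardless of which calls finish first.

```javascript
// Hypothetical sketch of a "parallel map": the caller supplies an array
// and a function; `concurrency` models the UA deciding how many calls
// are in flight at once. Runs on one thread using Promises.
async function parallelMap(items, fn, concurrency = 4) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index. The check
  // and increment happen synchronously, so no index is claimed twice.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      // Only the final return value is collected. Unlike the proposal,
      // this sketch cannot stop fn from communicating via closures.
      results[i] = await fn(items[i]);
    }
  }

  const count = Math.max(1, Math.min(concurrency, items.length));
  const runners = [];
  for (let k = 0; k < count; k++) runners.push(runner());
  await Promise.all(runners);
  return results;
}

parallelMap([1, 2, 3, 4, 5], async (x) => x * x, 2).then((r) => console.log(r));
```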


I haven't added anything to Web Workers for now.
