Re: [whatwg] createEvent() in Web Workers?
On Fri, 27 Nov 2009 17:02:00 +0100, Simon Pieters sim...@opera.com wrote:

An idea for creating events is to support [Constructor] on all event IDLs, which makes the createEvent method unnecessary. Maybe we could even make the arguments to the constructor be passed to initFooEvent() directly, so instead of doing

  var e = document.createEvent('MouseEvents');
  e.initMouseEvent('click', ...);
  foo.dispatchEvent(e);

you could do

  foo.dispatchEvent(new MouseEvent('click', ...))

I've cc-ed www-dom since this is a suggestion for a change to DOM Events.

Another thing we could change is to make all but the first argument to initFooEvent() optional and give them sensible defaults, so that if all you care about is the event type, you can pass just the first argument. If the constructor is called with no arguments, then initFooEvent() would not be called.

--
Simon Pieters
Opera Software
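The constructor idea in the message above can be sketched without a browser. This is only an illustration under stated assumptions: FooEvent, its fields and the default values are hypothetical stand-ins, not part of any DOM specification. It shows a constructor that forwards its arguments to initFooEvent(), optional trailing arguments with defaults, and the rule that a zero-argument construction does not call initFooEvent().

```javascript
// Hypothetical FooEvent: a stand-in for an event interface, not a real DOM API.
class FooEvent {
  constructor(...args) {
    this.type = "";
    this.bubbles = false;
    this.cancelable = false;
    this.detail = 0;
    this.initialized = false;
    // As proposed in the message: with no arguments, initFooEvent() is not called.
    if (args.length > 0) {
      this.initFooEvent(...args);
    }
  }

  // Only the first argument is required; the defaults for the rest are assumptions.
  initFooEvent(type, bubbles = false, cancelable = false, detail = 0) {
    this.type = String(type);
    this.bubbles = Boolean(bubbles);
    this.cancelable = Boolean(cancelable);
    this.detail = detail;
    this.initialized = true;
  }
}

const bare = new FooEvent();                       // initFooEvent() never runs
const click = new FooEvent("click");               // only the type is given
const full = new FooEvent("click", true, true, 2); // every argument is given
console.log(bare.initialized, click.type, click.bubbles, full.detail);
```

With this shape, a caller who only cares about the event type writes new FooEvent("click") and gets the assumed defaults for everything else.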
Re: [whatwg] Canvas pixel manipulation and performance
On Mon, Nov 30, 2009 at 4:46 PM, Kenneth Russell k...@google.com wrote:

CanvasPixelArray specifies that values greater than 255, including +inf, are clamped to 255 and values less than 0, including -inf, are clamped to zero. WebGLUnsignedByteArray (as people will see in the WebGL draft spec this week or next) specifies that the conversion is done with a C-style cast. The results are different for out-of-range values.

I was going to say: It doesn't include +/-inf, because http://whatwg.org/html5#dependencies says if a method with an argument that is a floating point number type (float) is passed an Infinity or Not-a-Number (NaN) value, a NOT_SUPPORTED_ERR exception must be raised, and that probably applies to the CanvasPixelArray setter method.

But it looks like the spec changed since I last looked, and the setter takes an 'octet' argument, so I think the conversion should happen as per http://dev.w3.org/2006/webapi/WebIDL/#es-octet and CanvasPixelArray shouldn't define any conversion. (Filed as http://www.w3.org/Bugs/Public/show_bug.cgi?id=8405). Hopefully WebIDL and WebGL either match or can be made to match.

--
Philip Taylor
exc...@gmail.com
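The difference between the two conversions discussed above can be seen with a small sketch. It uses Uint8ClampedArray and Uint8Array, which were standardized after this thread and are used here purely for illustration. Treating the "C-style cast" as modular wraparound is an assumption: that is what ECMAScript's ToUint8 conversion does, whereas what a real C cast does with out-of-range floats is not guaranteed by the C language.

```javascript
// Clamping store: out-of-range values saturate at 0 or 255, as described for CanvasPixelArray.
function storeClamped(value) {
  const cell = new Uint8ClampedArray(1);
  cell[0] = value;
  return cell[0];
}

// Wrapping store (ECMAScript ToUint8): used here as an assumed model of the "C-style cast".
function storeWrapped(value) {
  const cell = new Uint8Array(1);
  cell[0] = value;
  return cell[0];
}

// Print each input next to what the two kinds of array actually store.
for (const v of [-5, 0, 255, 300, Infinity, -Infinity]) {
  console.log(v, storeClamped(v), storeWrapped(v));
}
```

In-range values come out the same either way. The two only disagree outside 0..255: 300 is stored as 255 by the clamping array and as 44 by the wrapping one, and -5 is stored as 0 and 251 respectively.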
[whatwg] <figure><img><* caption>
As currently specced, the proper usage of <figure> is:

  <figure>
    <dd><img src="bunny.jpg" alt="A Bunny"></dd>
    <dt>The Cutest Animal</dt>
  </figure>

Apart from all that has been said about legacy parsing, leaking style in IE, etc., I would add (perhaps not as the first to do so):

1. It seems quite easy to confuse or mistype <dd>/<dt>. Without guessing how often authors will get it wrong, I think everyone agrees that (all else equal) a syntax which is harder to confuse/mistype is better.
2. Only the caption needs to be marked up; the content is implicitly everything else. While some content may need a wrapping element for styling, <img>, for example, usually does not.
3. Aesthetics. (My eyes are bleeding, but I can't speak for anyone else's.)

The main difficulty with coming up with something better seems to have been finding a name for an element which isn't already taken. If that's the only issue, why not just take some inspiration from <time pubdate> and use an attribute instead?

  <figure>
    <img src="bunny.jpg" alt="A Bunny">
    <p caption>The Cutest Animal</p>
  </figure>

At least to me, it looks clean enough and there are no serious parsing issues (just use document.createElement("figure") for IE). The caption is easy to style with figure *[caption] or any number of easy workarounds for browsers that don't support CSS attribute selectors (IE6?).

I haven't been following the discussions on <figure> closely, so if this has already been discussed and rejected please link me in the right direction.

--
Philip Jägenstedt
Core Developer
Opera Software
Re: [whatwg] Canvas pixel manipulation and performance
On Mon, 30 Nov 2009 19:31:53 +0100, Philip Taylor excors+wha...@gmail.com wrote:

But it looks like the spec changed since I last looked, and the setter takes an 'octet' argument, so I think the conversion should happen as per http://dev.w3.org/2006/webapi/WebIDL/#es-octet and CanvasPixelArray shouldn't define any conversion. (Filed as http://www.w3.org/Bugs/Public/show_bug.cgi?id=8405). Hopefully WebIDL and WebGL either match or can be made to match.

It would be nice if they used the same object/interface too... Maybe implementations of CanvasPixelArray should hide the interface and other details so that we can eventually convert it into some kind of octet array if we get native support for that.

--
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] <figure><img><* caption>
On Mon, Nov 30, 2009 at 12:41 PM, Philip Jägenstedt phil...@opera.com wrote:

As currently specced, the proper usage of <figure> is:

  <figure> <dd><img src="bunny.jpg" alt="A Bunny"></dd> <dt>The Cutest Animal</dt> </figure>

Apart from all that has been said about legacy parsing, leaking style in IE, etc., I would add (perhaps not as the first to do so): 1. It seems quite easy to confuse or mistype <dd>/<dt>. Without guessing how often authors will get it wrong, I think everyone agrees that (all else equal) a syntax which is harder to confuse/mistype is better. 2. Only the caption needs to be marked up; the content is implicitly everything else. While some content may need a wrapping element for styling, <img>, for example, usually does not. 3. Aesthetics. (My eyes are bleeding, but I can't speak for anyone else's.)

The main difficulty with coming up with something better seems to have been finding a name for an element which isn't already taken. If that's the only issue, why not just take some inspiration from <time pubdate> and use an attribute instead?

  <figure> <img src="bunny.jpg" alt="A Bunny"> <p caption>The Cutest Animal</p> </figure>

At least to me, it looks clean enough and there are no serious parsing issues (just use document.createElement("figure") for IE). The caption is easy to style with figure *[caption] or any number of easy workarounds for browsers that don't support CSS attribute selectors (IE6?). I haven't been following the discussions on <figure> closely, so if this has already been discussed and rejected please link me in the right direction.

I've proposed and supported this approach for a long time. It's never been rejected, but rather more-or-less ignored. I agree that it solves the issues nicely, and has an appropriate level of support in IE7+. (IE6 is still doing its gradual decline, and I've been allowed to ignore it since IE8 came out.)

The only thing you have to answer is what to do if there are multiple @caption elements in the figure. I suggest taking either the first or last; the exact choice is pretty much arbitrary.

Note: I would style it with figure > [caption] instead, to ensure you don't accidentally grab misplaced captions.

~TJ
Re: [whatwg] <figure><img><* caption>
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009 12:50:42 -0600:

Note: I would style it with figure > [caption] instead, to ensure you don't accidentally grab misplaced captions.

I would like to style captions on top differently from captions underneath. What now?

--
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net
Re: [whatwg] <figure><img><* caption>
On Mon, Nov 30, 2009 at 10:41 AM, Philip Jägenstedt phil...@opera.com wrote:

As currently specced, the proper usage of <figure> is:

  <figure> <dd><img src="bunny.jpg" alt="A Bunny"></dd> <dt>The Cutest Animal</dt> </figure>

Apart from all that has been said about legacy parsing, leaking style in IE, etc., I would add (perhaps not as the first to do so): 1. It seems quite easy to confuse or mistype <dd>/<dt>. Without guessing how often authors will get it wrong, I think everyone agrees that (all else equal) a syntax which is harder to confuse/mistype is better. 2. Only the caption needs to be marked up; the content is implicitly everything else. While some content may need a wrapping element for styling, <img>, for example, usually does not. 3. Aesthetics. (My eyes are bleeding, but I can't speak for anyone else's.)

The main difficulty with coming up with something better seems to have been finding a name for an element which isn't already taken. If that's the only issue, why not just take some inspiration from <time pubdate> and use an attribute instead?

  <figure> <img src="bunny.jpg" alt="A Bunny"> <p caption>The Cutest Animal</p> </figure>

At least to me, it looks clean enough and there are no serious parsing issues (just use document.createElement("figure") for IE). The caption is easy to style with figure *[caption] or any number of easy workarounds for browsers that don't support CSS attribute selectors (IE6?). I haven't been following the discussions on <figure> closely, so if this has already been discussed and rejected please link me in the right direction.

I strongly agree with this. The strongest argument for me is that this much more closely matches how someone would use <figure> if they hadn't read the specification. The fact that you need to use <dd>/<dt> seems very unintuitive and I would expect people to forget to use them a lot, especially since there would likely be no stylistic penalty for forgetting the <dd>/<dt>.

<dd>/<dt> are used relatively rarely on the web today, even in situations where HTML4 says to use them. I think this speaks to their author unfriendliness.

/ Jonas
Re: [whatwg] <figure><img><* caption>
Yeah, I think this <dd>, <dt> thing isn't really intuitive. (Looks like these two elements from definition lists are now used everywhere.) Your proposed syntax looks nicer. But still, why do we need the <figure> wrapper? It would be cleaner syntax, in my eyes, if you could easily specify an element that is related as a caption to another element. It could look like this:

  <img src="bunny.jpg" alt="A Bunny" id="bunny">
  <p caption="bunny">The Cutest Animal</p>

or

  <img src="bunny.jpg" alt="A Bunny" id="bunny">
  <p for="bunny">The Cutest Animal</p>

Or used in the code context:

  <code id="mygreatscript">echo 0;</code>
  <strong for="mygreatscript">Does nothing, but it's still cool!</strong>

I know, I know, for is used for labelled form elements, but I think that it expresses very well the relation between content and caption. Furthermore, any related content could be marked up this way. For example, there is this strange <hgroup> tag that's used for grouping title and subtitle:

  <hgroup>
    <h1>Something great happened</h1>
    <h2>Now some subtitle in a newspaper article...</h2>
  </hgroup>

If I wanted to place an image between title and subtitle of the article, it would look something like this:

  <hgroup>
    <h1>Something great happened</h1>
    <figure>
      <dd><img src="Aphotoofit" /></dd>
      <dt>Descr. of img.</dt>
    </figure>
    <h2>Now some subtitle in a newspaper article...</h2>
  </hgroup>

The <img> doesn't really belong in the <hgroup>. Using the for attribute it would look like this:

  <h1 id="something-great-happened">Something great happened</h1>
  <img src="Aphotoofit" id="theimg" />
  <p for="theimg">Descr. of img.</p>
  <h2 for="something-great-happened">Now some subtitle in a newspaper article...</h2>

Here styling is the problem: the for attributes are all identical and can't be distinguished. So maybe get the caption attribute back in?

  <h1 id="something-great-happened">Something great happened</h1>
  <img src="Aphotoofit" id="theimg" />
  <p caption for="theimg">Descr. of img.</p>
  <h2 subtitle for="something-great-happened">Now some subtitle in a newspaper article...</h2>

Which would not look so nice in XML (caption="caption"). So maybe combine them (which would also solve the problem of for being used for forms. [Nice three fors...]):

  <h1 id="something-great-happened">Something great happened</h1>
  <img src="Aphotoofit" id="theimg" />
  <p caption-for="theimg">Descr. of img.</p>
  <h2 subtitle-for="something-great-happened">Now some subtitle in a newspaper article...</h2>

Philip Jägenstedt wrote:

As currently specced, the proper usage of <figure> is:

  <figure> <dd><img src="bunny.jpg" alt="A Bunny"></dd> <dt>The Cutest Animal</dt> </figure>

Apart from all that has been said about legacy parsing, leaking style in IE, etc., I would add (perhaps not as the first to do so): 1. It seems quite easy to confuse or mistype <dd>/<dt>. Without guessing how often authors will get it wrong, I think everyone agrees that (all else equal) a syntax which is harder to confuse/mistype is better. 2. Only the caption needs to be marked up; the content is implicitly everything else. While some content may need a wrapping element for styling, <img>, for example, usually does not. 3. Aesthetics. (My eyes are bleeding, but I can't speak for anyone else's.)

The main difficulty with coming up with something better seems to have been finding a name for an element which isn't already taken. If that's the only issue, why not just take some inspiration from <time pubdate> and use an attribute instead?

  <figure> <img src="bunny.jpg" alt="A Bunny"> <p caption>The Cutest Animal</p> </figure>

At least to me, it looks clean enough and there are no serious parsing issues (just use document.createElement("figure") for IE). The caption is easy to style with figure *[caption] or any number of easy workarounds for browsers that don't support CSS attribute selectors (IE6?). I haven't been following the discussions on <figure> closely, so if this has already been discussed and rejected please link me in the right direction.
Re: [whatwg] <figure><img><* caption>
On Mon, 30 Nov 2009 19:50:42 +0100, Tab Atkins Jr. jackalm...@gmail.com wrote:

The only thing you have to answer is what to do if there are multiple @caption elements in the figure. I suggest taking either the first or last; the exact choice is pretty much arbitrary.

Make it invalid and have any algorithms that extract captions (if there are/will be any) use the first element with @caption.

--
Philip Jägenstedt
Core Developer
Opera Software
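The rule suggested above ("invalid, but use the first element with @caption") could be sketched as follows. This is an illustration only: the el() helper and the { tag, attrs, children } tree are a hypothetical toy model rather than the DOM, and treating a figure with no caption at all as valid is an assumption the thread does not settle.

```javascript
// Toy element tree: { tag, attrs, children }. A hypothetical model, not the DOM.
function el(tag, attrs = {}, children = []) {
  return { tag, attrs, children };
}

// Collect descendants that carry a caption attribute, in document (pre-order) order.
function captionCandidates(node, found = []) {
  for (const child of node.children) {
    if (Object.prototype.hasOwnProperty.call(child.attrs, "caption")) {
      found.push(child);
    }
    captionCandidates(child, found);
  }
  return found;
}

// The first candidate wins; more than one candidate is reported as invalid.
function extractCaption(figure) {
  const candidates = captionCandidates(figure);
  return {
    caption: candidates.length > 0 ? candidates[0] : null,
    valid: candidates.length <= 1, // assumption: zero captions is still valid
  };
}

const figure = el("figure", {}, [
  el("img", { src: "bunny.jpg", alt: "A Bunny" }),
  el("p", { caption: "", id: "first" }),
  el("p", { caption: "", id: "second" }),
]);
const result = extractCaption(figure);
console.log(result.caption.attrs.id, result.valid);
```

The sketch does not scope nested figures separately; a real algorithm would have to decide whether a caption inside an inner figure counts for the outer one.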
Re: [whatwg] <figure><img><* caption>
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009 13:00:00 -0600:

On Mon, Nov 30, 2009 at 12:57 PM, Nils Dagsson Moskopp nils-dagsson-mosk...@dieweltistgarnichtso.net wrote:

Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009 12:50:42 -0600:

Note: I would style it with figure > [caption] instead, to ensure you don't accidentally grab misplaced captions.

I would like to style captions on top differently from captions underneath. What now?

figure > [caption]:first-child or figure > [caption]:last-child

Apparently, you did not comprehend my question and incorrectly assumed that I would always use multiple captions. So, to make that clear: without a clear content wrapper, I cannot style a preceding caption differently from a following caption.

Cheers,
--
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net
Re: [whatwg] <figure><img><* caption>
On Mon, Nov 30, 2009 at 1:06 PM, Nikita Popov pri...@ni-po.com wrote:

Your proposed syntax looks nicer. But still, why do we need the <figure> wrapper? It would be cleaner syntax, in my eyes, if you could easily specify an element that is related as a caption to another element. It could look like this:

  <img src="bunny.jpg" alt="A Bunny" id="bunny">
  <p caption="bunny">The Cutest Animal</p>

or

  <img src="bunny.jpg" alt="A Bunny" id="bunny">
  <p for="bunny">The Cutest Animal</p>

People will very commonly use a wrapper in any case, for styling the figure+caption together. For example, putting a border and background on it and positioning it.

As well, using a wrapping element to implicitly scope things is easier than explicitly using indirection like @for. I always prefer to do <label>text <input></label> instead of <label for="foo">text</label><input id="foo">, for example, because it's just plain easier to maintain.

~TJ
Re: [whatwg] <figure><img><* caption>
Tab Atkins Jr. jackalm...@gmail.com wrote on Mon, 30 Nov 2009 13:34:27 -0600:

Apologies, but I have no idea what you're talking about and can only assume that we're both misunderstanding each other. […]

You were right. Mea culpa, I apparently left my sense of logic at the door.

--
Nils Dagsson Moskopp // erlehmann
http://dieweltistgarnichtso.net
Re: [whatwg] videooverlay for captions/subtitles/etc
On Sun, 29 Nov 2009 12:42:13 +0100, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

Philip, all,

On Sun, Nov 29, 2009 at 9:37 PM, Philip Jägenstedt phil...@opera.com wrote:

On Sun, 29 Nov 2009 06:21:45 +0100, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

My itext wasn't supposed to stay a JavaScript implementation. In fact, it had the exact same purpose as your overlay proposal: to eventually be added into the HTML5 specification and be properly integrated, such that it didn't have to rely on the timeupdate. In fact, the itextlist/itext proposal, which was my second improvement, see https://wiki.mozilla.org/Accessibility/HTML5_captions_v2, doesn't look very different to what you have there.

Yes, that is very clear, I used it only as an example of what needs to be done to parse SRT with JavaScript. Go ahead and edit the wiki if there's anything that makes it sound like itext is something it is not.

I guess what I was just missing is mention of what your proposal provides on top of what I had. You're stating that further down in your email, so it might be good to mention that. It also shows we are making progress. :-)

Added a diff statement to the wiki.

I think you've taken the next step with proposing to add a wrapping div into the DOM - something I wasn't quite sure would be possible and I'm glad you've taken the step. Another comment on naming: whether we name the elements itextlist and itext or alternatively overlay and source, I'm not too fussed. In fact, I've discussed the renaming/reuse of source for itext in my recent blog post at http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/ . I think it may well make a lot of sense since we can reduce the key required attributes to the ones that already exist for the source element.

Indeed, my proposal is mainly a remix of itext and cue ranges.
The main selling point, though, is a consistent markup and DOM for in-band, external and script-created subtitles and a hook to content into the fullscreen mode. These are where we are indeed making progress - excellent! I must admit, I am still a bit dubious about how you are proposing to deal with in-band captions. Is a UA expected to take them out of the file and directly render them into overlay? Then you don't get the kind of control you get as a Web author over external captions, e.g. to specify a media query. The UA certainly has to parse and render the in-band captions some way, I was just trying to find a way to apply styling to them. Also, the user doesn't get exposed to the tracks that are available, so he/she could choose interactively. I have been told that such interactive choice of the to-be-displayed caption track is a requirement, since people may use the subtitles/captions to learn a new language or read in their actual native language. YouTube certainly exposes all the available alternative language tracks - also because some of these tracks are actually created on the fly by automated translation. These are some of the reasons I was asked to provide declarative markup of all of the available subtitle tracks of video, no matter whether they came out of the media file (in-line) or not. Could the people who have given you these requirements possibly join the WHATWG and/or W3C HTML a11y TF to explain these use cases? AFAICT, no declarative markup is needed to be able to select between caption tracks, it can be done either via a native context menu or using script assuming that we have an API for exposing the available tracks (which is needed for multiple audio and video tracks too). So, maybe we can use source to not just point at further external subtitle tracks, but also at in-band subtitle tracks and thus really make in-band identical to out-of-band? We could even use Media Fragment URI addressing for such an approach, e.g. 
  <source src="captions-english.srt" lang="en"></source>
  <source src="video.ogv?track=subtitle[de]" lang="de"></source>

or alternatively if no file was given in the @src attribute of a <source> element, it would be clear that it pointed at a track in the original media file like so:

  <source lang="de"></source>

Using the query string syntax is not possible, as query strings are completely opaque to the client, but the fragment variant seems OK if a bit verbose (part of the URL is repeated). However, what happens if an author does this:

  <video src="video.ogv">
    <source src="captions-english.srt" lang="en"></source>
    <source src="other-video.ogv#track=subtitle[de]" lang="de"></source>
  </video>

Authors have no apparent reason to think this would not work, but an implementation that supports it is very, very unlikely to happen. UAs which don't understand the MF syntax would presumably download other-video.ogv and try decoding it as whatever subtitle formats they support (e.g. SRT).

Perhaps some CSS selector to style in-band captions/subtitles after all?

About the cue ranges: If I understand your
[whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson i...@hixie.ch wrote:

Defining a spec-blessed whitelist of elements, attributes, and attribute values and filtering at the parser level is a significant new feature. While I see that it has value, I think in the short term it would be better to wait for a future version of HTML before introducing this feature; ideally once we have more implementation experience with experimental versions of this idea. I would encourage browser vendors to introduce APIs similar to that discussed below, clearly marked as vendor-specific (e.g. for Firefox, something like .mozStaticInnerHTML).

The WebKit community is considering taking up such an experimental implementation. Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.

Thanks,
Adam
Re: [whatwg] <figure><img><* caption>
On 01/12/2009, at 6:28 AM, Tab Atkins Jr. wrote:

People will very commonly use a wrapper in any case, for styling the figure+caption together. For example, putting a border and background on it and positioning it.

I agree with the inclusion of a wrapper, in that in the standard use case the entire figure is likely to be floated in a document; I can't think of any situation where captions would be in a different container than the element they refer to.

Is there a semantic reason for <p caption> rather than simply repurposing the <caption> element itself? It seems to me that captions in this context are conceptually identical to captions for tables.

I would imagine all of these to be legal (with both <figure> and <caption> being explicitly block-level elements):

  <figure> <img /> <caption>Foo</caption> </figure>

  <figure> <caption>Foo</caption> <img /> </figure>

  <figure> <div> <img /> </div> <caption>Foo</caption> </figure>

  <figure> <div> <img /> </div> <div> <caption>Foo</caption> </div> </figure>

Cheers,

Kit Grose
User Experience + Tech Director, iQmultimedia
(02) 4260 7946
k...@iqmultimedia.com.au
iqmultimedia.com.au
Re: [whatwg] <figure><img><* caption>
On Mon, Nov 30, 2009 at 6:07 PM, Kit Grose k...@iqmultimedia.com.au wrote:

Is there a semantic reason for <p caption> rather than simply repurposing the <caption> element itself? It seems to me that captions in this context are conceptually identical to captions for tables?

Not a semantic reason, just a practical one. All existing browsers do something completely wrong when they encounter <caption> outside of a <table>. It's at least as bad as their handling of <legend> outside <fieldset>. Otherwise, yes, <caption> would definitely be the ideal.

~TJ
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Nov 30, 2009, at 3:55 PM, Adam Barth wrote:

On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson i...@hixie.ch wrote:

Defining a spec-blessed whitelist of elements, attributes, and attribute values and filtering at the parser level is a significant new feature. While I see that it has value, I think in the short term it would be better to wait for a future version of HTML before introducing this feature; ideally once we have more implementation experience with experimental versions of this idea. I would encourage browser vendors to introduce APIs similar to that discussed below, clearly marked as vendor-specific (e.g. for Firefox, something like .mozStaticInnerHTML).

The WebKit community is considering taking up such an experimental implementation. Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.

I neglected to give feedback on webkit-dev, but here are my comments:

1) It seems like this API is harder to use than a sandboxed iframe. To use it correctly, you need to determine a whitelist of safe elements and attributes; providing an explicit whitelist at least of tags is mandatory. With a sandboxed iframe, as a Web developer you can just ask the browser to turn off unsafe things and not worry about designing a security policy. Besides ease of use, there is also the concern that a server-side filtering whitelist may be buggy, and if you apply the same whitelist on the client side as backup instead of doing something high level like "disable scripting" then you are less likely to benefit from defense in depth, since you may just replicate the bug.

2) It seems like this API loses one of the big benefits of sanitizing HTML in the browser implementation. Specifically, in theory it's safe to say "allow everything except any construct that would result in script/code running". You can't do that on the server side - blacklisting is not sound because you can't predict the capabilities of all browsers. But the browser can predict its own capabilities. Sandboxed iframes do allow for this.

I think the benefits of filtering by tag/attribute/scheme for advanced experts are outweighed by these two disadvantages for basic use, compared to something simple like the original staticInnerHTML idea. Another possible alternative is to express how to sanitize at a higher level, using something similar to sandboxed iframe feature strings.

Here's a problem that exists with both this API and also innerStaticHTML:

3) There is no secure and efficient way to append sanitized contents to an element that already has children. This may result in authors appending with innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient but still insecure!). I'm willing to concede that use cases other than "replace existing contents" and "append to existing contents" are fairly exotic.

Regards,
Maciej
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote:

1) It seems like this API is harder to use than a sandboxed iframe. To use it correctly, you need to determine a whitelist of safe elements and attributes; providing an explicit whitelist at least of tags is mandatory. With a sandboxed iframe, as a Web developer you can just ask the browser to turn off unsafe things and not worry about designing a security policy. Besides ease of use, there is also the concern that a server-side filtering whitelist may be buggy, and if you apply the same whitelist on the client side as backup instead of doing something high level like "disable scripting" then you are less likely to benefit from defense in depth, since you may just replicate the bug.

I should follow up with folks in the ruby-on-rails community to see how they view their sanitize API. The one person I asked had a positive opinion, but we should get a bigger sample size.

I think updateWithSanitizedHTML has different use cases than @sandbox. I think the killer applications for @sandbox are advertisements and gadgets. In those cases, the developer wants most of the browser's functionality, but wants to turn off some dangerous stuff (like plug-ins). For updateWithSanitizedHTML, the killer application is something like blog comments, where you basically want text with some formatting tags (bold, italics, and maybe images depending on the forum).

2) It seems like this API loses one of the big benefits of sanitizing HTML in the browser implementation. Specifically, in theory it's safe to say "allow everything except any construct that would result in script/code running". You can't do that on the server side - blacklisting is not sound because you can't predict the capabilities of all browsers. But the browser can predict its own capabilities. Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing. You're not going to be tripped up by img/src=javascript: and friends.

Also, this API is useful in cases where you don't have a server to help you sanitize your input. One example I saw recently was a GreaseMonkey script that wanted to add EXIF metadata to Flickr. Basically, the script grabbed the EXIF data from api.flickr.com and added it to the current page. Unfortunately, that meant I could use this GreaseMonkey script to XSS Flickr by adding HTML to my EXIF metadata. Sure, there are other ways of solving the problem (I asked the developer to build the DOM in memory and use innerText), but you want something simple for these cases.

I think the benefits of filtering by tag/attribute/scheme for advanced experts are outweighed by these two disadvantages for basic use, compared to something simple like the original staticInnerHTML idea. Another possible alternative is to express how to sanitize at a higher level, using something similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content and updateWithSanitizedHTML as being optimized for poor untrusted content, then you'll see that's what the API does already. The feature string Slashdot wants for its comments is (a b strong i em, href), but another message board might want something different. For example, 4chan might want (img, src alt). I don't think these require particularly advanced experts to understand.

Here's a problem that exists with both this API and also innerStaticHTML:

3) There is no secure and efficient way to append sanitized contents to an element that already has children. This may result in authors appending with innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient but still insecure!). I'm willing to concede that use cases other than "replace existing contents" and "append to existing contents" are fairly exotic.

Maybe we need insertAdjacentSanitizedHTML instead or in addition. ;)

Adam
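To make the whitelist strings in the message above concrete, here is a rough sketch of how a string such as (a b strong i em, href) might drive filtering. Everything in it is an assumption made for illustration: the grammar is guessed from the two examples in the message, the node model is a toy tree rather than the DOM, and the policies (unwrapping disallowed elements, accepting only http/https URLs in href and src) are not taken from the linked proposal.

```javascript
// Parse "(tags..., attrs...)" into two sets. The grammar is guessed from the
// examples "(a b strong i em, href)" and "(img, src alt)"; it is not from the proposal.
function parseWhitelist(feature) {
  const inner = feature.trim().replace(/^\(/, "").replace(/\)$/, "");
  const [tagPart = "", attrPart = ""] = inner.split(",");
  const words = (s) => s.trim().split(/\s+/).filter(Boolean);
  return { tags: new Set(words(tagPart)), attrs: new Set(words(attrPart)) };
}

// Assumed scheme policy: URL-bearing attributes must start with http: or https:.
const URL_ATTRS = new Set(["href", "src"]);
const SAFE_URL = /^https?:/i;

// Toy node model (hypothetical): strings are text, objects are { tag, attrs, children }.
function sanitize(nodes, whitelist) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      out.push(node);
      continue;
    }
    const children = sanitize(node.children, whitelist);
    if (!whitelist.tags.has(node.tag)) {
      // Assumed policy: drop the disallowed element but keep its sanitized children.
      out.push(...children);
      continue;
    }
    const attrs = {};
    for (const [name, value] of Object.entries(node.attrs)) {
      if (!whitelist.attrs.has(name)) continue;
      if (URL_ATTRS.has(name) && !SAFE_URL.test(String(value).trim())) continue;
      attrs[name] = value;
    }
    out.push({ tag: node.tag, attrs, children });
  }
  return out;
}

const slashdot = parseWhitelist("(a b strong i em, href)");
const comment = [
  { tag: "b", attrs: { onclick: "steal()" }, children: ["bold"] },
  { tag: "script", attrs: {}, children: ["alert(1)"] },
  { tag: "a", attrs: { href: "javascript:alert(1)" }, children: ["link"] },
];
const clean = sanitize(comment, slashdot);
console.log(JSON.stringify(clean));
```

Because the filter works on an already parsed tree, it never has to guess how a browser would tokenize odd markup, which is the parsing benefit the message points to. A message board with different needs would only swap the feature string.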
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Nov 30, 2009, at 6:32 PM, Adam Barth wrote: On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote: 1) It seems like this API is harder to use than a sandboxed iframe. To use it correctly, you need to determine a whitelist of safe elements and attributes; providing an explicit whitelist at least of tags is mandatory. With a sandboxed iframe, as a Web developer you can just ask the browser to turn off unsafe things and not worry about designing a security policy. Besides ease of use, there is also the concern that a server-side filtering whitelist may be buggy, and if you apply the same whitelist on the client side as backup instead of doing something high level like disable scripting then you are less likely to benefit from defense in depth, since you may just replicate the bug. I should follow up with folks in the ruby-on-rails community to see how they view their sanitize API. The one person I asked had a positive opinion, but we should get a bigger sample size. For server-side sanitization, this kind of explicit API is pretty much the only thing you can do. I think updateWithSanitizedHTML has different use cases than @sandbox. I think the killer applications for @sandbox are advertisements and gadgets. In those cases, the developer wants most of the browser's functionality, but wants to turn off some dangerous stuff (like plug-ins). For updateWithSanitizedHTML, the killer application is something like blog comments, where you basically want text with some formatting tags (bold, italics, and maybe images depending on the forum). I can imagine use cases where allowing very open-ended but script-free content is desirable. For example, consider a hosted blog service that wants to let blog authors write nearly arbitrary HTML, but without allowing script. @sandbox would not be a good solution for that use case. 
In general it does not seem sensible to me that the choice of tag whitelisting vs. high-level feature whitelisting is tied to the choice of embedding content directly vs. creating a frame. Is there a technical reason these two choices have to be tied? 2) It seems like this API loses one of the big benefits of sanitizing HTML in the browser implementation. Specifically, in theory it's safe to say "allow everything except any construct that would result in script/code running". You can't do that on the server side - blacklisting is not sound because you can't predict the capabilities of all browsers. But the browser can predict its own capabilities. Sandboxed iframes do allow for this. The benefit is that you know you're getting the right parsing. You're not going to be tripped up by img/src=javascript: and friends. It's true, this is a benefit. However, it seems like even if you whitelist tags, being able to say no script at a high level Also, this API is useful in cases where you don't have a server to help you sanitize your input. One example I saw recently was a GreaseMonkey script that wanted to add EXIF metadata to Flickr. Basically, the script grabbed the EXIF data from api.flickr.com and added it to the current page. Unfortunately, that meant I could use this GreaseMonkey script to XSS Flickr by adding HTML to my EXIF metadata. Sure, there are other ways of solving the problem (I asked the developer to build the DOM in memory and use innerText), but you want something simple for these cases. If the EXIF metadata is supposed to be text-only, it seems like updateWithSanitizedHTML would not be easier to use than innerText, or in any way superior. For cases where it is actually desirable to allow some markup, it's not clear to me that giving explicit whitelists of what is allowed is the simple choice. 
I think the benefits of filtering by tag/attribute/scheme for advanced experts are outweighed by these two disadvantages for basic use, compared to something simple like the original staticInnerHTML idea. Another possible alternative is to express how to sanitize at a higher level, using something similar to sandboxed iframe feature strings. If you think of @sandbox as being optimized for rich untrusted content and updateWithSanitizedHTML as being optimized for poor untrusted content, then you'll see that's what the API does already. The feature string Slashdot wants for its comments is (a b strong i em, href), but another message board might want something different. For example, 4chan might want (img, src alt). I don't think these require particularly advanced experts to understand. updateWithSanitizedHTML and @sandbox both provide features that the other does not for reasons that do not seem technically necessary. For example, updateWithSanitizedHTML could easily have an "allow everything except script" mode, and @sandbox could easily allow per-tag whitelisting. Then the choice would be between the resource cost of a frame, and the sandboxing features that it's
Re: [whatwg] Web Workers: SyntaxError exception?
On Tue, 3 Nov 2009, Simon Pieters wrote: Web Workers says "If it failed to parse, then throw a SyntaxError exception and abort all these steps." Shouldn't that be a SYNTAX_ERR exception? No, it's trying to emulate eval(). -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
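For readers unfamiliar with the distinction, the following small sketch shows what eval() does on a parse failure: it throws the native ECMAScript SyntaxError object rather than a DOM exception carrying a numeric code such as SYNTAX_ERR. The helper name evalErrorInfo is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: report what kind of exception eval() throws
// when the source text fails to parse.
function evalErrorInfo(source) {
  try {
    eval(source);
    return null; // parsed (and ran) without throwing
  } catch (e) {
    return {
      name: e.name,
      isSyntaxError: e instanceof SyntaxError,
      // DOM exceptions such as SYNTAX_ERR carry a numeric `code`;
      // the native SyntaxError does not.
      hasNumericCode: typeof e.code === "number",
    };
  }
}

const info = evalErrorInfo("var = ;"); // deliberately malformed
console.log(info);
```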
Re: [whatwg] [WebWorkers] About the delegation example
On Thu, 5 Nov 2009, David Bruant wrote: First of all, there is a typo error in this example. The main HTML page is a copy/paste of the first example (Worker example: One-core computation). Fixed. My point here is to ask for a new attribute for the navigator object that could describe the best number of workers in a delegation use case. It's not clear to me what the best number of workers is. It's not the number of CPUs, cores, or hardware threads, since it depends at least as much on the system load as on the system capabilities. And it varies over time, since the load is a function of time. In the delegation example, the number of workers chosen is an arbitrary 10. But, in a single-core processor, having only one worker will result in more or less the same running time, because at the end, each worker runs on the only core. That depends on the algorithm. If the algorithm uses a lot of data, then a single hardware thread might be able to run two workers in the same time as it runs one, with one worker waiting for data while the other runs code, and with the workers trading back and forth. Personally I would recommend basing the number of workers on the number of shards that the input data is split into, and then relying on the UA to avoid thrashing. I would expect UAs to notice when a script spawns a bazillion workers, and have the UA run them in a staggered fashion, so as to not starve the system resources. This is almost certainly needed anyway, to prevent pages from DOSing the user's system. On the other hand, on a 16-core processor (which doesn't exist yet, but is a realistic idea for the next couple of decades), the task could be executed faster with 16 workers. Well, again, that's not a given. If the algorithm is mostly network-bound or disk-bound, then it might well be that running multiple workers doesn't really gain you anything, and you might as well just do everything in one worker. It's hard to make generalisations about this kind of thing. 
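The recommendation above, deriving the number of workers from the number of input shards rather than from a core count, can be sketched as follows. This is only an illustration: splitIntoShards and the shard size of 4 are hypothetical, and the step that would hand each shard to a worker with postMessage() is left as a comment because it needs a browser environment.

```javascript
// Hypothetical helper: split the input into fixed-size shards.
function splitIntoShards(items, shardSize) {
  if (!Number.isInteger(shardSize) || shardSize < 1) {
    throw new RangeError("shardSize must be a positive integer");
  }
  const shards = [];
  for (let i = 0; i < items.length; i += shardSize) {
    shards.push(items.slice(i, i + shardSize));
  }
  return shards;
}

const numbers = Array.from({ length: 10 }, (_, i) => i + 1);
const shards = splitIntoShards(numbers, 4); // shard size is an arbitrary assumption

// One worker per shard. In a page, each shard would be passed to its own
// Worker with postMessage(), leaving the scheduling to the UA.
const workerCount = shards.length;
console.log(workerCount, JSON.stringify(shards));
```

The worker count here follows from how the data is split, so the script never needs to ask how many cores the machine has.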
Moreover, for an entirely different purpose, this attribute could be used to gather statistics on the spread of multicore processors, like the statistics that are already gathered for operating system or screen resolution use. Do we really want to expose this? That seems like a minor privacy leak. On Fri, 6 Nov 2009, Drew Wilson wrote: Exposing information that's not reliable seems worse than not exposing it at all, and would encourage applications to grab all available resources (after all, that's the purpose of the API!). And the problem domains that would benefit from this information (arbitrarily parallelizable algorithms like ray tracing) seem to be few in number. Indeed. On Fri, 6 Nov 2009, Rob Ennals wrote: Maybe what we really want here is some kind of parallel map operation where we give the user agent an array and then say "call this function on each element, using as many threads as you deem appropriate given the resources available". Each function call would logically execute in its own worker context, but to keep semantics transparent, we might declare that such workers are not allowed to send messages (other than a final result) and so could not tell how many parallel workers had actually been created. This is a reasonably good idea. It might make sense to do in v2 of Web Workers. I haven't added anything to Web Workers for now. -- Ian Hickson
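As a rough illustration of the parallel map idea, the sketch below maps a function over an array with a bounded number of concurrent runners. It is not an API from any Web Workers draft: parallelMap and its maxConcurrency parameter are hypothetical, the parameter stands in for the choice the proposal would leave to the user agent, and the runners are promise-based tasks on one thread rather than real workers.

```javascript
// Hypothetical sketch: each call to fn sees only its own element and
// returns only a final result, so it cannot tell how many runners exist.
async function parallelMap(items, fn, maxConcurrency) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed element

  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  // Never start more runners than there are elements, and at least one.
  const runnerCount = Math.max(1, Math.min(maxConcurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

parallelMap([1, 2, 3, 4], async (x) => x * x, 2).then((squares) => {
  console.log(squares);
});
```

Results are written by index, so the output order matches the input order regardless of which runner finishes first.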