Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 02:28:49 +0200, Chris Double chris.dou...@double.co.nz wrote: On Thu, Aug 26, 2010 at 5:25 AM, Eric Carlson eric.carl...@apple.com wrote: FWIW, I agree with Silvia that a new file extension and MIME type make sense. I also think that a new file extension and MIME type is the way to go. Would Firefox / Safari support text/srt files in some undocumented fashion then or just simply not support those? The former would not really be an acceptable solution to me. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] [br] element should not be a line break
Ian Hickson: On Wed, 4 Aug 2010, Thomas Koetter wrote: What strikes me though is that according to the spec "The br element represents a line break." A *line* break is presentational in nature. The break is structural, but restricting it to a certain presentation of that break lacks the desired separation of structure and presentation. Wouldn't it make more sense to consider the br element to be just a minor logical break inside a paragraph? Calling it a line break doesn't say how it is rendered. It's just a conceptual description. It presupposes the existence of lines, though. Lines are a very visual concept, although they can be applied to oral language, as in poems and songs (where ‘//’ is often an accepted representation for line breaks in transcripts). An oral line may span several literal lines and vice versa. Paragraphs (and breaks therein), of course, are also a concept of written language, as are sentences. However, I believe the underlying problem is simply that “line break” is (too) often used and understood as a synonym for “new line”, at least by non-native speakers. Speaking of breaks on line or paragraph level therefore makes more sense to me. (A minor logical break inside a paragraph is not generally represented by a line break, at least not in any typographic conventions I've seen; usually, in my experience, those are denoted either using ellipses, em-dashes, or parentheses.) That’s true for real paragraphs, but not for most “non-paragraphic” texts, e.g. addresses.
Re: [whatwg] Should events be paused on detached iframes?
On 25 August 2010 12:50, Boris Zbarsky bzbar...@mit.edu wrote: On 8/24/10 7:09 PM, Ben Lerner wrote: The history navigation analogy is a good one: pages presumably already have to handle the pageshow event to deal with being revived from the history, and the browser already needs to know how to fire that event. Why not reuse those mechanisms? A strawman claim: Nothing may be changing from the perspective of the iframe, but it certainly is changing from the perspective of the container or the user: detaching an iframe from a page is like navigating a browsing context away from a page, putting it into hibernation until it's reattached to an active document/browsing context. What subtle or important facet of the web am I missing that breaks this analogy? (It wouldn't surprise me if I missed something obvious, either... :) At least in the case of Gecko, there are at least the following things to keep in mind: 1) hibernating documents are very limited in what one can do with them (e.g. attempting to mutate the document in any way while hibernating will throw it away). 2) Documents have security policies applied to them based on the toplevel content window (or browser tab, if you prefer to think about it) they're associated with. Which means that allowing documents not immediately associated with any toplevel window, which would be the case right now in Gecko for an iframe not in a document, leads to security problems. This could be changed by redoing how the association is implemented, but there's some touchy code involved that we'd rather not get wrong. ;) Another reason to consider suspending detached iframes: suppose that in the chat window example below, the iframe wasn't just a same-origin place to store global state, but also had its own UI, with callbacks and event handlers and whatnot. 
If, during the interim while the iframe was being detached, adopted and reattached, that frame executed a timer that popped up a modal alert or prompt to the user, how would the user reasonably know where that alert came from? And what document(s?) should be paused while the alert is shown? And for that matter, how would the UA know where the alert came from, in terms of correctly parenting it? This ties back to item #2 above. Couldn't the iframe be kept alive, but remain associated with its parent browsing context until (if) it was re-parented / inserted into a different document? (Does this match how other elements in the DOM behave in terms of event handlers when they are detached?) That way, complex hibernation would be unneeded and it would be clear as to how to handle events, security, etc.
Re: [whatwg] Should events be paused on detached iframes?
On 8/26/10 3:23 AM, James May wrote: Couldn't the iframe be kept alive, but remain associated with its parent browsing context until (if) it was re-parented / inserted into a different document? (Does this match how other elements in the DOM behave in terms of event handlers when they are detached?) Elements behave fine. The question is what the Window should do. What should window.parent return in the iframe while detached? window.top? What should window.resizeTo do? That sort of thing. -Boris
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
Silvia Pfeiffer wrote: You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? Or is #1 *impossible* due to the craziness of the legacy? 
(I thought any given .srt consumer only has a single code path and implementation-wise there aren't already multiple .srt formats even though doom9 spec-wise there are at least two.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] base64 entities
On 25.08.2010, at 23:46, Aryeh Gregor wrote: These cases can be secured without any new features in browsers (by escaping whitespace using numeric entities): function htmlescape($str) { return preg_replace('/[\s\']/e','.ord($0).;',$str); } That doesn't work in script for text/html, does it? Ah, indeed. Another tricky case came to my mind, which entities cannot secure (unless special magic is defined for the new entity): onclick=show('base64;') These are reasonable points. How many vulnerabilities would it actually prevent in practice if htmlspecialchars() were replaced with this everywhere? XSS is usually when you don't escape things at all, not when you escape them in a slightly wrong way. Easy escaping in script and style would be nice, though (or is there already some way to do that?). In PHP json_encode() works great for outputting data in JS (and can be configured to JS-escape HTML-unsafe chars too), but I feel like I'm the only person who knows about it :) -- regards, Kornel Lesiński
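As a rough illustration of the idea of escaping whitespace and quotes with numeric entities, here is a minimal JavaScript sketch. The function name `escapeForAttribute` and the exact set of escaped characters are assumptions made for this example, not something taken from the thread, and (as noted in the message above) this kind of escaping does not help inside script content in text/html.

```javascript
// Hypothetical helper (not from the thread): replaces HTML-significant
// characters, quotes and whitespace with numeric character references,
// so the result can sit inside a quoted or unquoted attribute value.
function escapeForAttribute(str) {
  return String(str).replace(/[&<>"'`=\s]/g, (ch) => `&#${ch.codePointAt(0)};`);
}

console.log(escapeForAttribute("x' onclick='alert(1)"));
```

Escaping whitespace as well as quotes is what keeps an unquoted attribute value from being split into additional attributes.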
Re: [whatwg] base64 entities
On 26.08.10 01:41, Adam Barth wrote: On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson i...@hixie.ch wrote: On Wed, 25 Aug 2010, Adam Barth wrote: HTML should support Base64-encoded entities to make it easier for authors to include untrusted content in their documents without risking XSS. Seems like a fine idea. Get browsers to implement it and I'll spec it. I've posted a patch for WebKit: https://bugs.webkit.org/show_bug.cgi?id=44641 Some subtleties:
1) Some base64 decoders tolerate newlines. We don't want to decode entities with newlines.
2) Decoding base64 results in binary data. We'll need to convert that data to characters in order to deal with it in the DOM. We always use UTF-8 for that transformation, regardless of the document's encoding.
3) Null characters are replaced with U+FFFD.
4) The empty base64 entity %; is consumed and is replaced with the empty string.
5) Invalid base64 is rejected and the entity is not decoded.
Adam
Is it necessary to consider compatibility issues here? In HTML4 this seems to have been valid code (see http://validator.w3.org/check):
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  <html>
  <head>
  <meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
  <title>base64 entity test</title>
  </head>
  <body>
  <p>Look at these fine ASCII characters: %4oCT;</p>
  </body>
  </html>
Now it would be interpreted differently. Could this lead to old documents changing in meaning? Do we have to consider old documents that were not completely valid (e.g. lacked a doctype declaration)?
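The five subtleties listed in the quoted message can be sketched as a small decoding function. This is only an illustration under stated assumptions, not the WebKit patch: the name `decodeBase64Payload` is hypothetical, the function takes just the base64 payload (the characters between the entity's delimiters), it assumes strict padding is required, and it assumes malformed UTF-8 is replaced rather than rejected, which the thread does not specify.

```javascript
// Sketch only; `decodeBase64Payload` is a hypothetical name.
// Returns the decoded text, or null when the entity should be left undecoded.
function decodeBase64Payload(payload) {
  // Rule 4: the empty entity is consumed and yields the empty string.
  if (payload === "") return "";
  // Rules 1 and 5: accept only the strict base64 alphabet with correct
  // padding (padding strictness is an assumption); newlines, whitespace
  // and any other character cause rejection.
  const strict = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;
  if (!strict.test(payload)) return null;
  // Rule 2: base64 yields bytes; always interpret them as UTF-8.
  const bytes = Uint8Array.from(atob(payload), (c) => c.charCodeAt(0));
  const text = new TextDecoder("utf-8").decode(bytes);
  // Rule 3: null characters become U+FFFD.
  return text.replace(/\u0000/g, "\uFFFD");
}

console.log(decodeBase64Payload("4oCT")); // the payload from the test document above
```

The payload `4oCT` is the bytes E2 80 93, which is U+2013 (an en dash) in UTF-8, so a document declared as US-ASCII would end up containing a non-ASCII character after decoding.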
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 09:58:29 +0200, Henri Sivonen hsivo...@iki.fi wrote: Silvia Pfeiffer wrote: You misunderstand my intent. I am by no means suggesting that no WebSRT content is treated as SRT by any application. All I am asking for is a different file extension and a different mime type and possibly a magic identifier such that *authoring* applications (and authors) can clearly designate this to be a different format, in particular if they include new features. Then a *playback application* has the chance to identify them as a different format and provide a specific parser for it, instead of failing like Totem. They can also decide to extend their existing SRT parser to support both WebSRT and SRT. And I also have no issue with a user deciding to give a WebSRT file a go by renaming it to .srt. By keeping WebSRT and SRT as different formats we give the applications a choice to support either, or both in the same parser. If we don't, we force them to deal in a single parser with all the oddities of SRT formats as well as all the extra features and all the extensibility of WebSRT. Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? 
I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. I do think it's a good idea to make the WebSRT parser handle existing SRT content as well as possible. The encoding issue is easy to side-step by just saying that that's a preprocessing step. Or is #1 *impossible* due to the craziness of the legacy? (I thought any given .srt consumer only has a single code path and implementation-wise there aren't already multiple .srt formats even though doom9 spec-wise there are at least two.) There are some issues with the current WebSRT parser that I've been meaning to send mail about, but my impression is that it's not impossible to define a parser which works well enough to replace existing ones. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested by whatever potentially spec-incompliant means), why would they not use the same ingest code path for both? If the app isn't capable of supporting any feature that's permitted in WebSRT but not part of legacy SRT, how does failing at the point of finding out that this file claims to be WebSRT rather than SRT make things much better than failing at I found stuff that I can't handle/skip over in this SRT file? In particular, it seems like a wrong optimization to make it possible for apps that don't support any WebSRT features over legacy features to fail early than to make apps that support at least one WebSRT-introduced feature unify their processing of WebSRT and SRT by processing both WebSRT and SRT as one format where legacy SRT files just don't happen to use new features. 
To me, having different code paths for WebSRT and SRT is like IE adding a new Trident snapshot with every release whereas supporting SRT by treating it as WebSRT with no new features (if the app is supporting even one WebSRT-introduced feature!) is like what the other browsers are doing with HTML/CSS/DOM. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] base64 entities
On 25.08.2010 22:50, Adam Barth wrote: == Summary == ... Not convinced. There's already one way to escape these things, and this is supported in all UAs. I don't see how adding another mechanism will help those who can't use the first one properly. For instance, people unable to escape , and are likely also unable to get the UTF-8 conversion right. Best regards, Julian
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Wed, 25 Aug 2010 17:40:08 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: At this point, what is your recommendation? The following ideas have been on the table: * Change the file extension to something other than .srt. I don't have an opinion, browsers ignore the file extension anyway. Yes, I think we should definitely have a new file extension. I'll leave this to others to decide, but since browsers have no concept of file extensions, just using .srt will work. If the format is SRT-like it's likely at least some files will use .srt in practice. All SRT files in practice use the .srt extension - it is typically how these formats are identified by applications. Just because *nix ignores file extensions mostly for identifying file types doesn't mean that applications do. Again, I believe strongly that re-using the same file extension is the one biggest pain we can inflict on the community. As shown above, several popular (?) media players ignore or give little weight to the file extension. I don't think that's a fair sample - as I said, on Linux and on the command-line things are different. I have a GUI mplayer here and it reacts like VLC - doesn't let me open .wsrt files. The vast majority of applications on Windows and the Mac make their decision on whether they support files based on the file extension. That the file selection dialogs are filtered by file extensions doesn't mean that applications don't sniff the content. In fact, MPlayer, VLC and Totem will happily load and use an SRT file even if it is called foo.smi, even though SAMI is a completely incompatible format. In other words, they sniff the content as being SRT. The reason that they rely on sniffing is likely that many files use the wrong file extension (my OpenSubtitles batch have no extensions, so I have no statistics on this). Again, if we want to avoid exposing existing SRT parsers to WebSRT syntax, then the format needs to be more incompatible. 
File extensions will be changed, popular players rely on sniffing, some ignore leading garbage and also headers can simply be removed by naive conversion tools. Assuming we pick the same file extension and we now have a new application that only supports WebSRT parsing, we will make a large bunch of existing valid SRT files invalid - not only those that are not in UTF-8, but also those with <font>..</font> and <u>...</u>. I do wonder if the text between the <font> start and end tags and inside the <u>..</u> may even get removed because of lack of support for these. I've seen no application that removes everything between tags it doesn't recognize, the only things that I've seen happen is treating it as plain text or ignoring the tags much like a browser does with HTML. * Add a header to WebSRT to make it uniquely identifiable. The header would have to be mandatory and browsers would have to reject files that don't have it. Such files would be compatible with some existing software and break some, depending on how they sniff. We could also put metadata in such a header. Yes, I think we need to introduce a header. Maybe we can hide all the structure in what SRT recognizes as comments (i.e. start the lines with ;). But I believe we need some hints like the @profile to identify the type of the cues and the link to link to a style sheet, and we need metadata like the meta element of HTML headers. I had no idea that semicolon was used for comments in SRT, is this usage widespread? Does it work in most players? I thought it was, but maybe it was just introduced for WebSRT. It is not tested in Hixie's SRT research[2]. Can you take a quick look through your SRT file collection if there are any? I'm probably wrong about this seeing as it's not mentioned in the wiki page for SRT [3]. [2] http://wiki.whatwg.org/wiki/SRT_research [3] http://en.wikipedia.org/wiki/SubRip OK, I grepped the 1 files.
Only 15 had any lines beginning with a semicolon, and by manual inspection it doesn't look like any of them are clearly intended as comments (it's hard to tell, all are in foreign languages). None of them were at the very beginning of the file. Ah, that actually makes for another incompatibility of WebSRT and SRT: such lines are regarded as comments in WebSRT when they probably aren't in SRT. I can't find anything about this when searching for comment and semicolon in the spec, are you sure you're not thinking of some other format than WebSRT? It seems increasingly that the only thing that WebSRT and SRT still have in common is the --> character sequence. As a friend of mine in a11y recently said: I was hoping to never have to stare at --> ever again... We could indeed go all the way and define a much more different format, though I don't think it will create implementations as quickly as a SRT-based but changed format. I would prefer if we follow one of two paths: 1. Let
Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements
On Thu, 26 Aug 2010 11:52:26 +0200, Henri Sivonen hsivo...@iki.fi wrote: Why wouldn't it always be a superior solution for all parties to do the following: 1) Make sure WebSRT never requires processing that'd require rendering a substantial body of legacy .srt content in a broken way. (This would require supporting non-UTF-8 encodings by sniffing as well as supporting font and u, which would happen for free if my innerHTML proposal were adopted.) 2) Make playback software that supports WebSRT only have a WebSRT code path and use that code path for legacy .srt content as well. ? Specifically, if #1 is done, why would any pragmatic developer not want to do #2 if they are supporting WebSRT in their software? Why would anyone want to have a code path that turns off new WebSRT features if they have a code path that supports WebSRT features? I think many media player developers would be hesitant to include a full HTML parser just for parsing (Web)SRT, especially since they'd also need a layout engine to get anything more than they would get from a simpler parser. If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested by whatever potentially spec-incompliant means), why would they not use the same ingest code path for both? I don't think they should or would, I'm just saying that they'd probably be hesitant to use an HTML parser in that single code path, as there's very little benefit for them. If the app isn't capable of supporting any feature that's permitted in WebSRT but not part of legacy SRT, how does failing at the point of finding out that this file claims to be WebSRT rather than SRT make things much better than failing at I found stuff that I can't handle/skip over in this SRT file?
In particular, it seems like a wrong optimization to make it possible for apps that don't support any WebSRT features over legacy features to fail early than to make apps that support at least one WebSRT-introduced feature unify their processing of WebSRT and SRT by processing both WebSRT and SRT as one format where legacy SRT files just don't happen to use new features. To me, having different code paths for WebSRT and SRT is like IE adding a new Trident snapshot with every release whereas supporting SRT by treating it as WebSRT with no new features (if the app is supporting even one WebSRT-introduced feature!) is like what the other browsers are doing with HTML/CSS/DOM. Is this in reply to something other than what you quoted? In any case, I agree. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Built-in image sprite support in HTML5
On Wed, Aug 25, 2010 at 7:00 PM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: It would, however, be good to have an indication where HTML would like to see it going. Would it be better for a media fragment URI for images such as http://example.com/picture.png#xywh=160,120,320,240 to display the full image with the rectangle somehow highlighted (as is the case with fragment URIs to HTML pages), or would it be better to actually just display the specified region and hide the rest of the image (i.e. create a sprite)? What makes the most sense for images? The CSS Image Values Module ( http://dev.w3.org/csswg/css3-images/#url ) is currently recommending Media Fragments as a way to sprite out a portion of a resource. We have a note that we're expecting a spec to reference at some point. ~TJ
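To make the `#xywh=` fragment in the example URI concrete, here is a minimal JavaScript sketch that extracts the rectangle a user agent would either highlight or crop to. The function name `parseXywh` is hypothetical, and the sketch only handles pixel values (plain numbers or an explicit `pixel:` prefix); it is not a complete Media Fragments parser.

```javascript
// Hypothetical helper: pulls the x, y, width and height out of a
// "#xywh=x,y,w,h" media fragment. Returns null when no such fragment exists.
function parseXywh(uri) {
  const fragment = new URL(uri).hash.slice(1); // drop the leading "#"
  const match = /(?:^|&)xywh=(?:pixel:)?(\d+),(\d+),(\d+),(\d+)(?:&|$)/.exec(fragment);
  if (!match) return null;
  const [x, y, w, h] = match.slice(1).map(Number);
  return { x, y, w, h };
}

console.log(parseXywh("http://example.com/picture.png#xywh=160,120,320,240"));
```

Under the sprite interpretation, the returned rectangle would be the only visible part of the image; under the highlight interpretation, the whole image would be shown with that region marked.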
Re: [whatwg] Built-in image sprite support in HTML5
On Thu, Aug 26, 2010 at 9:01 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: On Wed, Aug 25, 2010 at 7:00 PM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: It would, however, be good to have an indication where HTML would like to see it going. Would it be better for a media fragment URI for images such as http://example.com/picture.png#xywh=160,120,320,240 to display the full image with the rectangle somehow highlighted (as is the case with fragment URIs to HTML pages), or would it be better to actually just display the specified region and hide the rest of the image (i.e. create a sprite)? What makes the most sense for images? The CSS Image Values Module ( http://dev.w3.org/csswg/css3-images/#url ) is currently recommending Media Fragments as a way to sprite out a portion of a resource. We have a note that we're expecting a spec to reference at some point. ~TJ Oh, wow, that's good to know. Thanks! Silvia.
Re: [whatwg] Should events be paused on detached iframes?
On 26 August 2010 17:27, Boris Zbarsky bzbar...@mit.edu wrote: On 8/26/10 3:23 AM, James May wrote: Couldn't the iframe be kept alive, but remain associated with its parent browsing context until (if) it was re-parented / inserted into a different document? (Does this match how other elements in the DOM behave in terms of event handlers when they are detached?) Elements behave fine. The question is what the Window should do. What should window.parent return in the iframe while detached? window.top? What should window.resizeTo do? That sort of thing. -Boris I thought I just suggested that? Everything works normally (as if it was still attached) until it is reattached, when the situation is re-evaluated. In terms of resource consumption, I don't see how this would be any different to any other kind of leak that web content can trigger. (I think someone mentioned that iframes can be GC'd normally)
Re: [whatwg] INCLUDE and links with @rel=embed
On Thu, 5 Aug 2010, Bjartur Thorlacius wrote: On Tue, 18 May 2010, bjartur wrote: First of all I think we should use <a rel=embed href=uri-ref> instead of <source>. What problem would this solve? It would tell UAs that don't implement HTML 5 that the value of @href is a URI. Then it can provide means for the user to retrieve the identified resource (and do something useful with it). Surely the kind of URL is already fully given by the scheme, making this rather moot. For authors it would make constructs such as the following unnecessary (excerpt from spec): <video controls src="http://video.example.com/vids/315981"> <a href="http://video.example.com/vids/315981">View video</a>. </video> In fact, having the ability to follow this link is useful even though my browser supports video. But that's a UI issue. I don't understand how it would affect this. On Wed, 19 May 2010, Bjartur Thorlacius wrote: Is the existing syntax backwards compatible? When using A, you get a nice link as fallback content automagically, not requiring any special workarounds by the content author. AFAICT you don't even get that when using a browser that doesn't support audio and video. Indeed, with those you have to provide the fallback content (which could e.g. be flash) as a descendant of the audio/video element. As a user of a browser that doesn't fully support video I'd prefer getting a hyperlink to the resource to a Flash program. Just sayin'. Most authors would rather the user never knew there was a difference and just get the video, it seems. If you're saying that we should also support other timed-based formats in the future even if they are not video, e.g. if you are saying we should support formats like SMIL, then there's no reason you can't do that with video itself. video really is just an API to time-based visual data, it doesn't have to be a sequence of bitmaps. Oh, the following quote confused me.
"The video element is a media element whose media data is ostensibly video data" I picked the word ostensibly on purpose for that sentence. :-) I'm not just talking about SMIL. I'm talking about using a secondary feature of media elements (the ability to link to multiple alternative resources) even if the main feature (the API) is irrelevant. <video> <source src=f.utf8 charset=utf8> <source src=f.latin1 charset=latin1> </video> <video> <source src=img.png type=image/png> <source src=img.svg type=image/svg+xml> </video> I don't need to know the duration of an unanimated PNG. Ah, yeah, that isn't supported. You can just use object for the image case: <object data=img.png type=image/png> <object data=img.svg type=image/svg+xml> </object> </object> In the character encoding case, everyone supports UTF-8, so just use that. On Wed, 19 May 2010, bjartur wrote: Yeah, maybe my crazy idealism and tendency to reuse existing things don't mix up in this case. The main purpose of video and audio is to create a scripting interface to online video. But they also add new linking capabilities which should be available to any content whatsoever. I don't really see how. In what sense do they add new linking capabilities? In the sense of multiple alternative (media) resources. This could possibly be done with object but its fallback mechanism seems inferior. The video one is very specific to codecs and so forth; I don't think it would make sense to generalise it. object already handled it fine. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] base64 entities
On 2010-08-26 13:58:24, Julian Reschke wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Totally agree. If a web author isn't sufficiently experienced to remember to call an HTML-encoding function, there is no reason to believe they'll think to call a base64-encoding function either. The proposal adds more parsing complexity (and XML incompatibility) for no obvious gain. However, I'm all for making standardised HTML-encoder/decoder and base64-encoder/decoder functions available at a `window` or ECMAScript language level. `atob`/`btoa` do the job, but they're byte encoders, not character encoders; they expect 'binary' strings where each charCode is the ordinal value of a byte. That is to say, `btoa` encodes the input string using genuine ISO-8859-1 and not UTF-8. This is necessary for a general-purpose base64 implementation (otherwise many base64 strings would not be decodable at all), but may be unexpected. Is it worth providing a UTF-8-based variant or argument? Otherwise users would have to convert from UTF-8-misdecoded-as-ISO-8859-1 strings manually. (That's not difficult, using `decodeURIComponent(escape(s))`, but this trick isn't obvious or well-known.) -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/
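The `decodeURIComponent(escape(s))` trick mentioned above, together with its inverse, can be wrapped into a pair of helpers. This is a minimal sketch; the names `utf8ToBase64` and `base64ToUtf8` are made up for this example, and `escape`/`unescape` are legacy functions that are used here only because the message refers to them.

```javascript
// Hypothetical helper names; the technique is the one described in the message.

// Encode: turn the string into UTF-8 bytes, represented as a 'binary' string
// (one char per byte), then base64-encode that with btoa.
function utf8ToBase64(str) {
  return btoa(unescape(encodeURIComponent(str)));
}

// Decode: atob yields a 'binary' string of UTF-8 bytes; escape() percent-encodes
// each byte and decodeURIComponent() reads the result back as UTF-8.
function base64ToUtf8(b64) {
  return decodeURIComponent(escape(atob(b64)));
}

console.log(utf8ToBase64("\u2013"));
console.log(base64ToUtf8(utf8ToBase64("caf\u00e9 \u2013 ok")));
```

Calling `atob` alone on UTF-8 base64 data gives one character per byte, which is the "UTF-8 misdecoded as ISO-8859-1" string the message describes; the second helper undoes that.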
Re: [whatwg] IDL attribute reflecting enumerated attributes not limited to only known values
On Fri, 6 Aug 2010, Aryeh Gregor wrote: On Fri, Aug 6, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote: I'm happy to make more of them limited, especially new attributes or ones that were already that way, but I'd rather not change the default as that can have unexpected effects (e.g. some of the attributes are definitely not so limited, and I don't recall which that might be). The enumerated attributes in the spec right now that are not limited to only known values are, by my count: * audio.preload, video.preload (note that at least WebKit appears to treat these as limited to known values already) * command.type * form.autocomplete, input.autocomplete * track.kind These are all changed now. * marquee.direction What do browsers do for this one? * marquee.trueSpeed This is now a boolean attribute. * meta.httpEquiv I'm pretty sure browsers don't treat this as limited to only known values. * th.scope * textarea.wrap Browsers don't seem to limit these. On Sat, 7 Aug 2010, Mounir Lamouri wrote: On 08/06/2010 09:01 PM, Ian Hickson wrote: - input.autocomplete: at the moment, it is returning the content but it could return the resulting autocompletion state which is maybe a bit more than just being limited to only known values but still in the same spirit. I haven't changed this; what's the use case for knowing the actual state? Theoretically speaking, I think input.autocomplete should return the current autocompletion state because that would follow the actual idea of enumerated attributes limited to only known values. There's a big difference between reflecting the state of the attribute (what reflecting enumerated attributes does) and reflecting the state of the actual feature (which is rare in the DOM). Indeed, these kinds of enumerated attributes don't return the content value but the value associated with the current state and in that case the 'state' is the autocompletion state.
No, the attribute's state is based on its value and is distinct from the actual autocompletion state. Practically speaking, autocomplete is mostly used in writing (authors want to force/disable autocomplete) and we can assume that a script reading this value is going to check if the element has autocompletion. Having input.autocomplete return this state may spare authors from repeating the algorithm, thus preventing errors and making further changes in the specification easier (and transparent). I don't follow. By the way, why have autocomplete IDL attributes been introduced in the specifications? Completeness. On Tue, 17 Aug 2010, Aryeh Gregor wrote: Test case:
<!doctype html> <script> var el = document.createElement("form"); el.setAttribute("method", "get"); alert(el.method); el.setAttribute("method", "GET"); alert(el.method); </script>
Spec: If a reflecting IDL attribute is a DOMString whose content attribute is an enumerated attribute, and the IDL attribute is limited to only known values, then, on getting, the IDL attribute must return the conforming value associated with the state the attribute is in (in its canonical case) . . . http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#reflecting-content-attributes-in-idl-attributes This says it should echo GET twice. Four out of the five browsers I tested in (Firefox 4 beta, Chrome dev, Safari 5, Opera 10.60) echo get and then GET. IE8 and IE9PP4 echo get twice. I think the spec and IE are right here -- you should be able to test form.method == "GET" (or == "get", whichever) and have it work whenever it's in the GET state. However, since 4/5 of browsers disagree, I'm asking if anyone thinks the spec should be changed, before I file browser bugs. The real question is, would implementing the spec lead to compatibility issues? -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
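A minimal sketch of what "limited to only known values" means for the getter discussed above. This is not taken from any browser or from the spec text; the function name, the keyword list, and the choice of lowercase as the canonical case are all assumptions made for illustration.

```javascript
// Hypothetical helper: models the getter of a reflecting IDL attribute
// whose content attribute is enumerated and limited to only known values.
// contentValue is the raw content attribute value, or null when absent.
function reflectLimited(contentValue, keywords, { missingDefault = "", invalidDefault = "" } = {}) {
  if (contentValue === null) {
    return missingDefault; // attribute not present
  }
  // Keyword matching is case-insensitive; toLowerCase() is used here as a
  // simple stand-in for ASCII case-insensitive comparison.
  const wanted = contentValue.toLowerCase();
  const match = keywords.find((keyword) => keyword.toLowerCase() === wanted);
  // Return the keyword in its canonical spelling, never the raw input.
  return match !== undefined ? match : invalidDefault;
}

// Assumed keyword table for form.method; lowercase is an illustrative choice.
const METHOD_KEYWORDS = ["get", "post"];
const METHOD_DEFAULTS = { missingDefault: "get", invalidDefault: "get" };

const reflected = ["get", "GET"].map((value) =>
  reflectLimited(value, METHOD_KEYWORDS, METHOD_DEFAULTS)
);
console.log(reflected); // both entries are the same canonical string
```

With a getter of this shape, a comparison such as form.method == "get" gives the same answer regardless of the case the author used in the markup, which is the property argued for in the message.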
Re: [whatwg] Should events be paused on detached iframes?
On 8/26/10 11:58 AM, James May wrote: I thought I just suggested that? Everything works normally (as if it was still attached) until it is reattached, when the situation is re-evaluated. That could fall afoul of security checks that assume that an iframe with a non-null parent is in fact a subframe and that it's owner element is in the DOM. I know Gecko certainly has such internally. Again, nothing insurmountable, but there's a bunch of code in Gecko that makes assumptions about when windows can and can't exist that would need auditing. I can't speak to the web compat aspects. In terms of resource consumption, I don't see how this would be any different to any other kind of leak that web content can trigger. I don't think that's an issue, though this does raise the question of when it's OK to gc the iframe. (I think someone mentioned that iframes can be GC'd normally) Can they, with your proposal? It seems that with your proposal if you remove an iframe from the DOM and then forget about it then as long as there's any network activity in that iframe or anything else which might potentially trigger script it cannot be gced. This seems like it would make it very easy to leak document after document... -Boris
[whatwg] Clarification on @srcdoc referrer and base URL
What should the baseURL and referrer be for a @srcdoc nested browsing context? If I follow the base URL behavior for about:blank it will just be inherited from the creator document. That seems like the right thing to do, so I think section 2.5.1 should be modified to read: If fallback base url is about:blank or about:srcdoc, and the Document's browsing context has a creator browsing context, then let fallback base url be the document base URL of the creator Document instead. The referrer seems trickier. I couldn't find the about:blank referrer behavior specified anywhere, and in my testing it does not inherit the creator document's referrer. However, it seems to me that maybe about:srcdoc should, even if about:blank does not. Regards, Justin
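The proposed wording can be sketched as follows. The document model here (plain objects with address, creator and baseHref fields) is hypothetical and only stands in for the spec concepts named in the message; it is a reading of the proposal, not spec text.

```javascript
// Hypothetical document model:
//   address  - the document's address (a URL string)
//   creator  - the creator Document, if the browsing context has one
//   baseHref - value of a <base href>, if any (optional)

// Fallback base URL with the proposed about:srcdoc extension.
function fallbackBaseURL(doc) {
  const inheritsFromCreator =
    doc.address === "about:blank" || doc.address === "about:srcdoc";
  if (inheritsFromCreator && doc.creator) {
    // Use the creator document's base URL instead of the about: URL.
    return documentBaseURL(doc.creator);
  }
  return doc.address;
}

// Document base URL: a <base href> wins, resolved against the fallback.
function documentBaseURL(doc) {
  const fallback = fallbackBaseURL(doc);
  return doc.baseHref ? new URL(doc.baseHref, fallback).href : fallback;
}

const parentDoc = { address: "http://example.com/dir/page.html" };
const srcdocDoc = { address: "about:srcdoc", creator: parentDoc };
console.log(documentBaseURL(srcdocDoc));
```

The referrer question raised in the message is deliberately left out of the sketch, since the message itself notes that the about:blank behaviour is not specified.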
Re: [whatwg] base64 entities
On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Adam gave two examples of cases where htmlspecialchars() is insufficient, even if authors do use it. This proposal is completely general and will work anywhere, even in script. Is automated general escaping even possible right now in script for text/html?
[whatwg] Proposal for a modal element
Good afternoon, I am wondering if public discussion has been had over the concept of introducing a dialog element into html5. Normally a modal dialog is created using scripting and CSS to restrict focus and activity within the modal segment of the DOM and to style the modal section of the DOM to appear as though it is a separate region floating above the remainder of the document. A modal element type could indicate to UAs that a segment of the DOM is to be treated as active, while the remainder of the DOM is to be inactive. Focus could be automatically set to the first natively focusable element within the modal segment of the DOM, or could be explicitly set through scripting. UAs could provide a default style for modals, as they do for other elements, but the developer would normally need to adjust the style using CSS for proper sizing and positioning. Thanks, Everett Zufelt http://zufelt.ca Follow me on Twitter http://twitter.com/ezufelt View my LinkedIn Profile http://www.linkedin.com/in/ezufelt
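For comparison, the script-based approach described above usually includes some logic to keep keyboard focus inside the modal region. A minimal sketch of that wrap-around logic, with a hypothetical function name and with the list of focusable elements reduced to a count:

```javascript
// Hypothetical helper: given the number of focusable elements inside the
// modal region and the index of the currently focused one (-1 if none),
// return the index that should receive focus next.
function nextModalFocusIndex(count, current, backwards) {
  if (count === 0) {
    return -1; // nothing focusable inside the modal
  }
  if (current < 0) {
    // Nothing focused yet: enter at the first or last element.
    return backwards ? count - 1 : 0;
  }
  const step = backwards ? -1 : 1;
  // Wrap around so focus never leaves the modal region.
  return (current + step + count) % count;
}

console.log(nextModalFocusIndex(3, 2, false)); // wraps from last to first
console.log(nextModalFocusIndex(3, 0, true));  // wraps from first to last
```

A native element along the lines proposed would let the UA apply this kind of containment itself instead of each page reimplementing it.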
Re: [whatwg] base64 entities
On 26.08.2010 22:10, Aryeh Gregor wrote: On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Adam gave two examples of cases where htmlspecialchars() is insufficient, even if authors do use it. This proposal is completely general and will work anywhere, even in <script>. Is automated general escaping even possible right now in <script> for text/html? I have to admit that I'm not sure what's special about script here. Are you saying that it's insufficient to escape all characters that have a special meaning there? Server-wise, how is introducing a new escape mechanism any better than fixing the support code for the existing mechanism? Best regards, Julian
Re: [whatwg] base64 entities
On 8/26/10 4:10 PM, Aryeh Gregor wrote: On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Adam gave two examples of cases where htmlspecialchars() is insufficient, even if authors do use it. This proposal is completely general and will work anywhere, even in <script>. Sorta. It'll let you put the data in script, but it won't verify that the data doesn't change the meaning of the script, obviously, or inject script of its own to run. Is automated general escaping even possible right now in <script> for text/html? Defined how? -Boris
Re: [whatwg] base64 entities
On 26.08.2010 22:10, Aryeh Gregor wrote: On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Adam gave two examples of cases where htmlspecialchars() is insufficient, even if authors do use it. This proposal is completely general and will work anywhere, even in <script>. Is automated general escaping even possible right now in <script> for text/html? OK, sorry for my multiple posts. I now get the point about the additional problems in script, but I fail to see how the proposal addresses this, unless expanding these entities is supposed to happen *after* parsing the script. Best regards, Julian
Re: [whatwg] base64 entities
On Thu, Aug 26, 2010 at 4:20 PM, Julian Reschke julian.resc...@gmx.de wrote: I have to admit that I'm not sure what's special about script here. Are you saying that it's insufficient to escape all characters that have a special meaning there? data:text/html,<!doctype html><script>alert("&amp;");</script> alerts "&amp;", not "&". So generally, you just don't escape stuff in script, but I don't know of any general-purpose way to have /string in a string literal (or anywhere else), other than splitting it up like "</scr" + "ipt>". On Thu, Aug 26, 2010 at 4:25 PM, Boris Zbarsky bzbar...@mit.edu wrote: Sorta. It'll let you put the data in script, but it won't verify that the data doesn't change the meaning of the script, obviously, or inject script of its own to run. Hmm. Okay, then I don't get how this helps in Adam's second example: <script> elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.'; </script> If it doesn't help there, then I don't see any use-cases, since the first example is trivially solvable by just using quotes. Is automated general escaping even possible right now in <script> for text/html? Defined how? Suppose I have some arbitrary blob of trusted JavaScript, and I want to output it as an inline script in text/html. How do I escape it so that it executes as intended -- in particular, given that it might contain the string </script> in string literals, comments, and so on? In most contexts, you could just replace '<' => '&lt;', but that doesn't work in inline script. (Right? I admit I'm mostly cargo-culting this, and have no idea how text/html parsing works at all. I have fond dreams of an HTML serialization that's actually comprehensible to authors but has reasonable error handling . . .)
Re: [whatwg] Proposal for a modal element
Hi E.J., I've actually been working with some other people on the Chromium team for what we were calling a topmost window that could be used for modal dialogs. After some feedback, it's been suggested that we try to turn this into a more generic dialog element. I haven't yet incorporated that feature into the writeup, but I'll send you a link off-list. I hope to update the doc and post it to the list for feedback very soon. -- Dirk On Thu, Aug 26, 2010 at 1:12 PM, E.J. Zufelt li...@zufelt.ca wrote: Good afternoon, I am wondering if public discussion has been had over the concept of introducing a dialog element into html5. Normally a modal dialog is created using scripting and CSS to restrict focus and activity within the modal segment of the DOM and to style the modal section of the DOM to appear as though it is a separate region floating above the remainder of the document. A modal element type could indicate to UAs that a segment of the DOM is to be treated as active, while the remainder of the DOM is to be inactive. Focus could be automatically set to the first natively focusable element within the modal segment of the DOM, or could be explicitly set through scripting. UAs could provide a default style for modals, as they do for other elements, but the developer would normally need to adjust the style using CSS for proper sizing and positioning. Thanks, Everett Zufelt http://zufelt.ca Follow me on Twitter http://twitter.com/ezufelt View my LinkedIn Profile http://www.linkedin.com/in/ezufelt
Re: [whatwg] IDL attribute reflecting enumerated attributes not limited to only know values
On Thu, Aug 26, 2010 at 2:00 PM, Ian Hickson i...@hixie.ch wrote: * marquee.direction What do browsers do for this one? Seems like they don't limit it to known values, at least Firefox/Opera/Chrome. * meta.httpEquiv I'm pretty sure browsers don't treat this as limited to only known values. * th.scope * textarea.wrap Browsers don't seem to limit these. If we could change all these to limited without compat problems, though, it would be a nice little simplification -- enumerated attributes would all have the same reflection behavior. The real question is, would implementing the spec lead to compatibility issues? As Mounir Lamouri pointed out, Firefox nightlies already mostly implement the spec here, so I guess we'll find out. :) The spec is considerably nicer than preexisting behavior.
Re: [whatwg] base64 entities
On Thu, 26 Aug 2010 21:56:12 +0100, Aryeh Gregor simetrical+...@gmail.com wrote: Suppose I have some arbitrary blob of trusted JavaScript, and I want to output it as an inline script in text/html. How do I escape it so that it executes as intended -- in particular, given that it might contain the string /script in string literals, comments, and so on? In most contexts, you could just replace '' = 'lt;', but that doesn't work in inline script. Inside strings you replace / with \/ (\/ is valid escape sequence for /), outside strings you'd need to add space between / (a corner case x /regexliteral/). You might also use script src=data:. -- regards, Kornel
Re: [whatwg] base64 entities
On Wed, 25 Aug 2010 22:52:42 +0100, Kornel Lesiński kor...@geekhood.net wrote: <script> elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.'; </script> These cases can be secured without any new features in browsers (by escaping whitespace using numeric entities): I realized I was wrong about this one. It won't prevent script injection in JS strings (in places where entities are decoded, including script in XML), because the entity will be changed to plain text before JavaScript is tokenized. For this reason, base64 entities won't solve this problem either, unless they're specifically defined as a JavaScript construct, not only an HTML construct (and I think such a mix of parsers would be bad). If the parser decoded such entities in script (like XHTML does): foo = '%JztldmlsKCk7Jw==;' then the decoded string passed to the JS parser would look like: foo = '';evil();'' which defeats the purpose of the encoding. OTOH if the HTML parser didn't decode these entities in script (which is current text/html behavior), then JS would get the undecoded string (i.e. foo.charAt(0) == '%'). -- regards, Kornel
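The example entity in the message can be checked by decoding it by hand. The sketch below assumes Node.js (Buffer is a Node global, not a browser API) and simply splices the decoded text into the script source the way a decoding parser would.

```javascript
// Base64 payload taken from the example entity %JztldmlsKCk7Jw==; above.
const encodedPayload = "JztldmlsKCk7Jw==";

// Decode the base64 bytes and interpret them as UTF-8 text.
const decodedPayload = Buffer.from(encodedPayload, "base64").toString("utf8");

// What the JS parser would see if the HTML parser expanded the entity
// inside the string literal before tokenizing the script.
const splicedSource = "foo = '" + decodedPayload + "';";

console.log(decodedPayload); // ';evil();'
console.log(splicedSource);  // foo = '';evil();'';
```

The decoded text closes the string literal, runs evil(), and opens a new literal, which is the injection described in the message.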
Re: [whatwg] base64 entities
On Thu, 26 Aug 2010 22:30:00 +0200, Julian Reschke julian.resc...@gmx.de wrote: I now get the point about the additional problems in script, but I fail to see how the proposal addresses this, unless expanding these entities is suppose to happen *after* parsing the script. If you have ele.innerHTML = '%;' inside script it would be expanded the moment innerHTML is invoked (inside script entities are not expanded) and thus be safe from /script injection and such. So yes, it happens after. -- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] base64 entities
On 08/26/2010 10:56 PM, Aryeh Gregor wrote: I don't know of any general-purpose way to have /string in a string literal (or anywhere else), The simple approach is to use JavaScript string literal escapes: `\x3C/script`. A JSON encoder may offer the option to avoid HTML-special characters in string literals, encoded as escapes like `\u003C`. This allows literals to be included in a JavaScript block that may or may not be in a CDATA element, so may or may not need HTML-encoding. other than splitting it up like "</scr" + "ipt>". This is a common but wrong idiom that should be avoided; it won't validate because in HTML4 the `</` sequence itself (ETAGO) ends a script block. elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.'; Is a common error (security hole). Encoding text for use in a JavaScript string literal (`\`-escaping) is an entirely different proposition to encoding text for use in HTML (entity/character references). PHP offers no JS-string-literal-escape function. `addslashes` is very close, but won't handle some cases with non-ASCII characters correctly. Better to use `json_encode` to transfer the string, then write as text: elmt.textContent = <?php echo json_encode('Hi there, '.$name, JSON_HEX_TAG); ?> (assuming innerText or Text Node backup for IE/older browsers.) A 'magic' escaping feature that will somehow guess what sort of encoding the author means is wishful (impossible) thinking. A base64-encoded entity reference could do nothing for JavaScript, CSS or other nested string context. -- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/
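The same idea can be sketched on the JavaScript side, for a server that generates pages with JavaScript rather than PHP. The function name is hypothetical; the approach (JSON-encode, then replace angle brackets with \u escapes) mirrors what the message describes for json_encode with JSON_HEX_TAG and is a sketch, not a complete escaping library.

```javascript
// Hypothetical helper: turn an arbitrary string into a JavaScript string
// literal that is safe to place inside an inline <script> block.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value)
    // No literal "<" or ">" survives, so "</script>" and "<!--" cannot
    // appear in the emitted source; JS still reads the escapes as < and >.
    .replace(/</g, "\\u003C")
    .replace(/>/g, "\\u003E");
}

const untrustedName = "</script><script>evil()</script>";
const scriptLiteral = toInlineScriptLiteral("Hi there, " + untrustedName);

// The server would emit something like:
//   elmt.textContent = <literal>;
console.log("elmt.textContent = " + scriptLiteral + ";");
```

Because the result is assigned to textContent rather than innerHTML, no HTML-level escaping of the name is needed at all; only the JavaScript string context has to be handled.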
Re: [whatwg] base64 entities
2010/8/26 Kornel Lesiński kor...@geekhood.net: On Wed, 25 Aug 2010 22:52:42 +0100, Kornel Lesiński kor...@geekhood.net wrote: script elmt.innerHTML = 'Hi there ?php echo htmlspecialchars($name) ?.'; /script These cases can be secured without any new features in browsers (by escaping whitespace using numeric entities): I realized I was wrong about this one. It won't prevent script injection in JS strings (in places where entities are decoded, including script in XML), because entity will be changed to plain text before JavaScript is tokenized. Indeed. This is not a feature for XML. XML won't decode the entity at all. In HTML, script doesn't decode entities, so the pattern is safe. Adam
Re: [whatwg] base64 entities
On 26.08.2010, at 23:28, Adam Barth wrote: script elmt.innerHTML = 'Hi there ?php echo htmlspecialchars($name) ?.'; /script These cases can be secured without any new features in browsers (by escaping whitespace using numeric entities): I realized I was wrong about this one. It won't prevent script injection in JS strings (in places where entities are decoded, including script in XML), because entity will be changed to plain text before JavaScript is tokenized. Indeed. This is not a feature for XML. XML won't decode the entity at all. In HTML, script doesn't decode entities, so the pattern is safe. Yes, but in that case JS would have to decode the entity on its own. It wouldn't be strictly HTML feature, but also change interpretation of JS string literals. And what if you use this entity outside JS string? In regex literal? What about onclick=show('%base64;')? Should this be left insecure, or should HTML parser have special entity handling for on* attributes? And then what's the meaning of onclick=show('amp;%base64;')? -- regards, Kornel
Re: [whatwg] base64 entities
On Wed, Aug 25, 2010 at 6:37 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 8/25/10 7:41 PM, Adam Barth wrote: 2) Decoding base64 results in binary data. We'll need to convert that data to characters in order to deal with it in the DOM. We use always use UTF8 for that transformation, regardless of the document's encoding. Note that this issue means that using atob or btoa for dealing with this is a huge pain if non-ASCII chars are involved, since those take and return byte arrays masquerading as JS strings, not actual Unicode strings. I'm slightly confused how that works. How do you represent arbitrary binary data as characters? Another option is to provide a base64 encoder/decoder that uses UTF8 to encode/decode the binary. On Thu, Aug 26, 2010 at 1:38 AM, Martin Janecke whatwg@kaor.in wrote: Is it necessary to consider compatibility issues here? In HTML4 this seems to have been valid code (- http://validator.w3.org/check): It's always necessary to consider compatibility. Perhaps one of our friends with the ability to grep the web would be kind enough to tell us how common % followed by base64 characters followed by ; is. On Thu, Aug 26, 2010 at 2:58 AM, Julian Reschke julian.resc...@gmx.de wrote: Not convinced. There's already one way to escape these things, and this is supported in all UAs. Which way is that? I don't see how adding another mechanism will help those who can't use the first one properly. For instance, people unable to escape , and are likely also unable to get the UTF-8 conversion right. Escaping just those character is insufficient. The appeal of this approach is that authors don't need the right blacklist of dangerous characters. By the way, there are already folks doing something similar manually now. They send the untrusted bytes as base64 and decode them using JavaScript. On Thu, Aug 26, 2010 at 1:25 PM, Boris Zbarsky bzbar...@mit.edu wrote: Sorta. 
It'll let you put the data in script, but it won't verify that the data doesn't change the meaning of the script, obviously, or inject script of its own to run. Because script does not decode entities in HTML, the attacker will be limited to what he or she can do with alphanumeric characters, +, /, and trailing =. Of course, if the entity appears in a string context (as is pretty common), the attacker won't be able to break out of the string context, even by include /script in the attack string (which is a common vulnerability in hand-rolled escaping schemes). On Thu, Aug 26, 2010 at 1:30 PM, Julian Reschke julian.resc...@gmx.de wrote: I now get the point about the additional problems in script, but I fail to see how the proposal addresses this, unless expanding these entities is suppose to happen *after* parsing the script. Yes. That's precisely what happens. Kind regards, Adam
Re: [whatwg] base64 entities
On 8/26/10 6:45 PM, Adam Barth wrote: Note that this issue means that using atob or btoa for dealing with this is a huge pain if non-ASCII chars are involved, since those take and return byte arrays masquerading as JS strings, not actual Unicode strings. I'm slightly confused how that works. How do you represent arbitrary binary data as characters? You mean how do atob/btoa take their binary data in JS-land? You take your byte array, and convert it to a sequence of two-byte units by setting the high byte to 0. This sequence of two-byte units is a JS string. Another option is to provide a base64 encoder/decoder that uses UTF8 to encode/decode the binary. Not sure what the exact proposal here is. Because <script> does not decode entities in HTML, the attacker will be limited to what he or she can do with alphanumeric characters OK. I had misunderstood what you were proposing for script here. The point is that inside script this base64 thing will only be useful for setting innerHTML, right? -Boris
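The "byte array masquerading as a JS string" representation described above can be converted to and from real Unicode text with a UTF-8 step. A minimal sketch, assuming an environment that provides TextEncoder and TextDecoder; the two function names are hypothetical.

```javascript
// Convert a "binary string" (each code unit holds one byte, high byte 0,
// which is the shape atob() returns) into text by decoding it as UTF-8.
function binaryStringToText(binary) {
  // Code units above 0xFF would be truncated here; input is assumed to be
  // a genuine binary string.
  const bytes = Uint8Array.from(binary, (unit) => unit.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// The reverse: encode text as UTF-8 and store one byte per code unit,
// which is the shape btoa() expects.
function textToBinaryString(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (byte) => String.fromCharCode(byte)).join("");
}

// U+00E9 is the two bytes 0xC3 0xA9 in UTF-8.
const asBinary = textToBinaryString("\u00E9");
console.log(asBinary.length);                          // two code units
console.log(binaryStringToText(asBinary) === "\u00E9"); // round-trips
```

This is one possible reading of "a base64 encoder/decoder that uses UTF8"; the message itself notes that the exact proposal was not spelled out.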
Re: [whatwg] Input URL State and Files object
On 8/25/2010 2:02 PM, Ian Hickson wrote: On Mon, 2 Aug 2010, Charles Pritchard wrote: [ UAs can use <input type=file> to let the user enter remote URLs ] When a user supplies a URL through selection, click+drag or manual entry, should the browser still submit an Origin request header? It seems that CORS doesn't come into effect here -- but at the same time, it'd be handy for logging purposes and added security. I don't think there'd be an origin, but that's rather up to the user agent. (In this case it's acting on behalf of the user, not the page, so I don't think it makes sense to give the page's origin.) Sounds like an implementer would not include a Referer header, either. ... Continuing on with tweaking URLs to work with the File API: Chrome has gone ahead with their setData proposal, enhancing the event.dataTransfer object so that users may drag a file from within the browser onto their desktop. The extension uses setData with a key of DownloadURL and a value including a mime type, file descriptor and URI. I'd like this interface to work within ondrop; if getData(DownloadURL) is set, then a FileList would be returned in event.dataTransfer.files, much like it is when users drag files from their desktop into the browser. This would of course require Origin checks; whereas dragging onto the desktop does not require an Origin check. ... Here's the current example of setData(DownloadURL) and my comments. https://code.google.com/p/html5rocks/issues/detail?id=136
var dragElem = document.getElementById("ID_Element_to_be_dragged");
dragElem.addEventListener("dragstart", function(event) { event.dataTransfer.setData("DownloadURL", "application/pdf:sample.pdf:http://example.com/example-download-data"); }, false);
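A drop handler that wanted to act on such a value would first have to split it into its three parts. The sketch below infers the "mime type : file name : URL" layout from the single example in the message; the function name is hypothetical, and file names containing a colon are not handled.

```javascript
// Hypothetical parser for a DownloadURL value of the assumed form
//   <mime type>:<file name>:<url>
// Only the first two colons are separators; the URL keeps its own colons.
function parseDownloadURL(value) {
  const firstColon = value.indexOf(":");
  const secondColon = value.indexOf(":", firstColon + 1);
  if (firstColon === -1 || secondColon === -1) {
    return null; // not in the expected three-part form
  }
  return {
    mimeType: value.slice(0, firstColon),
    fileName: value.slice(firstColon + 1, secondColon),
    url: value.slice(secondColon + 1),
  };
}

const parsedDownload = parseDownloadURL(
  "application/pdf:sample.pdf:http://example.com/example-download-data"
);
console.log(parsedDownload);
```

Whether the url part may be fetched at all is a separate matter; that is where the Origin checks mentioned in the message would apply.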
Re: [whatwg] Should events be paused on detached iframes?
On 27 August 2010 05:02, Boris Zbarsky bzbar...@mit.edu wrote: On 8/26/10 11:58 AM, James May wrote: I thought I just suggested that? Everything works normally (as if it was still attached) until it is reattached, when the situation is re-evaluated. That could fall afoul of security checks that assume that an iframe with a non-null parent is in fact a subframe and that it's owner element is in the DOM. I know Gecko certainly has such internally. Again, nothing insurmountable, but there's a bunch of code in Gecko that makes assumptions about when windows can and can't exist that would need auditing. I can't speak to the web compat aspects. Could the iframe be hoisted to the top level of its parent browsing context? In terms of resource consumption, I don't see how this would be any different to any other kind of leak that web content can trigger. I don't think that's an issue, though this does raise the question of when it's OK to gc the iframe. When no references remain in either the DOM or script? 
if an iframe is removed from a Document and is then subsequently garbage collected, this will likely mean (in the absence of other references) that the child browsing context's WindowProxy object will become eligible for garbage collection, which will then lead to that browsing context being discarded, which will then lead to its Document being discarded also. This happens without notice to any scripts running in that Document; for example, no unload events are fired (the unload a document steps are not run). Although I'm not sure why this is different from the regular steps. ( http://www.whatwg.org/specs/web-apps/current-work/multipage/browsers.html#garbage-collection-and-browsing-contexts ) (I think someone mentioned that iframes can be GC'd normally) Can they, with your proposal? 
It seems that with your proposal if you remove an iframe from the DOM and then forget about it then as long as there's any network activity in that iframe or anything else which might potentially trigger script it cannot be gced. This seems like it would make it very easy to leak document after document... So running scripts and network activity are GC roots? -- James
Re: [whatwg] Input URL State and Files object
On Thu, Aug 26, 2010 at 5:24 PM, Charles Pritchard ch...@jumis.com wrote: On 8/25/2010 2:02 PM, Ian Hickson wrote: On Mon, 2 Aug 2010, Charles Pritchard wrote: [ UAs can use <input type=file> to let the user enter remote URLs ] When a user supplies a URL through selection, click+drag or manual entry, should the browser still submit an Origin request header? It seems that CORS doesn't come into effect here -- but at the same time, it'd be handy for logging purposes and added security. I don't think there'd be an origin, but that's rather up to the user agent. (In this case it's acting on behalf of the user, not the page, so I don't think it makes sense to give the page's origin.) Sounds like an implementer would not include a Referer header, either. ... Continuing on with tweaking URLs to work with the File API: Chrome has gone ahead with their setData proposal, enhancing the event.dataTransfer object so that users may drag a file from within the browser onto their desktop. The extension uses setData with a key of DownloadURL and a value including a mime type, file descriptor and URI. I'd like this interface to work within ondrop; if getData(DownloadURL) is set, then a FileList would be returned in event.dataTransfer.files, much like it is when users drag files from their desktop into the browser. This would of course require Origin checks; whereas dragging onto the desktop does not require an Origin check. I would think that a same-origin check should always be performed. In Firefox, the save-as dialog always displays the website you are downloading from. However with drag'n'drop no dialog will be shown and the user will presumably think he/she is downloading from the site where the drag started. Or are browsers planning on displaying the save-as dialog? / Jonas
Re: [whatwg] Should events be paused on detached iframes?
On 8/26/10 10:33 PM, James May wrote: Could the iframe be hoisted to the top level of its parent browsing context? Not sure what you mean. When no references remain in either the DOM or script? if an iframe is removed from a Document and is then subsequently garbage collected It can't become garbage collected while the window inside it isn't, since the window inside it references the iframe (via frameElement). this will likely mean (in the absence of other references) that the child browsing context's WindowProxy object will become eligible for garbage collection I don't think it's reasonable to gc the iframe element while leaving the window inside alive due to it being referenced. That introduces races where frameElement could suddenly become null at some point (possibly between two lines of the same script, or even partway through some operation; for example GC can happen, even multiple times, during a property get or set). That would be pretty broken behavior. Although I'm not sure why this is different from the regular steps. Presumably the only different thing is the lack of an unload event. Can they, with your proposal? It seems that with your proposal if you remove an iframe from the DOM and then forget about it then as long as there's any network activity in that iframe or anything else which might potentially trigger script it cannot be gced. This seems like it would make it very easy to leak document after document... So running scripts and network activity are GC roots? 
Not running scripts. Anything that might potentially run a script in the future. You can think of it as gc roots, sure, and you can also claim that gc'ed systems never leak memory. But is either necessarily useful? The upshot is that random things that the web developer knows nothing about and doesn't care about can prevent the memory from being deallocated effectively forever from the web developer's point of view. And worse yet, there's no obvious recourse (as in, no way to make sure the thing is garbage collected). Any reasonable person would call that a memory leak in the browser, not in the site. Just like a JS impl that never GCes until you navigate away from the page should be considered to have a memory leak. -Boris
Re: [whatwg] Input URL State and Files object
On 8/26/2010 7:53 PM, Jonas Sicking wrote: On Thu, Aug 26, 2010 at 5:24 PM, Charles Pritchard ch...@jumis.com wrote: Chrome has gone ahead with their setData proposal, enhancing the event.dataTransfer object so that users may drag a file from within the browser onto their desktop. I would think that a same-origin check should always be performed. In Firefox, the save-as dialog always displays the website you are downloading from. However with drag'n'drop no dialog will be shown and the user will presumably think he/she is downloading from the site where the drag started. Or are browsers planning on displaying the save-as dialog? I think that save-as dialogs are implementation-specific. For example, OS X-based prompts happen when you first open a file, not when downloading. The HTML 5 UI/UA permissions are built upon the idea that drag/drop confers a similar permissibility to right click + context menu actions.