Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Anne van Kesteren
On Thu, 26 Aug 2010 02:28:49 +0200, Chris Double  
chris.dou...@double.co.nz wrote:
On Thu, Aug 26, 2010 at 5:25 AM, Eric Carlson eric.carl...@apple.com  
wrote:

FWIW, I agree with Silvia that a new file extension and MIME type make
sense.


I also think that a new file extension and MIME type is the way to go.


Would Firefox / Safari support text/srt files in some undocumented fashion  
then or just simply not support those? The former would not really be an  
acceptable solution to me.



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] [br] element should not be a line break

2010-08-26 Thread Christoph Päper
Ian Hickson:
 On Wed, 4 Aug 2010, Thomas Koetter wrote:
 
 What strikes me though is that according to the spec "The br element 
 represents a line break." A *line* break is presentational in nature. 
 The break is structural, but restricting it to a certain presentation of 
 that break lacks the desired separation of structure and presentation.
 
 Wouldn't it make more sense to consider the br element to be just a 
 minor logical break inside a paragraph?
 
 Calling it a line break doesn't say how it is rendered. It's just a 
 conceptual description.

It presupposes the existence of lines, though. Lines are a very visual concept, 
although they can be applied to oral language, as in poems and songs (where 
‘//’ is often an accepted representation for line breaks in transcripts). An 
oral line may span several literal lines and vice versa.

Paragraphs (and breaks therein), of course, are also a concept of written 
language, as are sentences.

However, I believe the underlying problem is simply that “line break” is (too) 
often used and understood as a synonym for “new line”, at least by non-native 
speakers. Speaking of breaks on line or paragraph level therefore makes more 
sense to me.

 (A minor logical break inside a paragraph is not generally represented 
 by a line break, at least not in any typographic conventions I've seen; 
 usually, in my experience, those are denoted either using ellipses, 
 em-dashes, or parentheses.)

That’s true for real paragraphs, but not for most “non-paragraphic” texts, e.g. 
addresses.

Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread James May
On 25 August 2010 12:50, Boris Zbarsky bzbar...@mit.edu wrote:

 On 8/24/10 7:09 PM, Ben Lerner wrote:

  The history navigation analogy is a good one: pages presumably already
 have to handle the pageshow event to deal with being revived from the
 history, and the browser already needs to know how to fire that event.
 Why not reuse those mechanisms? A strawman claim: Nothing may be
 changing from the perspective of the iframe, but it certainly is
 changing from the perspective of the container or the user: detaching an
 iframe from a page is like navigating a browsing context away from a
 page, putting it into hibernation until it's reattached to an active
 document/browsing context. What subtle or important facet of the web am
 I missing that breaks this analogy? (It wouldn't surprise me if I missed
 something obvious, either... :)


 At least in the case of Gecko, there are at least the following things to
 keep in mind:

 1) hibernating documents are very limited in what one can do with
   them (e.g. attempting to mutate the document in any way while
   hibernating will throw it away).
 2) Documents have security policies applied to them based on the
   toplevel content window (or browser tab, if you prefer to think
   about it) they're associated with.  Which means that allowing
   documents not immediately associated with any toplevel window,
   which would be the case right now in Gecko for an iframe not in
   a document, leads to security problems.  This could be changed by
   redoing how the association is implemented, but there's some
   touchy code involved that we'd rather not get wrong.  ;)


  Another reason to consider suspending detached iframes: suppose that in
 the chat window example below, the iframe wasn't just a same-origin
 place to store global state, but also had its own UI, with callbacks and
 event handlers and whatnot. If, during the interim while the iframe was
 being detached, adopted and reattached, that frame executed a timer that
 popped up a modal alert or prompt to the user, how would the user
 reasonably know where that alert came from? And what document(s?) should
 be paused while the alert is shown?


 And for that matter, how would the UA know where the alert came from, in
 terms of correctly parenting it?  This ties back to item #2 above.



Couldn't the iframe be kept alive, but remain associated with its parent
browsing context until (if) it was re-parented / inserted into a different
document? (Does this match how other elements in the DOM behave in terms of
event handlers when they are detached?)

That way, complex hibernation would be unneeded and it would be clear
how to handle events, security, etc.


Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread Boris Zbarsky

On 8/26/10 3:23 AM, James May wrote:

Couldn't the iframe be kept alive, but remain associated with its
parent browsing context until (if) it was re-parented / inserted into a
different document? (Does this match how other elements in the DOM
behave in terms of event handlers when they are detached?)


Elements behave fine.  The question is what the Window should do.  What 
should window.parent return in the iframe while detached?  window.top? 
What should window.resizeTo do?  That sort of thing.


-Boris



Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Henri Sivonen
Silvia Pfeiffer wrote:
 You misunderstand my intent. I am by no means suggesting that no WebSRT
 content is treated as SRT by any application. All I am asking for is a
 different file extension and a different mime type and possibly a magic
 identifier such that *authoring* applications (and authors) can clearly
 designate this to be a different format, in particular if they include new
 features. Then a *playback application* has the chance to identify them as a
 different format and provide a specific parser for it, instead of failing
 like Totem. They can also decide to extend their existing SRT parser to
 support both WebSRT and SRT. And I also have no issue with a user deciding
 to give a WebSRT file a go by renaming it to .srt.
 
 By keeping WebSRT and SRT as different formats we give the applications a
 choice to support either, or both in the same parser. If we don't, we force
 them to deal in a single parser with all the oddities of SRT formats as well
 as all the extra features and all the extensibility of WebSRT.

Why wouldn't it always be a superior solution for all parties to do the 
following:
 1) Make sure WebSRT never requires processing that'd require rendering a 
substantial body of legacy .srt content in a broken way. (This would require 
supporting non-UTF-8 encodings by sniffing as well as supporting <font> and 
<u>, which would happen for free if my innerHTML proposal were adopted.)
 2) Make playback software that supports WebSRT only have a WebSRT code path 
and use that code path for legacy .srt content as well.
?

Specifically, if #1 is done, why would any pragmatic developer not want to do 
#2 if they are supporting WebSRT in their software? Why would anyone want to 
have a code path that turns off new WebSRT features if they have a code path 
that supports WebSRT features?

Or is #1 *impossible* due to the craziness of the legacy? (I thought any given 
.srt consumer only has a single code path and implementation-wise there aren't 
already multiple .srt formats even though doom9 spec-wise there are at least 
two.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
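Henri's second point, one code path that reads legacy .srt and WebSRT alike, can be illustrated with a minimal JavaScript sketch. It assumes a simplified cue syntax (blank-line-separated blocks, an optional identifier line, and a timing line with either "," as in SRT or "." before the milliseconds); `parseCues` and its output fields are hypothetical, and neither encoding sniffing nor `<font>`/`<u>` handling is implemented here.

```javascript
// Hypothetical single code path for legacy SRT and WebSRT-style cue files.
// Simplified syntax (an assumption, not the WebSRT spec): blocks are separated
// by blank lines, the identifier line is optional, and timestamps may use ","
// (SRT) or "." before the milliseconds.
const TIMING =
  /^(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})(.*)$/;

function toMs(h, m, s, ms) {
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
}

function parseCues(text) {
  const cues = [];
  // Drop a leading BOM and normalise CRLF/CR line endings to LF.
  const normalised = text.replace(/^\uFEFF/, "").replace(/\r\n?/g, "\n");
  for (const block of normalised.split(/\n\s*\n/)) {
    const lines = block.trim().split("\n");
    // The timing line is either the first line (no identifier) or the second.
    let t = lines[0].match(TIMING);
    let id = "";
    let textStart = 1;
    if (!t && lines.length > 1) {
      t = lines[1].match(TIMING);
      id = lines[0];
      textStart = 2;
    }
    if (!t) continue; // Not a cue: skip this block instead of failing the file.
    cues.push({
      id,
      start: toMs(t[1], t[2], t[3], t[4]),
      end: toMs(t[5], t[6], t[7], t[8]),
      settings: t[9].trim(), // e.g. cue settings; left uninterpreted here
      text: lines.slice(textStart).join("\n"), // markup such as <i> kept for a later stage
    });
  }
  return cues;
}
```

In this model a legacy file is simply one that never uses the extra features; nothing in the path depends on the file extension or MIME type.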


Re: [whatwg] base64 entities

2010-08-26 Thread Kornel Lesiński
On 25.08.2010, at 23:46, Aryeh Gregor wrote:

 These cases can be secured without any new features in browsers (by escaping 
 whitespace using numeric entities):
 
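 function htmlescape($str) {
   // Replace whitespace, quotes, <, > and & with numeric character references.
   return preg_replace_callback('/[\s\'"<>&]/', function ($m) {
     return '&#' . ord($m[0]) . ';';
   }, $str);
 }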
 
 That doesn't work in <script> for text/html, does it?

Ah, indeed.

Another tricky case came to my mind, which entities cannot secure (unless 
special magic is defined for the new entity):

onclick="show('&%base64;')"

 These are reasonable points.  How many vulnerabilities would it
 actually prevent in practice if htmlspecialchars() were replaced with
 this everywhere?  XSS is usually when you don't escape things at all,
 not when you escape them in a slightly wrong way.  Easy escaping in
 <script> and <style> would be nice, though (or is there already some
 way to do that?).


In PHP json_encode() works great for outputting data in JS (and can be 
configured to JS-escape HTML-unsafe chars too), but I feel like I'm the only 
person who knows about it :)

-- 
regards, Kornel Lesiński
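Kornel's `json_encode()` tip has a straightforward JavaScript counterpart. The sketch below mirrors what PHP's `JSON_HEX_TAG`, `JSON_HEX_AMP` and `JSON_HEX_APOS` flags do by rewriting `<`, `>`, `&` and `'` as `\uXXXX` escapes after `JSON.stringify()`; the name `jsonForInlineScript` is hypothetical. Those characters can only occur inside JSON string literals, so the output stays valid JSON and JavaScript while no `</script>` sequence survives in it.

```javascript
// Hypothetical helper: serialise a value for embedding inside an inline <script>.
function jsonForInlineScript(value) {
  const json = JSON.stringify(value);
  if (json === undefined) throw new TypeError("value is not JSON-serialisable");
  return json
    .replace(/</g, "\\u003c") // like JSON_HEX_TAG
    .replace(/>/g, "\\u003e") // like JSON_HEX_TAG
    .replace(/&/g, "\\u0026") // like JSON_HEX_AMP
    .replace(/'/g, "\\u0027") // like JSON_HEX_APOS
    // U+2028/U+2029 were not allowed raw in JS string literals before ES2019.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}
```

Double quotes are deliberately left alone: `JSON.stringify()` already escapes them inside strings as `\"`, and a naive textual rewrite of `\"` could corrupt strings that end in a backslash.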






Re: [whatwg] base64 entities

2010-08-26 Thread Martin Janecke

On 26.08.10 01:41, Adam Barth wrote:

On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson i...@hixie.ch wrote:

On Wed, 25 Aug 2010, Adam Barth wrote:

HTML should support Base64-encoded entities to make it easier for
authors to include untrusted content in their documents without
risking XSS.


Seems like a fine idea. Get browsers to implement it and I'll spec it.


I've posted a patch for WebKit:

https://bugs.webkit.org/show_bug.cgi?id=44641

Some subtleties:

1) Some base64 decoders tolerate newlines.  We don't want to decode
entities with newlines.
2) Decoding base64 results in binary data.  We'll need to convert that
data to characters in order to deal with it in the DOM.  We always
use UTF-8 for that transformation, regardless of the document's
encoding.
3) Null characters are replaced with U+FFFD.
4) The empty base64 entity &%; is consumed and is replaced with the
empty string.
5) Invalid base64 is rejected and the entity is not decoded.

Adam
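Adam's five rules can be made concrete with a small decoder for an entity's payload, assuming the entity is written as `&%`, then the payload, then `;`. This is a JavaScript sketch, not the WebKit patch: `decodeBase64Entity` is a hypothetical name, it returns `null` when the entity should be left undecoded, and it additionally requires canonical `=` padding and lets malformed UTF-8 decode to U+FFFD, neither of which the list above specifies.

```javascript
// Hypothetical decoder for the payload of a base64 entity (assumed syntax: "&%" payload ";").
// Returns the decoded string, or null when the entity should be left undecoded.
function decodeBase64Entity(payload) {
  // Rule 4: the empty entity is replaced with the empty string.
  if (payload === "") return "";
  // Rules 1 and 5: reject newlines and other whitespace, characters outside the
  // base64 alphabet, and (an extra assumption) missing or misplaced "=" padding.
  if (payload.length % 4 !== 0 || !/^[A-Za-z0-9+/]*={0,2}$/.test(payload)) return null;
  // atob() returns a "binary string" with one character per decoded byte.
  const bytes = Uint8Array.from(atob(payload), (c) => c.charCodeAt(0));
  // Rule 2: always decode the bytes as UTF-8, whatever the document's encoding.
  // Malformed UTF-8 becomes U+FFFD and a leading BOM is kept (both assumptions).
  const text = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
  // Rule 3: NUL characters are replaced with U+FFFD.
  return text.replace(/\u0000/g, "\uFFFD");
}
```

For example, `decodeBase64Entity("4oCT")` yields U+2013 EN DASH, since `4oCT` is the base64 form of the UTF-8 bytes E2 80 93.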



Is it necessary to consider compatibility issues here? In HTML4 this
seems to have been valid code (see http://validator.w3.org/check):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
<title>base64 entity test</title>
</head>
<body>
<p>Look at these fine ASCII characters: &%4oCT;</p>
</body>
</html>

Now it would be interpreted differently. Could this lead to old
documents changing in meaning? Do we have to consider old documents that 
were not completely valid (e.g. lacked a doctype declaration)?


Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Philip Jägenstedt

On Thu, 26 Aug 2010 09:58:29 +0200, Henri Sivonen hsivo...@iki.fi wrote:


Silvia Pfeiffer wrote:

You misunderstand my intent. I am by no means suggesting that no WebSRT
content is treated as SRT by any application. All I am asking for is a
different file extension and a different mime type and possibly a magic
identifier such that *authoring* applications (and authors) can clearly
designate this to be a different format, in particular if they include new
features. Then a *playback application* has the chance to identify them as a
different format and provide a specific parser for it, instead of failing
like Totem. They can also decide to extend their existing SRT parser to
support both WebSRT and SRT. And I also have no issue with a user deciding
to give a WebSRT file a go by renaming it to .srt.

By keeping WebSRT and SRT as different formats we give the applications a
choice to support either, or both in the same parser. If we don't, we force
them to deal in a single parser with all the oddities of SRT formats as well
as all the extra features and all the extensibility of WebSRT.


Why wouldn't it always be a superior solution for all parties to do the  
following:
 1) Make sure WebSRT never requires processing that'd require rendering  
a substantial body of legacy .srt content in a broken way. (This would  
require supporting non-UTF-8 encodings by sniffing as well as supporting  
<font> and <u>, which would happen for free if my innerHTML proposal  
were adopted.)
 2) Make playback software that supports WebSRT only have a WebSRT code  
path and use that code path for legacy .srt content as well.

?

Specifically, if #1 is done, why would any pragmatic developer not want  
to do #2 if they are supporting WebSRT in their software? Why would  
anyone want to have a code path that turns off new WebSRT features if  
they have a code path that supports WebSRT features?


I think many media player developers would be hesitant to include a full  
HTML parser just for parsing (Web)SRT, especially since they'd also need a  
layout engine to get anything more than they would get from a simpler  
parser.


I do think it's a good idea to make WebSRT handle existing SRT content  
as well as possible. The encoding issue is easy to side-step by just  
saying that that's a preprocessing step.


Or is #1 *impossible* due to the craziness of the legacy? (I thought any  
given .srt consumer only has a single code path and implementation-wise  
there aren't already multiple .srt formats even though doom9 spec-wise  
there are at least two.)


There are some issues with the current WebSRT parser that I've been  
meaning to send mail about, but my impression is that it's not  
impossible to define a parser which works well enough to replace existing  
ones.


--
Philip Jägenstedt
Core Developer
Opera Software
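Philip's suggestion that the encoding question can be handled as a preprocessing step, before any WebSRT parsing, might look like the following JavaScript sketch. The strategy (strict UTF-8 first, windows-1252 as the fallback) and the name `decodeSubtitleBytes` are assumptions for illustration; legacy .srt files come in many encodings, so a real sniffer would need more heuristics than this.

```javascript
// Hypothetical preprocessing step: turn raw subtitle bytes into text before parsing.
function decodeSubtitleBytes(bytes) {
  try {
    // fatal: true makes the decoder throw on malformed UTF-8 instead of
    // silently inserting U+FFFD; a leading UTF-8 BOM is stripped by default.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Assumed fallback for legacy files that are not valid UTF-8.
    return new TextDecoder("windows-1252").decode(bytes);
  }
}
```

The resulting string can then be handed to a single cue parser, whatever encoding the file originally used.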


Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Henri Sivonen
  Why wouldn't it always be a superior solution for all parties to do the
  following:
   1) Make sure WebSRT never requires processing that'd require rendering
  a substantial body of legacy .srt content in a broken way. (This would
  require supporting non-UTF-8 encodings by sniffing as well as supporting
  <font> and <u>, which would happen for free if my innerHTML proposal
  were adopted.)
   2) Make playback software that supports WebSRT only have a WebSRT code
  path and use that code path for legacy .srt content as well.
  ?
 
  Specifically, if #1 is done, why would any pragmatic developer not want
  to do #2 if they are supporting WebSRT in their software? Why would
  anyone want to have a code path that turns off new WebSRT features if
  they have a code path that supports WebSRT features?
 
 I think many media player developers would be hesitant to include a full
 HTML parser just for parsing (Web)SRT, especially since they'd also need a
 layout engine to get anything more than they would get from a simpler
 parser.

If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested by 
whatever potentially spec-incompliant means), why would they not use the same 
ingest code path for both?

If the app isn't capable of supporting any feature that's permitted in WebSRT 
but not part of legacy SRT, how does failing at the point of finding out that 
this file claims to be WebSRT rather than SRT make things much better than 
failing at "I found stuff that I can't handle/skip over in this SRT file"?

In particular, it seems like a wrong optimization to make it possible for apps 
that don't support any WebSRT features over legacy features to fail early rather than 
to make apps that support at least one WebSRT-introduced feature unify their 
processing of WebSRT and SRT by processing both WebSRT and SRT as one format 
where legacy SRT files just don't happen to use new features.

To me, having different code paths for WebSRT and SRT is like IE adding a new 
Trident snapshot with every release whereas supporting SRT by treating it as 
WebSRT with no new features (if the app is supporting even one 
WebSRT-introduced feature!) is like what the other browsers are doing with 
HTML/CSS/DOM.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] base64 entities

2010-08-26 Thread Julian Reschke

On 25.08.2010 22:50, Adam Barth wrote:

== Summary ==
...


Not convinced. There's already one way to escape these things, and this 
is supported in all UAs.


I don't see how adding another mechanism will help those who can't use 
the first one properly. For instance, people unable to escape <, > 
and & are likely also unable to get the UTF-8 conversion right.


Best regards, Julian


Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Philip Jägenstedt
On Wed, 25 Aug 2010 17:40:08 +0200, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:


At this point, what is your recommendation? The following ideas have been
on the table:

* Change the file extension to something other than .srt.

I don't have an opinion, browsers ignore the file extension anyway.


 Yes, I think we should definitely have a new file extension.



I'll leave this to others to decide, but since browsers have no concept of
file extensions, just using .srt will work. If the format is SRT-like it's
likely at least some files will use .srt in practice.




All SRT files in practice use the .srt extension - it is typically how these
formats are identified by applications. Just because *nix ignores file
extensions mostly for identifying file types doesn't mean that applications
do. Again, I believe strongly that re-using the same file extension is the
one biggest pain we can inflict on the community.



As shown above, several popular (?) media players ignore or give little
weight to the file extension.



I don't think that's a fair sample - as I said, on Linux and on the
command-line things are different. I have a GUI mplayer here and it reacts
like VLC - doesn't let me open .wsrt files. The vast majority of
applications on Windows and the Mac make their decision on whether they
support files based on the file extension.


That the file selection dialogs are filtered by file extensions doesn't  
mean that applications don't sniff the content. In fact, MPlayer, VLC and  
Totem will happily load and use an SRT file even if it is called foo.smi,  
even though SAMI is a completely incompatible format. In other words, they  
sniff the content as being SRT. The reason that they rely on sniffing is  
likely that many files use the wrong file extension (my OpenSubtitles  
batch has no extensions, so I have no statistics on this).
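The kind of content sniffing described above can be sketched in JavaScript as follows. The heuristic is an assumption for illustration, not the actual logic of MPlayer, VLC or Totem, and `sniffSubtitleFormat` is a hypothetical name. It never looks at the file name and tolerates a few lines of leading garbage before the first timing line.

```javascript
// Hypothetical sniffing heuristic; the file name is deliberately ignored.
const SRT_TIMING =
  /^\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}/;

function sniffSubtitleFormat(text, maxLines = 30) {
  const lines = text.replace(/^\uFEFF/, "").split(/\r\n?|\n/).slice(0, maxLines);
  // Look for an SRT-style timing line anywhere near the top, so that a few
  // lines of leading garbage (or an unknown header) do not prevent detection.
  if (lines.some((line) => SRT_TIMING.test(line))) return "srt";
  // SAMI files are HTML-like and contain a <SAMI> tag.
  if (lines.some((line) => /<sami>/i.test(line))) return "sami";
  return "unknown";
}
```

With this heuristic, `sniffSubtitleFormat("WEBSRT\n\n00:00:01.000 --> 00:00:02.000\nHi")` returns `"srt"` (`WEBSRT` standing in for a hypothetical header line), which illustrates why a header alone would not keep WebSRT content away from existing SRT parsers.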


Again, if we want to avoid exposing existing SRT parsers to WebSRT syntax,  
then the format needs to be more incompatible. File extensions will be  
changed, popular players rely on sniffing, some ignore leading garbage and  
also headers can simply be removed by naive conversion tools.


Assuming we pick the same file extension and we now have a new application
that only supports WebSRT parsing, we will make a large bunch of existing
valid SRT files invalid - not only those that are not in UTF-8, but also
those with <font>...</font> and <u>...</u>. I do wonder if the text between
the <font> start and end tags and inside the <u>...</u> may even get
removed because of lack of support for these.


I've seen no application that removes everything between tags it doesn't  
recognize; the only things that I've seen happen are treating it as plain  
text or ignoring the tags, much like a browser does with HTML.



  * Add a header to WebSRT to make it uniquely identifiable.




The header would have to be mandatory and browsers would have to reject
files that don't have it. Such files would be compatible with some existing
software and break some, depending on how they sniff. We could also put
metadata in such a header.
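A mandatory header of this kind amounts to a strict check on the first line, sketched here in JavaScript. The magic string `WEBSRT`, the name `hasRequiredHeader`, and the rule that the magic may be followed by a space or tab and free text are all hypothetical; no header syntax had been agreed at this point. A browser applying such a rule would reject any file for which the check fails, even one that parses fine as SRT.

```javascript
// Hypothetical magic string; the thread did not settle on any header syntax.
const MAGIC = "WEBSRT";

function hasRequiredHeader(text) {
  // Only the first line matters; an optional UTF-8 BOM is ignored.
  const firstLine = text.replace(/^\uFEFF/, "").split(/\r\n?|\n/, 1)[0];
  // Accept the magic on its own, or followed by a space or tab and more text
  // (where metadata could go), but not as a prefix of a longer word.
  return (
    firstLine === MAGIC ||
    firstLine.startsWith(MAGIC + " ") ||
    firstLine.startsWith(MAGIC + "\t")
  );
}
```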


 Yes, I think we need to introduce a header. Maybe we can hide all the
structure in what SRT recognizes as comments (i.e. start the lines with ";").
But I believe we need some hints like the @profile to identify the type of
the cues and the <link> to link to a style sheet, and we need metadata like
the <meta> element of HTML headers.


I had no idea that semicolon was used for comments in SRT. Is this usage
widespread? Does it work in most players?




I thought it was, but maybe it was just introduced for WebSRT. It is not
tested in Hixie's SRT research[2]. Can you take a quick look through your
SRT file collection to see if there are any? I'm probably wrong about this,
seeing as it's not mentioned in the wiki page for SRT [3].

[2] http://wiki.whatwg.org/wiki/SRT_research
[3] http://en.wikipedia.org/wiki/SubRip



OK, I grepped the 1 files. Only 15 had any lines beginning with a
semicolon, and by manual inspection it doesn't look like any of them are
clearly intended as comments (it's hard to tell, all are in foreign
languages). None of them were at the very beginning of the file.



Ah, that actually makes for another incompatibility of WebSRT and SRT: such
lines are regarded as comments in WebSRT when they probably aren't in SRT.


I can't find anything about this when searching for comment and  
semicolon in the spec, are you sure you're not thinking of some other  
format than WebSRT?


It seems increasingly that the only thing that WebSRT and SRT still have in
common is the --> character sequence. As a friend of mine in a11y recently
said: "I was hoping to never have to stare at --> ever again..." We could
indeed go all the way and define a much more different format, though I
don't think it will create implementations as quickly as an SRT-based but
changed format.


I would prefer if we follow one of two paths:

1. Let 

Re: [whatwg] Fwd: Discussing WebSRT and alternatives/improvements

2010-08-26 Thread Philip Jägenstedt

On Thu, 26 Aug 2010 11:52:26 +0200, Henri Sivonen hsivo...@iki.fi wrote:


 Why wouldn't it always be a superior solution for all parties to do the
 following:
  1) Make sure WebSRT never requires processing that'd require rendering
 a substantial body of legacy .srt content in a broken way. (This would
 require supporting non-UTF-8 encodings by sniffing as well as supporting
 <font> and <u>, which would happen for free if my innerHTML proposal
 were adopted.)
  2) Make playback software that supports WebSRT only have a WebSRT code
 path and use that code path for legacy .srt content as well.
 ?

 Specifically, if #1 is done, why would any pragmatic developer not want
 to do #2 if they are supporting WebSRT in their software? Why would
 anyone want to have a code path that turns off new WebSRT features if
 they have a code path that supports WebSRT features?

I think many media player developers would be hesitant to include a full
HTML parser just for parsing (Web)SRT, especially since they'd also need a
layout engine to get anything more than they would get from a simpler
parser.


If their app can ingest both WebSRT and legacy SRT (with WebSRT ingested  
by whatever potentially spec-incompliant means), why would they not use  
the same ingest code path for both?


I don't think they should or would, I'm just saying that they'd probably be  
hesitant to use an HTML parser in that single code path, as there's very  
little benefit for them.


If the app isn't capable of supporting any feature that's permitted in  
WebSRT but not part of legacy SRT, how does failing at the point of  
finding out that this file claims to be WebSRT rather than SRT make  
things much better than failing at "I found stuff that I can't  
handle/skip over in this SRT file"?


In particular, it seems like a wrong optimization to make it possible  
for apps that don't support any WebSRT features over legacy features to  
fail early rather than to make apps that support at least one WebSRT-introduced  
feature unify their processing of WebSRT and SRT by processing both  
WebSRT and SRT as one format where legacy SRT files just don't happen to  
use new features.


To me, having different code paths for WebSRT and SRT is like IE adding  
a new Trident snapshot with every release whereas supporting SRT by  
treating it as WebSRT with no new features (if the app is supporting  
even one WebSRT-introduced feature!) is like what the other browsers are  
doing with HTML/CSS/DOM.


Is this in reply to something other than what you quoted? In any case, I  
agree.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Built-in image sprite support in HTML5

2010-08-26 Thread Tab Atkins Jr.
On Wed, Aug 25, 2010 at 7:00 PM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 It would, however, be good to have an indication where HTML would like to
 see it going. Would it be better for a media fragment URI for images such as
 http://example.com/picture.png#xywh=160,120,320,240  to display the full
 image with the rectangle somehow highlighted (as is the case with fragment
 URIs to HTML pages), or would it be better to actually just display the
 specified region and hide the rest of the image (i.e. create a sprite)? What
 makes the most sense for images?

The CSS Image Values Module ( http://dev.w3.org/csswg/css3-images/#url
) is currently recommending Media Fragments as a way to sprite out a
portion of a resource.  We have a note that we're expecting a spec to
reference at some point.

~TJ
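The sprite reading of a fragment such as `#xywh=160,120,320,240` boils down to parsing the fragment and clipping the image to the resulting rectangle. The JavaScript sketch below roughly follows the W3C Media Fragments URI draft syntax (`xywh=[pixel:|percent:]x,y,w,h`, with `pixel` as the default); `parseXywh` and `clipRect` are hypothetical helpers, and they do not clamp the rectangle to the image bounds.

```javascript
// Hypothetical helpers for the spatial ("xywh") dimension of a media fragment.
function parseXywh(url) {
  // Fragment parameters are name=value pairs separated by "&".
  for (const pair of new URL(url).hash.slice(1).split("&")) {
    const [name, value] = pair.split("=");
    if (name !== "xywh" || value === undefined) continue;
    const m = /^(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$/.exec(value);
    if (!m) return null; // malformed xywh value
    const [x, y, w, h] = m.slice(2).map(Number);
    return { unit: m[1] || "pixel", x, y, w, h };
  }
  return null; // no xywh parameter
}

// Convert a parsed fragment to a pixel rectangle for an image of the given size.
function clipRect(frag, imageWidth, imageHeight) {
  if (frag.unit === "pixel") return { x: frag.x, y: frag.y, w: frag.w, h: frag.h };
  const px = (pct, size) => Math.round((pct / 100) * size);
  return {
    x: px(frag.x, imageWidth),
    y: px(frag.y, imageHeight),
    w: px(frag.w, imageWidth),
    h: px(frag.h, imageHeight),
  };
}
```

A renderer could then draw only that rectangle, for instance with the nine-argument form of the canvas `drawImage()` method.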


Re: [whatwg] Built-in image sprite support in HTML5

2010-08-26 Thread Silvia Pfeiffer
On Thu, Aug 26, 2010 at 9:01 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 On Wed, Aug 25, 2010 at 7:00 PM, Silvia Pfeiffer
 silviapfeiff...@gmail.com wrote:
  It would, however, be good to have an indication where HTML would like to
  see it going. Would it be better for a media fragment URI for images such as
  http://example.com/picture.png#xywh=160,120,320,240  to display the full
  image with the rectangle somehow highlighted (as is the case with fragment
  URIs to HTML pages), or would it be better to actually just display the
  specified region and hide the rest of the image (i.e. create a sprite)? What
  makes the most sense for images?

 The CSS Image Values Module ( http://dev.w3.org/csswg/css3-images/#url
 ) is currently recommending Media Fragments as a way to sprite out a
 portion of a resource.  We have a note that we're expecting a spec to
 reference at some point.

 ~TJ


Oh, wow, that's good to know. Thanks!
Silvia.


Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread James May
On 26 August 2010 17:27, Boris Zbarsky bzbar...@mit.edu wrote:

 On 8/26/10 3:23 AM, James May wrote:

 Couldn't the iframe be kept alive, but remain associated with its
 parent browsing context until (if) it was re-parented / inserted into a
 different document? (Does this match how other elements in the DOM
 behave in terms of event handlers when they are detached?)


 Elements behave fine.  The question is what the Window should do.  What
 should window.parent return in the iframe while detached?  window.top? What
 should window.resizeTo do?  That sort of thing.

 -Boris



I thought I just suggested that?

Everything works normally (as if it was still attached) until it is
reattached, when the situation is re-evaluated.

In terms of resource consumption, I don't see how this would be any
different to any other kind of leak that web content can trigger. (I think
someone mentioned that iframes can be GC'd normally)


Re: [whatwg] INCLUDE and links with @rel=embed

2010-08-26 Thread Ian Hickson
On Thu, 5 Aug 2010, Bjartur Thorlacius wrote:
  On Tue, 18 May 2010, bjartur wrote:
  
   First of all I think we should use <a rel=embed href=uri-ref> 
   instead of <source>.
  
  What problem would this solve?
 
 It would tell UAs that don't implement HTML 5 that the value of @href is 
 a URI. Then it can provide a means for the user to retrieve the 
 identified resource (and do something useful with it).

Surely the kind of URL is already fully given by the scheme, making this 
rather moot.


 For authors it would make constructs such as this one (excerpt from spec)
 unnecessary:
 <video controls src="http://video.example.com/vids/315981">
   <a href="http://video.example.com/vids/315981">View video</a>.
 </video>
 
 In fact, having the ability to follow this link is useful even though
 my browser supports video. But that's an UI issue.

I don't understand how it would affect this.


  On Wed, 19 May 2010, Bjartur Thorlacius wrote:
  
   Is the existing syntax backwards compatible? When using <a>, you
   get a nice link as fallback content automagically, not requiring any
   special workarounds by the content author. AFAICT you don't even get
   that when using a browser that doesn't support <audio> and <video>.
  
  Indeed, with those you have to provide the fallback content (which could
  e.g. be flash) as a descendant of the audio/video element.

 As a user of a browser that doesn't fully support video I'd prefer
 getting a hyperlink to the resource to a Flash program. Just sayin'.

Most authors would rather the user never knew there was a difference and 
just get the video, it seems.

 
  If you're saying that we should also support other timed-based formats 
  in the future even if they are not video, e.g. if you are saying we 
  should support formats like SMIL, then there's no reason you can't do 
  that with <video> itself. <video> really is just an API to time-based 
  visual data, it doesn't have to be a sequence of bitmaps.

 Oh, the following quote confused me.

  "The video element is a media element whose media data is ostensibly 
  video data"

I picked the word "ostensibly" on purpose for that sentence. :-)


 I'm not just talking about SMIL. I'm talking about using a secondary 
 feature of media elements (the ability to link to multiple alternative 
 resources) even if the main feature (the API) is irrelevant.
 
 <video>
   <source src="f.utf8" charset="utf8">
   <source src="f.latin1" charset="latin1">
 </video>
 <video>
   <source src="img.png" type="image/png">
   <source src="img.svg" type="image/svg+xml">
 </video>
 
 I don't need to know the duration of an unanimated PNG.

Ah, yeah, that isn't supported. You can just use <object> for the image 
case:

   <object data="img.png" type="image/png">
    <object data="img.svg" type="image/svg+xml">
    </object>
   </object>

In the character encoding case, everyone supports UTF-8, so just use that.


  On Wed, 19 May 2010, bjartur wrote:
  
   Yeah, maybe my crazy idealism and tendency to reuse existing things 
   don't mix up in this case. The main purpose of video and audio 
   is to create a scripting interface to online video. But they also 
   add new linking capabilities which should be available to any 
   content whatsoever.
  
  I don't really see how. In what sense do they add new linking 
  capabilities?
 
 In the sense of multiple alternative (media) resources.
 
 This could possibly be done with object but its fallback mechanism 
 seems inferior.

The video one is very specific to codecs and so forth; I don't think it 
would make sense to generalise it. object already handled it fine.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] base64 entities

2010-08-26 Thread And Clover

On 2010-08-26 13:58:24, Julian Reschke wrote:


Not convinced. There's already one way to escape these things, and this
is supported in all UAs.


Totally agree. If a web author isn't sufficiently experienced to 
remember to call an HTML-encoding function, there is no reason to 
believe they'll think to call a base64-encoding function either. The 
proposal adds more parsing complexity (and XML incompatibility) for no 
obvious gain.


However, I'm all for making standardised HTML-encoder/decoder and 
base64-encoder/decoder functions available at a `window` or ECMAScript 
language level.


`atob`/`btoa` do the job, but they're byte encoders, not character encoders; they 
expect 'binary' strings where each charCode is the ordinal value of a 
byte. That is to say, `btoa` encodes the input string using 
genuine ISO-8859-1 and not UTF-8. This is necessary for a 
general-purpose base64 implementation (otherwise many base64 strings 
would not be decodable at all), but may be unexpected.


Is it worth providing a UTF-8-based variant or argument? Otherwise users 
would have to convert from UTF-8-misdecoded-as-ISO-8859-1 strings 
manually. (That's not difficult, using `decodeURIComponent(escape(s))`, 
but this trick isn't obvious or well-known.)
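For reference, a minimal sketch of that UTF-8 round trip, assuming an environment that provides `btoa`, `atob`, `escape` and `unescape` as globals (browsers, and Node.js 16 or later). The helper names `utf8ToBase64` and `base64ToUtf8` are hypothetical, not an existing API.

```javascript
// Hypothetical helpers: UTF-8-aware wrappers around the byte-oriented btoa/atob.
// encodeURIComponent percent-encodes the UTF-8 bytes of the string (it throws
// on lone surrogates); unescape turns each %XX into one code unit 0x00-0xFF,
// i.e. the 'binary' string that btoa expects.
function utf8ToBase64(text) {
  return btoa(unescape(encodeURIComponent(text)));
}

// Inverse: atob yields one code unit per byte; escape re-creates %XX
// sequences and decodeURIComponent decodes them as UTF-8.
function base64ToUtf8(b64) {
  return decodeURIComponent(escape(atob(b64)));
}

// btoa alone treats "é" as the single ISO-8859-1 byte 0xE9 ...
console.log(btoa("\u00e9"));         // "6Q=="
// ... while the wrapper encodes its two UTF-8 bytes 0xC3 0xA9.
console.log(utf8ToBase64("\u00e9")); // "w6k="
```

Note that plain `btoa("\u263a")` throws, because U+263A is not a single byte; the wrapper encodes it as three UTF-8 bytes instead.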


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [whatwg] IDL attribute reflecting enumerated attributes not limited to only known values

2010-08-26 Thread Ian Hickson
On Fri, 6 Aug 2010, Aryeh Gregor wrote:
 On Fri, Aug 6, 2010 at 3:01 PM, Ian Hickson i...@hixie.ch wrote:
  I'm happy to make more of them limited, especially new attributes or ones
  that were already that way, but I'd rather not change the default as that
  can have unexpected effects (e.g. some of the attributes are definitely
  not so limited, and I don't recall which that might be).
 
 The enumerated attributes in the spec right now that are not limited to 
 only known values are, by my count:
 
 * audio.preload, video.preload (note that at least WebKit appears to
 treat these as limited to known values already)
 * command.type
 * form.autocomplete, input.autocomplete
 * track.kind

These are all changed now.

 * marquee.direction

What do browsers do for this one?

 * marquee.trueSpeed

This is now a boolean attribute.

 * meta.httpEquiv

I'm pretty sure browsers don't treat this as limited to only known values.

 * th.scope
 * textarea.wrap

Browsers don't seem to limit these.


On Sat, 7 Aug 2010, Mounir Lamouri wrote:
 On 08/06/2010 09:01 PM, Ian Hickson wrote:
  - input.autocomplete: at the moment, it is returning the content but 
  it could return the resulting autocompletion state which is maybe a 
  bit more than just being limited to only known values but still in 
  the same spirit.
  
  I haven't changed this; what's the use case for knowing the actual 
  state?
 
 Theoretically speaking, I think input.autocomplete should return the
 current autocompletion state because that would follow the actual idea
 of enumerated attributes limited to only known values.

There's a big difference between reflecting the state of the attribute 
(what reflecting enumerated attributes does) and reflecting the state of 
the actual feature (which is rare in the DOM).


 Indeed, these kinds of enumerated attributes don't return the content 
 value but the value associated with the current state, and in this case 
 the 'state' is the autocompletion state.

No, the attribute's state is based on its value and is distinct from the 
actual autocompletion state.


 Practically speaking, autocomplete is mostly used in writing (authors want 
 to force/disable autocomplete) and we can assume that a script reading 
 this value is going to check whether the element has autocompletion. Having 
 input.autocomplete return this state may save authors from having to 
 repeat the algorithm, thus preventing errors and making further changes 
 to the specification easier (and transparent).

I don't follow.


 By the way, why were autocomplete IDL attributes introduced in the 
 specification?

Completeness.


On Tue, 17 Aug 2010, Aryeh Gregor wrote:

 Test case:
 
 <!doctype html>
 <script>
var el = document.createElement("form");
el.setAttribute("method", "get");
alert(el.method);
el.setAttribute("method", "GET");
alert(el.method);
 </script>
 
 Spec:
 
 
 If a reflecting IDL attribute is a DOMString whose content attribute
 is an enumerated attribute, and the IDL attribute is limited to only
 known values, then, on getting, the IDL attribute must return the
 conforming value associated with the state the attribute is in (in its
 canonical case) . . .
 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#reflecting-content-attributes-in-idl-attributes
 
 This says it should echo "GET" twice.  Four out of the five browsers I
 tested in (Firefox 4 beta, Chrome dev, Safari 5, Opera 10.60) echo
 "get" and then "GET".  IE8 and IE9PP4 echo "get" twice.  I think the
 spec and IE are right here -- you should be able to test form.method
 == "GET" (or == "get", whichever) and have it work whenever it's in
 the GET state.  However, since 4/5 of browsers disagree, I'm asking if
 anyone thinks the spec should be changed, before I file browser bugs.

The real question is, would implementing the spec lead to compatibility 
issues?
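To make the disagreement concrete, here is a simplified model of the getter for an IDL attribute that is limited to only known values, following the spec text quoted above. The function name, and the use of "get" as both the missing-value and invalid-value default for form.method, are assumptions for illustration; real reflection has details this sketch ignores.

```javascript
// Simplified model (hypothetical helper) of reflecting an enumerated content
// attribute through an IDL attribute "limited to only known values".
// contentValue is the attribute's value, or null if the attribute is absent.
function reflectLimited(contentValue, keywords, missingDefault, invalidDefault) {
  if (contentValue === null) {
    return missingDefault;
  }
  // Keywords match case-insensitively; toLowerCase() stands in for the
  // spec's ASCII case-insensitive comparison in this sketch.
  const lowered = contentValue.toLowerCase();
  // Return the keyword in its canonical case, never the author's spelling.
  return keywords.includes(lowered) ? lowered : invalidDefault;
}

const methodKeywords = ["get", "post"];
// Under this model the test case above alerts the same value both times.
console.log(reflectLimited("get", methodKeywords, "get", "get")); // "get"
console.log(reflectLimited("GET", methodKeywords, "get", "get")); // "get"
```

The point of returning the canonical keyword is exactly the one Aryeh makes: a script can compare against a single spelling and have it work whenever the attribute is in that state.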



Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread Boris Zbarsky

On 8/26/10 11:58 AM, James May wrote:

I thought I just suggested that?

Everything works normally (as if it was still attached) until it is
reattached, when the situation is re-evaluated.


That could fall afoul of security checks that assume that an iframe with 
a non-null parent is in fact a subframe and that its owner element is 
in the DOM.  I know Gecko certainly has such internally.


Again, nothing insurmountable, but there's a bunch of code in Gecko that 
makes assumptions about when windows can and can't exist that would need 
auditing. I can't speak to the web compat aspects.



In terms of resource consumption, I don't see how this would be any
different to any other kind of leak that web content can trigger.


I don't think that's an issue, though this does raise the question of 
when it's OK to gc the iframe.



(I think someone mentioned that iframes can be GC'd normally)


Can they, with your proposal?  It seems that with your proposal if you 
remove an iframe from the DOM and then forget about it then as long as 
there's any network activity in that iframe or anything else which might 
potentially trigger script it cannot be gced.  This seems like it would 
make it very easy to leak document after document...


-Boris



[whatwg] Clarification on @srcdoc referrer and base URL

2010-08-26 Thread Justin Schuh
What should the base URL and referrer be for a @srcdoc nested browsing
context? If I follow the base URL behavior for about:blank it will
just be inherited from the creator document. That seems like the right
thing to do, so I think section 2.5.1 should be modified to read:

If fallback base url is about:blank or about:srcdoc, and the
Document's browsing context has a creator browsing context, then let
fallback base url be the document base URL of the creator Document
instead.
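Justin's proposed rule can be sketched as a small function. This only models the wording above; it assumes the creator document's base URL is passed as null when there is no creator browsing context, and the function and parameter names are hypothetical.

```javascript
// Hypothetical model of the proposed fallback-base-URL rule for nested
// browsing contexts. creatorBaseURL is null when there is no creator
// browsing context.
function fallbackBaseURL(documentURL, creatorBaseURL) {
  const inheritsBase =
    documentURL === "about:blank" || documentURL === "about:srcdoc";
  if (inheritsBase && creatorBaseURL !== null) {
    // about:blank, and under this proposal about:srcdoc, documents use
    // the creator Document's base URL.
    return creatorBaseURL;
  }
  // Otherwise the fallback base URL stays the document's own address.
  return documentURL;
}
```

For example, an about:srcdoc document created from a page whose base URL is http://example.com/a/ would resolve relative URLs against http://example.com/a/.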

The referrer seems trickier. I couldn't find the about:blank referrer
behavior specified anywhere, and in my testing it does not inherit the
creator document's referrer. However, it seems to me that maybe
about:srcdoc should, even if about:blank does not.

Regards,
Justin


Re: [whatwg] base64 entities

2010-08-26 Thread Aryeh Gregor
On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote:
 Not convinced. There's already one way to escape these things, and this is
 supported in all UAs.

Adam gave two examples of cases where htmlspecialchars() is
insufficient, even if authors do use it.  This proposal is completely
general and will work anywhere, even in script.  Is automated
general escaping even possible right now in script for text/html?


[whatwg] Proposal for a modal element

2010-08-26 Thread E.J. Zufelt
Good afternoon,

I am wondering if public discussion has been had over the concept of 
introducing a dialog element into html5.

Normally a modal dialog is created using scripting and CSS to restrict focus 
and activity within the modal segment of the DOM and to style the modal 
section of the DOM to appear as though it is a separate region floating above 
the remainder of the document.

A modal element type could indicate to UAs that a segment of the DOM is to be 
treated as active, while the remainder of the DOM is to be inactive.  Focus 
could be automatically set to the first natively focusable element within the 
modal segment of the DOM, or could be explicitly set through scripting.  UAs 
could provide a default style for modals, as they do for other elements, but 
the developer would normally need to adjust the style using CSS for proper 
sizing and positioning.

Thanks,
Everett Zufelt
http://zufelt.ca

Follow me on Twitter
http://twitter.com/ezufelt

View my LinkedIn Profile
http://www.linkedin.com/in/ezufelt





Re: [whatwg] base64 entities

2010-08-26 Thread Julian Reschke

On 26.08.2010 22:10, Aryeh Gregor wrote:

On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote:

Not convinced. There's already one way to escape these things, and this is
supported in all UAs.


Adam gave two examples of cases where htmlspecialchars() is
insufficient, even if authors do use it.  This proposal is completely
general and will work anywhere, even in script.  Is automated
general escaping even possible right now in script for text/html?


I have to admit that I'm not sure what's special about script here. 
Are you saying that it's insufficient to escape all characters that have 
a special meaning there?


Server-wise, how is introducing a new escape mechanism any better than 
fixing the support code for the existing mechanism?


Best regards, Julian



Re: [whatwg] base64 entities

2010-08-26 Thread Boris Zbarsky

On 8/26/10 4:10 PM, Aryeh Gregor wrote:

On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote:

Not convinced. There's already one way to escape these things, and this is
supported in all UAs.


Adam gave two examples of cases where htmlspecialchars() is
insufficient, even if authors do use it.  This proposal is completely
general and will work anywhere, even in script.


Sorta.  It'll let you put the data in script, but it won't verify that 
the data doesn't change the meaning of the script, obviously, or inject 
script of its own to run.



Is automated general escaping even possible right now in script for text/html?


Defined how?

-Boris



Re: [whatwg] base64 entities

2010-08-26 Thread Julian Reschke

On 26.08.2010 22:10, Aryeh Gregor wrote:

On Thu, Aug 26, 2010 at 5:58 AM, Julian Reschke julian.resc...@gmx.de wrote:

Not convinced. There's already one way to escape these things, and this is
supported in all UAs.


Adam gave two examples of cases where htmlspecialchars() is
insufficient, even if authors do use it.  This proposal is completely
general and will work anywhere, even in script.  Is automated
general escaping even possible right now in script for text/html?


OK, sorry for my multiple posts.

I now get the point about the additional problems in script, but I fail 
to see how the proposal addresses this, unless expanding these entities 
is supposed to happen *after* parsing the script.


Best regards, Julian


Re: [whatwg] base64 entities

2010-08-26 Thread Aryeh Gregor
On Thu, Aug 26, 2010 at 4:20 PM, Julian Reschke julian.resc...@gmx.de wrote:
 I have to admit that I'm not sure what's special about script here. Are
 you saying that it's insufficient to escape all characters that have a
 special meaning there?

data:text/html,<!doctype html>
<script>alert("&amp;");</script>

alerts "&amp;", not "&".  So generally, you just don't escape stuff in
script, but I don't know of any general-purpose way to have
/string in a string literal (or anywhere else), other than
splitting it up like "</scr" + "ipt".

On Thu, Aug 26, 2010 at 4:25 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Sorta.  It'll let you put the data in script, but it won't verify that the
 data doesn't change the meaning of the script, obviously, or inject script
 of its own to run.

Hmm.  Okay, then I don't get how this helps in Adam's second example:

<script>
elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
</script>

If it doesn't help there, then I don't see any use-cases, since the
first example is trivially solvable by just using quotes.

 Is automated general escaping even possible right now inscript  for
 text/html?

 Defined how?

Suppose I have some arbitrary blob of trusted JavaScript, and I want
to output it as an inline script in text/html.  How do I escape it so
that it executes as intended -- in particular, given that it might
contain the string /script in string literals, comments, and so
on?  In most contexts, you could just replace '<' => '&lt;', but that
doesn't work in inline script.

(Right?  I admit I'm mostly cargo-culting this, and have no idea how
text/html parsing works at all.  I have fond dreams of an HTML
serialization that's actually comprehensible to authors but has
reasonable error handling . . .)


Re: [whatwg] Proposal for a modal element

2010-08-26 Thread Dirk Pranke
Hi E.J.,

I've actually been working with some other people on the Chromium team
for what we were calling a topmost window that could be used for
modal dialogs. After some feedback, it's been suggested that we try to
turn this into a more generic dialog element.

I haven't yet incorporated that feature into the writeup, but I'll
send you a link off-list. I hope to update the doc and post it to the
list for feedback very soon.

-- Dirk

On Thu, Aug 26, 2010 at 1:12 PM, E.J. Zufelt li...@zufelt.ca wrote:
 Good afternoon,
 I am wondering if public discussion has been had over the concept of
 introducing a dialog element into html5.
 Normally a modal dialog is created using scripting and CSS to restrict focus
 and activity within the modal segment of the DOM and to style the modal
 section of the DOM to appear as though it is a separate region floating
 above the remainder of the document.
 A modal element type could indicate to UAs that a segment of the DOM is to
 be treated as active, while the remainder of the DOM is to be inactive.
  Focus could be automatically set to the first natively focusable element
 within the modal segment of the DOM, or could be explicitly set through
 scripting.  UAs could provide a default style for modals, as they do for
 other elements, but the developer would normally need to adjust the style
 using CSS for proper sizing and positioning.
 Thanks,
 Everett Zufelt
 http://zufelt.ca





Re: [whatwg] IDL attribute reflecting enumerated attributes not limited to only known values

2010-08-26 Thread Aryeh Gregor
On Thu, Aug 26, 2010 at 2:00 PM, Ian Hickson i...@hixie.ch wrote:
 * marquee.direction

 What do browsers do for this one?

Seems like they don't limit it to known values, at least Firefox/Opera/Chrome.

 * meta.httpEquiv

 I'm pretty sure browsers don't treat this as limited to only known values.

 * th.scope
 * textarea.wrap

 Browsers don't seem to limit these.

If we could change all these to limited without compat problems,
though, it would be a nice little simplification -- enumerated
attributes would all have the same reflection behavior.

 The real question is, would implementing the spec lead to compatibility
 issues?

As Mounir Lamouri pointed out, Firefox nightlies already mostly
implement the spec here, so I guess we'll find out.  :)  The spec is
considerably nicer than preexisting behavior.


Re: [whatwg] base64 entities

2010-08-26 Thread Kornel Lesiński

On Thu, 26 Aug 2010 21:56:12 +0100, Aryeh Gregor
simetrical+...@gmail.com wrote:


Suppose I have some arbitrary blob of trusted JavaScript, and I want
to output it as an inline script in text/html.  How do I escape it so
that it executes as intended -- in particular, given that it might
contain the string /script in string literals, comments, and so
on?  In most contexts, you could just replace '<' => '&lt;', but that
doesn't work in inline script.


Inside strings you replace </ with <\/ (\/ is a valid escape sequence  
for /); outside strings you'd need to add a space between < and / (a corner  
case: x </regexliteral/).


You might also use <script src="data:...">.
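Moving the code into a data: URL sidesteps the </script> problem, because the script body no longer appears inline in the HTML at all. A minimal sketch, assuming the result goes into a double-quoted src attribute; scriptDataURL is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap trusted JavaScript source in a data: URL so it
// can be referenced as <script src="..."> instead of being inlined.
// encodeURIComponent percent-encodes "<", "/", "&", '"' and non-ASCII
// characters, so the URL cannot close the script element or a double-quoted
// attribute. It leaves "'" unescaped, so don't use single-quoted attributes.
function scriptDataURL(source) {
  return "data:text/javascript;charset=utf-8," + encodeURIComponent(source);
}

const url = scriptDataURL('var s = "</script>";');
console.log('<script src="' + url + '"></script>');
```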

--
regards, Kornel


Re: [whatwg] base64 entities

2010-08-26 Thread Kornel Lesiński
On Wed, 25 Aug 2010 22:52:42 +0100, Kornel Lesiński kor...@geekhood.net  
wrote:



<script>
elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
</script>


These cases can be secured without any new features in browsers (by  
escaping whitespace using numeric entities):


I realized I was wrong about this one. It won't prevent script injection  
in JS strings (in places where entities are decoded, including script in  
XML), because the entity will be changed to plain text before the JavaScript is  
tokenized.


For this reason, base64 entities won't solve this problem either, unless  
they're specifically defined as a JavaScript construct, not only an HTML  
construct (and I think such a mix of parsers would be bad).


If the parser decoded such entities in script (as XHTML does):

foo = '&%JztldmlsKCk7Jw==;'

then the decoded string passed to the JS parser would look like:

innerHTML = '';evil();''

which defeats the purpose of the encoding.

OTOH if the HTML parser didn't decode these entities in script (which is  
the current text/html behavior), then JS would get the undecoded string (i.e.  
foo.charAt(0) == '&').
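Kornel's point can be checked directly: decoding the example payload and splicing it into the script source before the JavaScript tokenizer runs turns the data into code. A small illustration using the payload from the message above and atob (available in browsers and in Node.js 16+):

```javascript
// The base64 payload from the example above.
const payload = "JztldmlsKCk7Jw==";

// atob decodes it to a string that contains quote characters.
const decoded = atob(payload);
console.log(decoded); // ';evil();'

// If the entity were expanded before the JS tokenizer runs, the decoded
// quotes would end the string literal early and evil() would become a
// statement of its own.
const sourceSeenByJS = "foo = '" + decoded + "'";
console.log(sourceSeenByJS); // foo = '';evil();''
```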


--
regards, Kornel


Re: [whatwg] base64 entities

2010-08-26 Thread Anne van Kesteren
On Thu, 26 Aug 2010 22:30:00 +0200, Julian Reschke julian.resc...@gmx.de  
wrote:
I now get the point about the additional problems in script, but I fail  
to see how the proposal addresses this, unless expanding these entities  
is supposed to happen *after* parsing the script.


If you have

  ele.innerHTML = '&%...;'

inside script it would be expanded the moment innerHTML is invoked  
(inside script entities are not expanded) and thus be safe from  
/script injection and such. So yes, it happens after.
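One way to picture this is a hypothetical step that runs when the string reaches innerHTML: each base64 entity is decoded (as UTF-8, per Adam's point 2 elsewhere in this thread) and its text is inserted as character data, never as markup. The sketch models that by rewriting each entity into ordinary character references. The &%...; syntax, the choice to also accept a bare %...;, and the function name are assumptions, not part of any spec.

```javascript
// Hypothetical model of base64-entity expansion at innerHTML time.
// Each &%<base64>; (or bare %<base64>;) is replaced by the equivalent text,
// re-escaped with ordinary character references so it can only ever be text.
function expandBase64Entities(fragment) {
  return fragment.replace(/&?%([A-Za-z0-9+\/]+={0,2});/g, (match, b64) => {
    // atob yields one code unit per byte; escape + decodeURIComponent
    // reinterpret those bytes as UTF-8. Invalid base64 or invalid UTF-8
    // throws here; a real design would need an error policy.
    const text = decodeURIComponent(escape(atob(b64)));
    return text
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&#39;");
  });
}

// An attacker-controlled (ASCII) name, base64-encoded on the server.
const name = "<img src=x onerror=alert(1)>";
const markup = "Hi there &%" + btoa(name) + ";.";
console.log(expandBase64Entities(markup));
// Hi there &lt;img src=x onerror=alert(1)&gt;.
```

Because the payload in the script source is only base64 characters, it cannot close the string literal or the script element; the angle brackets only reappear, as text, after expansion.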



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] base64 entities

2010-08-26 Thread And Clover

On 08/26/2010 10:56 PM, Aryeh Gregor wrote:


I don't know of any general-purpose way to have
/string in a string literal (or anywhere else),


The simple approach is to use JavaScript string literal escapes: 
`\x3C/script`.


A JSON encoder may offer the option to avoid HTML-special characters in 
string literals, encoded as escapes like `\u003C`. This allows literals 
to be included in a JavaScript block that may or may not be in a CDATA 
element, so may or may not need HTML-encoding.
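In JavaScript itself, that approach can be sketched with JSON.stringify followed by replacing HTML-significant characters with \uXXXX escapes, roughly what PHP's JSON_HEX_TAG and JSON_HEX_AMP options do. The helper name is hypothetical; the output is meant to be used as a string literal inside an inline script.

```javascript
// Hypothetical helper: serialise a value as a JavaScript string literal that
// is safe to place inside an inline <script> block.
function toInlineScriptLiteral(value) {
  return JSON.stringify(String(value))
    // "<" and ">" become escapes, so "</script>" and "<!--" cannot appear.
    .replace(/</g, "\\u003C")
    .replace(/>/g, "\\u003E")
    // "&" is escaped too, so the literal also survives where the script
    // content is parsed as XML and entities would be decoded.
    .replace(/&/g, "\\u0026")
    // U+2028/U+2029 were not legal inside string literals before ES2019.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const literal = toInlineScriptLiteral("</script><script>alert(1)</script>");
console.log("var s = " + literal + ";");
// var s = "\u003C/script\u003E\u003Cscript\u003Ealert(1)\u003C/script\u003E";
```

Since JSON.stringify never emits a backslash directly before "<", ">" or "&", these replacements cannot break an existing escape sequence.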



other than splitting it up like "</scr" + "ipt".


This is a common but wrong idiom that should be avoided; it won't 
validate, because in HTML4 the `</` sequence itself (ETAGO) ends a script 
block.



elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';


This is a common error (a security hole).

Encoding text for use in a JavaScript string literal (`\`-escaping) is 
an entirely different proposition to encoding text for use in HTML 
(entity/character references).


PHP offers no JS-string-literal-escape function. `addslashes` is very 
close, but won't handle some cases with non-ASCII characters correctly. 
Better to use `json_encode` to transfer the string, then write as text:


elmt.textContent = <?php echo json_encode('Hi there, ' . $name, JSON_HEX_TAG); ?>;


(assuming innerText or Text Node backup for IE/older browsers.)

A 'magic' escaping feature that will somehow guess what sort of encoding 
the author means is wishful (impossible) thinking. A base64-encoded 
entity reference could do nothing for JavaScript, CSS or other nested 
string context.


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [whatwg] base64 entities

2010-08-26 Thread Adam Barth
2010/8/26 Kornel Lesiński kor...@geekhood.net:
 On Wed, 25 Aug 2010 22:52:42 +0100, Kornel Lesiński kor...@geekhood.net
 wrote:
 <script>
 elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
 </script>

 These cases can be secured without any new features in browsers (by
 escaping whitespace using numeric entities):

 I realized I was wrong about this one. It won't prevent script injection in
 JS strings (in places where entities are decoded, including script in
 XML), because entity will be changed to plain text before JavaScript is
 tokenized.

Indeed.  This is not a feature for XML.  XML won't decode the entity
at all.  In HTML, script doesn't decode entities, so the pattern is
safe.

Adam


Re: [whatwg] base64 entities

2010-08-26 Thread Kornel Lesiński
On 26.08.2010, at 23:28, Adam Barth wrote:
 
 <script>
 elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
 </script>
 
 These cases can be secured without any new features in browsers (by
 escaping whitespace using numeric entities):
 
 I realized I was wrong about this one. It won't prevent script injection in
 JS strings (in places where entities are decoded, including script in
 XML), because entity will be changed to plain text before JavaScript is
 tokenized.
 
 Indeed.  This is not a feature for XML.  XML won't decode the entity
 at all.  In HTML, script doesn't decode entities, so the pattern is
 safe.

Yes, but in that case JS would have to decode the entity on its own. It 
wouldn't be strictly an HTML feature; it would also change the interpretation of JS string 
literals. And what if you use this entity outside a JS string? In a regex literal?

What about onclick="show('&%base64;')"? Should this be left insecure, or should 
the HTML parser have special entity handling for on* attributes? And then what's 
the meaning of onclick="show('&amp;%base64;')"?

-- 
regards, Kornel





Re: [whatwg] base64 entities

2010-08-26 Thread Adam Barth
On Wed, Aug 25, 2010 at 6:37 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 8/25/10 7:41 PM, Adam Barth wrote:
 2) Decoding base64 results in binary data.  We'll need to convert that
 data to characters in order to deal with it in the DOM.  We always
 use UTF-8 for that transformation, regardless of the document's
 encoding.

 Note that this issue means that using atob or btoa for dealing with this is
 a huge pain if non-ASCII chars are involved, since those take and return
 byte arrays masquerading as JS strings, not actual Unicode strings.

I'm slightly confused how that works.  How do you represent arbitrary
binary data as characters?  Another option is to provide a base64
encoder/decoder that uses UTF8 to encode/decode the binary.

On Thu, Aug 26, 2010 at 1:38 AM, Martin Janecke whatwg@kaor.in wrote:
 Is it necessary to consider compatibility issues here? In HTML4 this
 seems to have been valid code (-> http://validator.w3.org/check):

It's always necessary to consider compatibility.  Perhaps one of our
friends with the ability to grep the web would be kind enough to tell
us how common &% followed by base64 characters followed by ; is.


On Thu, Aug 26, 2010 at 2:58 AM, Julian Reschke julian.resc...@gmx.de wrote:
 Not convinced. There's already one way to escape these things, and this is
 supported in all UAs.

Which way is that?

 I don't see how adding another mechanism will help those who can't use the
 first one properly. For instance, people unable to escape <, > and &
 are likely also unable to get the UTF-8 conversion right.

Escaping just those characters is insufficient.  The appeal of this
approach is that authors don't need the right blacklist of dangerous
characters.  By the way, there are already folks doing something
similar manually now.  They send the untrusted bytes as base64 and
decode them using JavaScript.

On Thu, Aug 26, 2010 at 1:25 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Sorta.  It'll let you put the data in script, but it won't verify that the
 data doesn't change the meaning of the script, obviously, or inject script
 of its own to run.

Because script does not decode entities in HTML, the attacker will
be limited to what he or she can do with alphanumeric characters, +,
/, and trailing =.  Of course, if the entity appears in a string
context (as is pretty common), the attacker won't be able to break out
of the string context, even by including </script> in the attack string
(which is a common vulnerability in hand-rolled escaping schemes).

On Thu, Aug 26, 2010 at 1:30 PM, Julian Reschke julian.resc...@gmx.de wrote:
 I now get the point about the additional problems in script, but I fail to
 see how the proposal addresses this, unless expanding these entities is
 supposed to happen *after* parsing the script.

Yes.  That's precisely what happens.

Kind regards,
Adam


Re: [whatwg] base64 entities

2010-08-26 Thread Boris Zbarsky

On 8/26/10 6:45 PM, Adam Barth wrote:

Note that this issue means that using atob or btoa for dealing with this is
a huge pain if non-ASCII chars are involved, since those take and return
byte arrays masquerading as JS strings, not actual Unicode strings.


I'm slightly confused how that works.  How do you represent arbitrary
binary data as characters?


You mean how do atob/btoa take their binary data in JS-land?  You take 
your byte array, and convert it to a sequence of two-byte units by 
setting the high byte to 0.  This sequence of two-byte units is a JS string.
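That convention (one byte per UTF-16 code unit, high byte zero) can be written out explicitly. A sketch using Uint8Array; the helper names are hypothetical.

```javascript
// Hypothetical helpers: convert between raw bytes and the "binary strings"
// that btoa() consumes and atob() produces (each code unit holds one byte).
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b); // code unit 0x00XX: the high byte is zero
  }
  return s;
}

function binaryStringToBytes(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      throw new RangeError("code unit " + unit + " is not a byte");
    }
    bytes[i] = unit;
  }
  return bytes;
}

// The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9.
console.log(btoa(bytesToBinaryString([0xc3, 0xa9]))); // "w6k="
const roundTrip = binaryStringToBytes(atob("w6k=")); // holds the bytes 195 and 169
```

This is why non-ASCII text needs an extra UTF-8 step: a JS string with a code unit above 0xFF simply has no representation in this scheme.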



Another option is to provide a base64
encoder/decoder that uses UTF8 to encode/decode the binary.


Not sure what the exact proposal here is.


Because script does not decode entities in HTML, the attacker will
be limited to what he or she can do with alphanumeric characters


OK.  I had misunderstood what you were proposing for script here.  The 
point is that inside script this base64 thing will only be useful for 
setting innerHTML, right?


-Boris


Re: [whatwg] Input URL State and Files object

2010-08-26 Thread Charles Pritchard

 On 8/25/2010 2:02 PM, Ian Hickson wrote:

On Mon, 2 Aug 2010, Charles Pritchard wrote:

[ UAs can use input type=file to let the user enter remote URLs ]

When a user through selection, click+drag or manual entry of a URL
should the browser still submit an Origin request header? It seems that
CORS doesn't come into effect here -- but at the same time, it'd be
handy for logging purposes and added security.

I don't think there'd be an origin, but that's rather up to the user
agent. (In this case it's acting on behalf of the user, not the page, so I
don't think it makes sense to give the page's origin.)

Sounds like an implementer would not include a Referer header, either.

...

Continuing on with tweaking URLs to work with the File API:

Chrome has gone ahead with their setData proposal, enhancing the
event.dataTransfer object so that users may drag a file from within the
browser onto their desktop.

The extension uses setData with a key of "DownloadURL" and a value
including a MIME type, file descriptor and URI.

I'd like this interface to work within ondrop; if getData("DownloadURL")
is set, then a FileList would be returned in event.dataTransfer.files,
much like it is when users drag files from their desktop into the browser.

This would of course require Origin checks, whereas dragging onto the
desktop does not require an Origin check.

...

Here's the current example of setData(DownloadURL) and my comments.

https://code.google.com/p/html5rocks/issues/detail?id=136
var dragElem = document.getElementById("ID_Element_to_be_dragged");
dragElem.addEventListener(
  "dragstart",
  function(event) {
    // Value format: "<mime type>:<file name>:<URL>"
    event.dataTransfer.setData(
      "DownloadURL",
      "application/pdf:sample.pdf:http://example.com/example-download-data");
  },
  false
);
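On the receiving side of this proposal, a drop handler would have to split the DownloadURL value back into its parts. The value format used in the example above is mimetype:filename:url; since the URL itself contains colons, this sketch splits only at the first two. The function name is hypothetical, and the sketch assumes neither the MIME type nor the file name contains a colon.

```javascript
// Hypothetical parser for a DownloadURL drag-data value of the form
// "mimetype:filename:url". Only the first two colons are separators,
// because the URL part normally contains colons of its own ("http://...").
function parseDownloadURL(value) {
  const first = value.indexOf(":");
  const second = first === -1 ? -1 : value.indexOf(":", first + 1);
  if (first === -1 || second === -1) {
    return null; // not in mimetype:filename:url form
  }
  return {
    type: value.slice(0, first),
    name: value.slice(first + 1, second),
    url: value.slice(second + 1),
  };
}

// Yields type "application/pdf", name "sample.pdf" and the http URL.
const parsed = parseDownloadURL(
  "application/pdf:sample.pdf:http://example.com/example-download-data");
```

A handler would still have to apply whatever Origin check is agreed on to parsed.url before exposing anything through event.dataTransfer.files.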





Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread James May
On 27 August 2010 05:02, Boris Zbarsky bzbar...@mit.edu wrote:

 On 8/26/10 11:58 AM, James May wrote:

 I thought I just suggested that?

 Everything works normally (as if it was still attached) until it is
 reattached, when the situation is re-evaluated.


 That could fall afoul of security checks that assume that an iframe with a
 non-null parent is in fact a subframe and that its owner element is in the
 DOM.  I know Gecko certainly has such internally.

 Again, nothing insurmountable, but there's a bunch of code in Gecko that
 makes assumptions about when windows can and can't exist that would need
 auditing. I can't speak to the web compat aspects.


Could the iframe be hoisted to the top level of its parent browsing context?


  In terms of resource consumption, I don't see how this would be any
 different to any other kind of leak that web content can trigger.


 I don't think that's an issue, though this does raise the question of when
 it's OK to gc the iframe.


When no references remain in either the DOM or script?

if an iframe is removed from a Document and is then subsequently garbage
collected, this will likely mean (in the absence of other references) that
the child browsing context's WindowProxy object will become eligible for
garbage collection, which will then lead to that browsing context being
discarded, which will then lead to its Document being discarded also.
This happens without notice to any scripts running in that Document;
for example, no unload events are fired (the unload a document
steps are not run).

Although I'm not sure why this is different from the regular steps.  (
http://www.whatwg.org/specs/web-apps/current-work/multipage/browsers.html#garbage-collection-and-browsing-contexts
)

 (I think someone mentioned that iframes can be GC'd normally)


 Can they, with your proposal?  It seems that with your proposal if you
 remove an iframe from the DOM and then forget about it then as long as
 there's any network activity in that iframe or anything else which might
 potentially trigger script it cannot be gced.  This seems like it would make
 it very easy to leak document after document...


So running scripts and network activity are GC roots?

-- James


Re: [whatwg] Input URL State and Files object

2010-08-26 Thread Jonas Sicking
On Thu, Aug 26, 2010 at 5:24 PM, Charles Pritchard ch...@jumis.com wrote:
  On 8/25/2010 2:02 PM, Ian Hickson wrote:

 On Mon, 2 Aug 2010, Charles Pritchard wrote:

 [ UAs can use input type=file to let the user enter remote URLs ]

 When a user through selection, click+drag or manual entry of a URL
 should the browser still submit an Origin request header? It seems that
 CORS doesn't come into effect here -- but at the same time, it'd be
 handy for logging purposes and added security.

 I don't think there'd be an origin, but that's rather up to the user
 agent. (In this case it's acting on behalf of the user, not the page, so I
 don't think it makes sense to give the page's origin.)

 Sounds like an implementer would not include a Referer header, either.

 ...

 Continuing on with tweaking URLs to work with the File API:

 Chrome has gone ahead with their setData proposal, enhancing the
 event.dataTransfer object so that users may drag a file from within the
 browser onto their desktop.

 The extension uses setData with a key of DownloadURL and a value including
 a MIME type, file descriptor and URI.
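For illustration, a minimal TypeScript sketch of the drag-out side. It assumes the value layout Chrome used for its non-standard DownloadURL type, "<MIME type>:<file name>:<URL>" (to our understanding; this is not in any spec). makeDownloadURL, offerDownload and DataTransferLike are hypothetical names; DataTransferLike stands in for the DOM DataTransfer so the sketch runs outside a browser.

```typescript
// Hypothetical helper: builds a value in the assumed Chrome DownloadURL
// layout "<MIME type>:<file name>:<URL>".
function makeDownloadURL(mimeType: string, fileName: string, url: string): string {
  // A reader that splits on the first two colons would misparse a MIME type
  // or file name containing ':', so reject those up front.
  if (mimeType.includes(":") || fileName.includes(":")) {
    throw new Error("MIME type and file name must not contain ':'");
  }
  return `${mimeType}:${fileName}:${url}`;
}

// Minimal structural stand-in for the DOM DataTransfer object.
interface DataTransferLike {
  setData(format: string, data: string): void;
}

// Intended to be called from a dragstart handler, e.g.
//   link.addEventListener("dragstart", (e) => offerDownload(e.dataTransfer!, ...));
function offerDownload(dt: DataTransferLike, mimeType: string, fileName: string, url: string): void {
  dt.setData("DownloadURL", makeDownloadURL(mimeType, fileName, url));
}
```

Only the file name and MIME type are checked; the URL is allowed to contain colons because it is always the last field.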

 I'd like this interface to work within ondrop; if getData(DownloadURL) is
 set, then a FileList would be returned in event.dataTransfer.files, much
 like it is when users drag files from their desktop into the browser.

 This would of course require Origin checks, whereas dragging onto the
 desktop does not.

I would think that a same-origin check should always be performed. In
Firefox, the save-as dialog always displays the website you are
downloading from. However, with drag'n'drop no dialog will be shown and
the user will presumably think he/she is downloading from the site
where the drag started.
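A minimal TypeScript sketch of the same-origin check being discussed, assuming the drop target only exposes a dropped DownloadURL as a file when the URL's origin matches its own. parseDownloadURL, originOf and mayExposeDroppedFile are hypothetical helpers, not part of any browser API; the parser assumes the "<MIME type>:<file name>:<URL>" layout and splits on the first two colons only, because the URL itself contains one.

```typescript
// Hypothetical parser for the assumed "<MIME type>:<file name>:<URL>" layout.
function parseDownloadURL(value: string): { mimeType: string; fileName: string; url: string } | null {
  const first = value.indexOf(":");
  const second = first < 0 ? -1 : value.indexOf(":", first + 1);
  // Require a non-empty MIME type and a non-empty file name.
  if (first <= 0 || second <= first + 1) return null;
  return {
    mimeType: value.slice(0, first),
    fileName: value.slice(first + 1, second),
    url: value.slice(second + 1),
  };
}

// Serialized origin (scheme, host, port) of an absolute URL, or null if unparsable.
function originOf(url: string): string | null {
  try {
    return new URL(url).origin;
  } catch {
    return null;
  }
}

// Hypothetical policy: expose the dropped item as a file only if same-origin.
function mayExposeDroppedFile(downloadURL: string, dropTargetOrigin: string): boolean {
  const parsed = parseDownloadURL(downloadURL);
  if (parsed === null) return false;
  const origin = originOf(parsed.url);
  return origin !== null && origin === dropTargetOrigin;
}
```

Comparing full origins rather than host names alone follows the usual same-origin rule, so a different port or scheme is rejected too. Whether a UA would additionally show a save-as prompt is a separate question.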

Or are browsers planning on displaying the save-as dialog?

/ Jonas


Re: [whatwg] Should events be paused on detached iframes?

2010-08-26 Thread Boris Zbarsky

On 8/26/10 10:33 PM, James May wrote:

Could the iframe be hoisted to the top level of its parent browsing context?


Not sure what you mean.


When no references remain in either the DOM or script?

if an |iframe| <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#the-iframe-element>
is removed <http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#remove-an-element-from-a-document>
from a |Document| <http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#document>
and is then subsequently garbage collected


It can't become garbage collected while the window inside it isn't, 
since the window inside it references the iframe (via frameElement).



this will likely mean
(in the absence of other references) that the child browsing context's
<http://www.whatwg.org/specs/web-apps/current-work/multipage/browsers.html#child-browsing-context>
|WindowProxy| <http://www.whatwg.org/specs/web-apps/current-work/multipage/browsers.html#windowproxy>
object will become eligible for garbage collection


I don't think it's reasonable to gc the iframe element while leaving 
the window inside alive due to it being referenced.  That introduces 
races where frameElement could suddenly become null at some point 
(possibly between two lines of the same script, or even partway through 
some operation; for example GC can happen, even multiple times, during a 
property get or set).  That would be pretty broken behavior.



Although I'm not sure why this is different from the regular steps.


Presumably the only different thing is the lack of an unload event.


Can they, with your proposal?  It seems that with your proposal if
you remove an iframe from the DOM and then forget about it then as
long as there's any network activity in that iframe or anything else
which might potentially trigger script it cannot be gced.  This
seems like it would make it very easy to leak document after document...

So running scripts and network activity are GC roots?


Not running scripts.  Anything that might potentially run a script in 
the future.


You can think of it as gc roots, sure, and you can also claim that gc'ed 
systems never leak memory.  But is either necessarily useful?  The 
upshot is that random things that the web developer knows nothing about 
and doesn't care about can prevent the memory from being deallocated 
effectively forever from the web developer's point of view.  And worse 
yet, there's no obvious recourse (as in, no way to make sure the thing 
is garbage collected).  Any reasonable person would call that a memory 
leak in the browser, not in the site.  Just like a JS impl that never 
GCes until you navigate away from the page should be considered to have 
a memory leak.
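To make the "gc roots" framing concrete, a toy TypeScript model of a mark phase; it is not how Gecko or any other engine actually works. Node names such as pendingXHR are hypothetical. The assumption, following the message above, is that anything which might run script in the future (here, a pending network load targeting the iframe's window) is treated as a root. Because the iframe and its window reference each other (contentWindow / frameElement), they are only ever freed together.

```typescript
type NodeId = string;

// Toy heap: a reference graph plus a set of roots.
class Heap {
  private edges = new Map<NodeId, Set<NodeId>>();
  private roots = new Set<NodeId>();

  addNode(id: NodeId): void {
    if (!this.edges.has(id)) this.edges.set(id, new Set());
  }
  addRef(from: NodeId, to: NodeId): void {
    this.addNode(from);
    this.addNode(to);
    this.edges.get(from)!.add(to);
  }
  removeRef(from: NodeId, to: NodeId): void {
    this.edges.get(from)?.delete(to);
  }
  setRoot(id: NodeId, isRoot: boolean): void {
    this.addNode(id);
    if (isRoot) this.roots.add(id);
    else this.roots.delete(id);
  }
  // Mark everything reachable from a root; the unmarked rest could be freed.
  collectible(): NodeId[] {
    const marked = new Set<NodeId>();
    const stack: NodeId[] = Array.from(this.roots);
    while (stack.length > 0) {
      const id = stack.pop()!;
      if (marked.has(id)) continue;
      marked.add(id);
      this.edges.get(id)?.forEach((next) => stack.push(next));
    }
    return Array.from(this.edges.keys()).filter((id) => !marked.has(id));
  }
}

// Script removes an iframe from the DOM and drops every reference to it.
function detachedIframeScenario(pendingLoad: boolean): NodeId[] {
  const heap = new Heap();
  heap.setRoot("topDocument", true);
  heap.addRef("topDocument", "iframe");     // iframe is in the DOM
  heap.addRef("iframe", "innerWindow");     // contentWindow
  heap.addRef("innerWindow", "iframe");     // frameElement
  heap.addRef("innerWindow", "innerDocument");
  heap.addRef("pendingXHR", "innerWindow"); // its completion would run script there
  heap.setRoot("pendingXHR", pendingLoad);  // "might run script later" => root
  heap.removeRef("topDocument", "iframe");  // iframe removed and forgotten
  return heap.collectible().sort();
}
```

With a pending load, detachedIframeScenario(true) reports nothing collectible even though the page holds no reference to the iframe; with no pending activity, the iframe, its window and its document become collectible together. The first case is the behavior described above as a leak from the web developer's point of view.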


-Boris


Re: [whatwg] Input URL State and Files object

2010-08-26 Thread Charles Pritchard

 On 8/26/2010 7:53 PM, Jonas Sicking wrote:

On Thu, Aug 26, 2010 at 5:24 PM, Charles Pritchard ch...@jumis.com wrote:

Chrome has gone ahead with their setData proposal, enhancing the
event.dataTransfer object so that users may drag a file from within the
browser onto their desktop.

I would think that a same-origin check should always be performed. In
Firefox, the save-as dialog always displays the website you are
downloading from. However, with drag'n'drop no dialog will be shown and
the user will presumably think he/she is downloading from the site
where the drag started.

Or are browsers planning on displaying the save-as dialog?


I think that save-as dialogs are implementation-specific.

For example, OS X-based prompts happen when you first open a file,
not when downloading.

The HTML 5 UI/UA permissions are built upon the idea that drag/drop
confers a similar permissibility to right click + context menu actions.