Re: [whatwg] Uploading directories of files

2009-12-13 Thread L. David Baron
On Friday 2009-12-11 02:17 -0800, Jeremy Orlow wrote:
 But regardless. I don't think you could argue that having _some_ path
 information is worse than _none_, right?

Many of those who commented in
https://bugzilla.mozilla.org/show_bug.cgi?id=143220 and its
duplicates would disagree.  Users may not expect the act of
uploading a file to give the Web site details of their file system
structure.  There also seems to be some concern that those details
may provide information useful to an attacker.

-David

-- 
L. David Baron http://dbaron.org/
Mozilla Corporation   http://www.mozilla.com/


Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Aryeh Gregor
On Sat, Dec 12, 2009 at 11:40 PM, Hugh Guiney hugh.gui...@gmail.com wrote:
 With the exception that Flash does not need separate components to be
 active to sustain that functionality. You can toggle quality in Flash
 without any server- or client-side scripts at all. You may need
 ActionScript in some cases, but that's an integral part of Flash,
 whereas JavaScript, PHP, etc. are not integral parts of HTML.

JavaScript is an integral part of HTML to all intents and purposes.
HTML itself does not and should not try to cover use-cases that are
already adequately covered by HTML+JavaScript -- there will always be
things that are better handled by a general-purpose scripting
language.  Of course, moving something into HTML might be valuable
because it makes the feature easier for authors to use, but that needs
to be weighed against the cost of browsers having to implement it
rather than some other feature.

 But that is exactly how content negotiation in HTTP already works.

Well, yes.  On the other hand, almost nobody actually uses content
negotiation, so I don't think that supports your case.

 I for one would rather not go to such trouble. Can you imagine going
 to every site you visit and specifying that you want XHTML instead of
 HTML, rather than just specifying
 application/xhtml+xml;q=1.0,text/html;q=0.0, in your request
 headers?

Well, no, because there's almost no functional difference between
XHTML and HTML except that the former is more likely to break due to
typos or minor bugs.  Plus, virtually no site actually provides both
XHTML and HTML.  Actually, virtually no site provides real XHTML at
all.  So I don't bother specifying a preference for either.  If you
do, I rather suspect that makes you one of a few hundred people at
most, out of billions of web users.  So maybe you could pick an
analogy that's more realistic?
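For readers unfamiliar with the q-value syntax in the Accept header quoted
above, here is a minimal sketch of how such a header could be ranked. The
function name parseAccept is hypothetical and the parsing is deliberately
simplified (no wildcard or specificity handling); the only HTTP facts it
relies on are that a missing q defaults to 1 and that q=0 means "not
acceptable".

```javascript
// Hypothetical helper: turn an Accept header into a list of acceptable
// media types, best first. Simplified sketch, not a full HTTP parser.
function parseAccept(header) {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0) // tolerate a trailing comma
    .map((part) => {
      const [type, ...params] = part.split(";").map((p) => p.trim());
      let q = 1.0; // HTTP default when no q parameter is given
      for (const p of params) {
        const [name, value] = p.split("=");
        if (name.trim() === "q") q = parseFloat(value);
      }
      return { type, q };
    })
    .filter((entry) => entry.q > 0) // q=0 (or unparsable) is dropped
    .sort((a, b) => b.q - a.q);
}

console.log(parseAccept("application/xhtml+xml;q=1.0,text/html;q=0.0,"));
```

With the quoted header, only application/xhtml+xml survives, because
text/html is explicitly marked as unacceptable.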

On the other hand, every single video site already does allow you to
specify quality, and I've never had a problem with this.  It's a
simple control that's only there when you want it, and you can easily
figure out if you actually want higher or lower quality in any given
case.


[whatwg] Removing multiple attribute from input type=file multiple with selected files

2009-12-13 Thread TAMURA, Kent

What should happen to the selected files when a user selects multiple
files for <input type=file multiple> and script code then removes the
multiple attribute from the input element?

 - nothing, no change to the selected files and they will be submitted,
 - cleared, or
 - a single file remains
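
To make the three options concrete, here is a small model of each outcome.
This is not a browser API: the function and policy names are hypothetical,
and "a single file remains" is assumed to mean the first selected file,
which the list above does not actually specify.

```javascript
// Hypothetical model of the selection after `multiple` is removed.
// selectedFiles is a plain array of file names standing in for a FileList.
function selectionAfterRemovingMultiple(selectedFiles, policy) {
  switch (policy) {
    case "keep-all": // option 1: nothing changes, all files are submitted
      return selectedFiles.slice();
    case "clear": // option 2: the selection is cleared
      return [];
    case "keep-first": // option 3: a single file remains (assumed: the first)
      return selectedFiles.slice(0, 1);
    default:
      throw new Error("unknown policy: " + policy);
  }
}

const picked = ["a.png", "b.png", "c.png"];
console.log(selectionAfterRemovingMultiple(picked, "keep-all"));
console.log(selectionAfterRemovingMultiple(picked, "clear"));
console.log(selectionAfterRemovingMultiple(picked, "keep-first"));
```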

--
TAMURA Kent
Software Engineer, Google






Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Tab Atkins Jr.
On Fri, Dec 11, 2009 at 10:18 PM, Michal Zalewski lcam...@coredump.cx wrote:
 1) IFRAME semantics make it exceedingly cumbersome to sandbox short
 snippets of text, and this task is perhaps the most common and
 pressing XSS-related challenge. Unless the document is constructed on
 client side by JavaScript, sites would need to use opaque data: URLs,
 or put up with a lot of additional HTTP roundtrips, to utilize
 sandboxed IFRAMEs for this purpose. [ There is also the problem of
 formatting and positioning IFRAME content, although the seamless
 attribute would fix this. ]

I believe that the @doc attribute, discussed in the original threads
about @sandbox, will be introduced to deal with that.  It'll take
plain html as a string, avoiding the opaqueness and larger escaping
requirements of a data:// url, as the only thing you'll have to escape
is whichever quote you're using to surround the value.


 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox=random_token></span sandbox=same_token>) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

Nah, token-guarding is no good.  For one it's completely unusable in
the XML serialization without edits to the XML spec.  More
importantly, though, it puts a significant burden on authors to
generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
their iframes or similar.  It's pretty much guaranteed that this
will happen, and as it has no visible bad effects until an attacker
gets through, I think it'll be reasonably common.


 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src=show.cgi?id=1234></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter? There are two cases where the mechanism still
 offers some protection:

You mean, if the attacker controls their own website on the same
origin and iframes it for themselves?  Sure, that's no protection.
Do you *need* protection then?  They're not on your site; if they can
get visitors onto their own site, they already have tons of more
effective ways to screw with users.  Unless I'm missing something
about this attack scenario, there's really nothing here.

Do you perhaps mean that the attacker puts an iframe in their own
comment or whatever, producing an iframe inside of an
<iframe sandbox>?  The outermost @sandbox should subdue the inner
iframe's power in the same way.


 It strikes me that this mechanism would make a whole lot more sense if
 supported on HTTP header level, instead: X-SOP-Sandbox: 1; in its
 current shape, it is defensible perhaps if aided by Mozilla's CSP.
 Otherwise, it's an error-prone detail, and we should at the very least
 outline why it's very difficult to get it right in the spec.

Again, I must admit some ignorance of the significance of this attack
scenario.  Surely if the attacker is pointing to an iframe in their
own code, they are either doing so within an <iframe sandbox> or are
doing so on their own site.  The former shouldn't be a problem, the
latter means that the attacker has full control over the contents
anyway, and can strip the header if they so choose.


 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for
 privacy reasons, for example in a web mail system.

I agree that this would be useful, especially for images.


 2.2) The ability to disable HTML parsing. On IFRAMEs, this can
 actually be approximated with the excommunicated <plaintext> tag, or
 with Content-Type: text/plain / data:text/plain,. On token-guarded
 SPANs or DIVs, however, it would be pretty damn useful for displaying
 text content without the need to escape <, >, &, etc. Pure security
 benefit is limited, but as a phishing prevention and display
 correctness measure, it makes sense.

Why not just run an escape function over the content before sending
it?  All web languages have one specifically for escaping the
html-significant characters.  There's only five of them, after all.
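
A minimal sketch of such an escape function, for illustration only. The
name escapeHtml is hypothetical, and the choice of "&#39;" for the single
quote is one common convention rather than the only valid one.

```javascript
// Hypothetical helper: escape the five HTML-significant characters so that
// user-supplied text is rendered as text, never parsed as markup.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // A single pass, so an "&" produced by one replacement is never re-escaped.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

console.log(escapeHtml('<script>alert("hi")</script>'));
```

The escaping has to be applied at every place user content is emitted,
which is exactly the step that tends to get forgotten.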

~TJ


Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Tab Atkins Jr.
Wasn't there talk of adding a @media attribute to video which could,
among other things, hold bitrate information which would allow the UA
to auto-determine whether it should play a file?

This would require a change to the current selection algorithm, as the
UA now has to make a 'best guess' of which file to use rather than
just choosing the first which works, but it's probably worth it.
(Plus the other benefits of @media, such as declaring that a
particular source has subtitles burned in, etc.)
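
A rough sketch of the difference between the two selection strategies,
under stated assumptions: each candidate is a plain object standing in for
a source entry, bitrate (in kbit/s) is a hypothetical field for whatever
@media might carry, and canPlay and availableKbps are supplied by the
caller. None of these names come from the spec.

```javascript
// Current behaviour as described above: take the first source that plays.
function pickFirstPlayable(sources, canPlay) {
  return sources.find((s) => canPlay(s.type)) || null;
}

// Hypothetical "best guess": among playable sources, prefer the highest
// bitrate that fits the available bandwidth; if none fits, fall back to
// the lowest-bitrate playable source.
function pickBestGuess(sources, canPlay, availableKbps) {
  const playable = sources.filter((s) => canPlay(s.type));
  if (playable.length === 0) return null;
  const affordable = playable.filter((s) => s.bitrate <= availableKbps);
  if (affordable.length > 0) {
    return affordable.reduce((best, s) => (s.bitrate > best.bitrate ? s : best));
  }
  return playable.reduce((best, s) => (s.bitrate < best.bitrate ? s : best));
}

const sources = [
  { src: "hi.ogv", type: "video/ogg", bitrate: 2000 },
  { src: "lo.ogv", type: "video/ogg", bitrate: 500 },
  { src: "hi.mp4", type: "video/mp4", bitrate: 2500 },
];
const canPlayOgg = (type) => type === "video/ogg";

console.log(pickFirstPlayable(sources, canPlayOgg).src);
console.log(pickBestGuess(sources, canPlayOgg, 800).src);
```

The first strategy depends only on document order; the second needs an
estimate of bandwidth, which is where the 'best guess' comes in.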

~TJ


Re: [whatwg] Removing multiple attribute from input type=file multiple with selected files

2009-12-13 Thread Jonas Sicking
On Sun, Dec 13, 2009 at 5:36 AM, TAMURA, Kent tk...@chromium.org wrote:
 What should happen to selected files in a case that a user selects multiple
 files for <input type=file multiple> and then a script code removes the
 multiple attribute from the input element?
  - nothing, no change to the selected files and they will be submitted,
  - cleared, or
  - a single file remains

I ran into the same question when developing this for firefox. I don't
really care what happens either way since I don't see a use case for
removing the multiple attribute.

What I ended up doing was to do your first option above.

/ Jonas


Re: [whatwg] Uploading directories of files

2009-12-13 Thread ddailey
Rereading comments 1 - 24 of 
https://bugzilla.mozilla.org/show_bug.cgi?id=143220  as cited below, reveals 
to me that I was not the only one in the past 7 years to encounter the many 
use cases (involving client-side access to local images). I was quite 
disappointed when it finally became disabled in all browsers. (circa 2006 in 
IE, I think).


Thanks to David B. for pointing out the history of some of the discussion 
within Mozilla and to Jonas for pointing out the demos of Firefox 3.6 using 
the new file API from http://www.w3.org/TR/FileAPI/ .


It looks like this will address my use cases (including a client-side 
animation studio and client-side image manipulation). Between the various 
proposed methods and canvas, it looks like one will be able to do lots of 
fancy stuff.


One question I have about the FileAPI: should I address it here or contact 
the group responsible at
public-weba...@w3.org ? That is, while I see a method for grabbing blobs of
consecutive bytes from the image file, what might also be quite handy would
be the old-fashioned Bit-blit sort of stuff. Specifically, it would be nice 
to deal with an arbitrary sub-rectangle of an image on the screen. 
Currently, I believe (correct me if I'm wrong), to use a clip region in 
either SVG or HTML of an image displayed in the browser (from a local or 
remote source), one has to essentially create a new version of the image 
(see either 
http://srufaculty.sru.edu/david.dailey/javascript/weave/weaver.htm or 
http://srufaculty.sru.edu/david.dailey/svg/clips2.svg ) to overlay clips. 
This is expensive in RAM and performance as the data in 
http://www.svgopen.org/2007/papers/BrowserPerformanceMeasures/index.html 
would indicate.


Being able to sample, for example, a 20 pixel by 20 pixel sub-image of a
rectangle as its own bitmap would be considerably more efficient than the
ways I have been doing this sort of thing. But if I understand the BLOB, it
would be consecutive bytes from the actual file itself, as compressed in 
GIF, JPEG or PNG  (or SVG???) format?? That would be useful for certain 
kinds of image analysis, but for higher level image manipulation, having 
to deal with all those bytes in JavaScript would be much slower it seems 
than something implemented close to the onscreen (or offscreen) drawing.
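
For illustration, the bit-blit style operation described above can be
sketched over decoded pixels. This assumes the image is already available
as a flat, row-major RGBA byte array (the layout canvas ImageData uses);
the function name copySubRect is hypothetical and this is not part of the
File API.

```javascript
// Hypothetical helper: copy a w-by-h sub-rectangle starting at (x, y) out
// of a width-by-height RGBA image stored as a flat Uint8ClampedArray.
function copySubRect(pixels, width, height, x, y, w, h) {
  if (x < 0 || y < 0 || w < 0 || h < 0 || x + w > width || y + h > height) {
    throw new RangeError("sub-rectangle lies outside the image");
  }
  const out = new Uint8ClampedArray(w * h * 4);
  for (let row = 0; row < h; row++) {
    // Each source row is contiguous, so it can be copied in one step.
    const srcStart = ((y + row) * width + x) * 4;
    out.set(pixels.subarray(srcStart, srcStart + w * 4), row * w * 4);
  }
  return out;
}

// A 100x100 image filled with opaque grey, sampled as a 20x20 tile.
const image = new Uint8ClampedArray(100 * 100 * 4).fill(128);
const tile = copySubRect(image, 100, 100, 40, 40, 20, 20);
console.log(tile.length); // 20 * 20 * 4 bytes
```

Working on decoded pixels this way sidesteps the compressed file format
entirely, which is the distinction being drawn with BLOB byte ranges.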


I like the direction this is going though!

cheers
David

- Original Message - 
From: L. David Baron dba...@dbaron.org

To: Jeremy Orlow jor...@chromium.org
Cc: Markus Ernst derer...@gmx.ch; whatwg wha...@whatwg.org
Sent: Sunday, December 13, 2009 3:01 AM
Subject: Re: [whatwg] Uploading directories of files



On Friday 2009-12-11 02:17 -0800, Jeremy Orlow wrote:

But regardless. I don't think you could argue that having _some_ path
information is worse than _none_, right?


Many of those who commented in
https://bugzilla.mozilla.org/show_bug.cgi?id=143220 and its
duplicates would disagree.  Users may not expect the act of
uploading a file to give the Web site details of their file system
structure.  There also seems to be some concern that those details
may provide information useful to an attacker.

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation   http://www.mozilla.com/






Re: [whatwg] Web API for speech recognition and synthesis

2009-12-13 Thread Ian McGraw
I'm new to this list, but as a speech-scientist and web developer, I wanted
to add my 2 cents.  Personally, I believe the future of speech recognition
is in the cloud.

Here are two services which provide Javascript APIs for speech recognition
(and TTS) today:

http://wami.csail.mit.edu/
http://www.research.att.com/projects/SpeechMashup/index.html

Both of these are research systems, and as such they are really just
proof-of-concepts.
That said, Wami's JSONP-like implementation allows Quizlet.com to use speech
recognition today on a relatively large scale, with just a few lines of
Javascript code:

http://quizlet.com/voicetest/415/?scatter

Since there are a lot of Google folks on this list, I recommend you talk to
Alex Gruenstein (in your speech group) who was one of the lead developers of
WAMI while at MIT.

The major limitation we found when building the system was that we had to
develop a new audio controller for every client (Java for the desktop,
custom browsers for iPhone and Android).  It would have been much simpler if
browsers came with standard microphone capture and audio streaming
capabilities.

-Ian


On Sun, Dec 13, 2009 at 12:07 PM, Weston Ruter westonru...@gmail.com wrote:

 I blogged yesterday about this topic (including a text-to-speech demo using
 HTML5 Audio and Google Translate's TTS service); the more relevant part for
 this thread: http://weston.ruter.net/projects/google-tts/

 I am really excited at the prospect of text-to-speech being made available
 on the Web! It's just too bad that fetching MP3s from a remote web service
 is the only standard way of doing so currently; modern operating systems
 all have TTS capabilities, so it's a shame that web apps can't utilize
 them via client-side scripting. I posted to the WHATWG mailing list about
 such a Text-To-Speech (TTS) Web API for JavaScript, and I was directed to
 a recent thread about a Web API for speech recognition and synthesis.

 Perhaps there is some momentum building here? Having TTS available in the
 browser would boost accessibility for the seeing-impaired and improve
 usability for people on-the-go. TTS is just another technology that has
 traditionally been relegated to desktop applications, but as the open Web
 advances as the preferred platform for application development, it is an
 essential service to make available (as with Geolocation API, Device API,
 etc.). And besides, I want to build TTS applications and my motto is: If
 it can't be done on the open web, it's not worth doing at all!


 http://weston.ruter.net/projects/google-tts/

 Weston

 On Fri, Dec 11, 2009 at 1:35 PM, Weston Ruter westonru...@gmail.com wrote:

 I was just alerted about this thread from my post Text-To-Speech (TTS)
 Web API for JavaScript at 
 http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html.
 Amazing how shared ideas like these seem to arise independently at the same
 time.

 I have a use-case and an additional requirement, that the time indices be
 made available for when each word is spoken in the TTS-generated audio:

 I've been working on a web app which reads text in a web page,
 highlighting each word as it is read. For this to be possible, a
 Text-To-Speech API is needed which is able to:
 (1) generate the speech audio from some text, and
 (2) include the time indices for when each of the words in the text is
 spoken.


 I foresee that a TTS API should integrate closely with the HTML5 Audio
 API. For example, invoking a call to the API could return a TTS object
 which has an instance of Audio, whose interface could be used to navigate
 through the TTS output. For example:

 var tts = new TextToSpeech("Hello, World!");
 tts.audio.addEventListener("canplaythrough", function(e){
     //tts.indices == [{startTime:0, endTime:500, text:"Hello"},
     //                {startTime:500, endTime:1000, text:"World"}]
 }, false);
 tts.read(); //invokes tts.audio.play

 What would be even cooler, is if the parameter passed to the TextToSpeech
 constructor could be an Element or TextNode, and the indices would then
 include a DOM Range in addition to the text property. A flag could also be
 set which would result in each of these DOM ranges to be selected when it is
 read. For example:

 var tts = new TextToSpeech(document.querySelector("article"));
 tts.selectRangesOnRead = true;
 tts.audio.addEventListener("canplaythrough", function(e){
     /*
     tts.indices == [
         {startTime:0, endTime:500, text:"Hello", range:Range},
         {startTime:500, endTime:1000, text:"World", range:Range}
     ]
     */
 }, false);
 tts.read();

 In addition to the events fired by the Audio API, more events could be
 fired when reading TTS, such as a readrange event whose event object would
 include the index (startTime, endTime, text, range) for the range currently
 being spoken. Such functionality would make the ability to read along with
 the text trivial.
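
 As a rough illustration of how the proposed indices could drive
 read-along highlighting, the lookup below finds the entry being spoken at
 a given playback time. It assumes the indices array has the shape shown in
 the examples above (startTime and endTime in milliseconds); the helper name
 indexAt is hypothetical and TextToSpeech itself remains a proposal.

```javascript
// Hypothetical helper: return the index entry whose time span contains
// timeMs, or null if nothing is being spoken at that moment.
function indexAt(indices, timeMs) {
  const hit = indices.find(
    (entry) => timeMs >= entry.startTime && timeMs < entry.endTime
  );
  return hit || null;
}

const indices = [
  { startTime: 0, endTime: 500, text: "Hello" },
  { startTime: 500, endTime: 1000, text: "World" },
];
console.log(indexAt(indices, 250).text);
console.log(indexAt(indices, 500).text);
```

 A readrange event would make this lookup unnecessary, since the event
 object would carry the entry directly.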

 What do you think?
 Weston


 On Thu, Dec 3, 2009 at 4:06 AM, Bjorn Bringert 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 I believe that the @doc attribute, discussed in the original threads
 about @sandbox, will be introduced to deal with that.  It'll take
 plain html as a string, avoiding the opaqueness and larger escaping
 requirements of a data:// url, as the only thing you'll have to escape
 is whichever quote you're using to surround the value.

That doesn't strike me as a robust way to prevent XSS - the primary
reason why we need sandboxing to begin with is that people have a
difficulty properly parsing, serializing, or escaping HTML; so
replacing this with a mechanism that still requires escaping is
perhaps suboptimal.

 Nah, token-guarding is no good.  For one it's completely unusable in
 the XML serialization without edits to the XML spec.

This seems valid.

 More importantly, though, it puts a significant burden on authors to
 generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
 their iframes or similar.

People already need to do this well for XSRF defenses to work, and
I'd wager it's a much simpler and better-defined problem than
real-world HTML parsing and escaping could realistically be. It is
also very easy to delegate this task to existing functions in common
web frameworks.

Also, a single token on a returned page, as long as it's unpredictable
across user sessions, should not be a significant issue.
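
A sketch of what delegating token generation to an existing function
might look like, here using Node's crypto.randomBytes. The helper names and
the guarded-span markup are hypothetical: the closing-tag syntax is only the
proposal discussed in this thread, and no browser implements it.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical helper: one unpredictable token per returned page,
// 128 bits from the platform CSPRNG, hex-encoded.
function newSandboxToken() {
  return randomBytes(16).toString("hex");
}

// Hypothetical markup for the token-guarded proposal. The content is
// rejected if it happens to contain the token, since it could then close
// the guarded element itself.
function guardSnippet(untrustedHtml, token) {
  if (untrustedHtml.includes(token)) {
    throw new Error("content contains the guard token");
  }
  return `<span sandbox="${token}">${untrustedHtml}</span sandbox="${token}">`;
}

const token = newSandboxToken();
console.log(guardSnippet("<b>user comment</b>", token));
```

The point is that the author never invents the token by hand, so the
copy-and-paste failure mode applies to the call, not to the value.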

 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src=show.cgi?id=1234></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter? There are two cases where the mechanism still
 offers some protection:

 You mean, if the attacker controls their own website on the same
 origin and iframes it for themselves?

The specific scenario given in the spec is:

<p>We're not scared of you! Here is your content, unedited:</p>
<iframe sandbox src="getusercontent.cgi?id=12193"></iframe>

Let's say this is on example.com. What prevents evil.com from calling
http://example.com/getusercontent.cgi?id=12193 in an IFRAME? Assuming
that the author of evil.com is also the author of example.com user
content 12193, this renders all the benefits of using sandboxed
frames on example.com moot.

The only two cases where this threat is mitigated are when non-SOP
domains are used to serve user content (but in this case, if you're
doing it right, you don't really need iframe sandboxes that much); or
when id= is unpredictable (which, in your own words, people are going
to mess up). And neither of these seems to be the case for the example
given.

 2.2) The ability to disable HTML parsing. On IFRAMEs, this can
 actually be approximated with the excommunicated <plaintext> tag, or
 with Content-Type: text/plain / data:text/plain,. On token-guarded
 SPANs or DIVs, however, it would be pretty damn useful for displaying
 text content without the need to escape <, >, &, etc. Pure security
 benefit is limited, but as a phishing prevention and display
 correctness measure, it makes sense.

 Why not just run an escape function over the content before sending
 it?  All web languages have one specifically for escaping the
 html-significant characters.  There's only five of them, after all.

Well, indeed =) But xssed.com has 61,000 data points to the contrary.
The easier we make it for people to achieve exactly the result they
want, whilst avoiding security issues, the better. One of the common
tasks they fail at is rendering limited (neutered) HTML without any JS
in. This is not an unsolved task - quite a few implementations exist
to do it pretty well - but they still fail. The other, arguably more
common task - and the most common source of XSS flaws - is to display
user input without any HTML at all. So this fits in nicely, even if
simple to implement by other means.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 Nah, token-guarding is no good. [...] More importantly, though,
 it puts a significant burden on authors to generate unpredictable
 tokens.

Btw, just to clarify - I am not proposing this instead of the current
method; we could very well allow token-guarded sandboxing on divs /
spans, and sandboxing sans tokens on iframes, without making the
mechanism much more complicated or unintuitive. Iframes solve one
class of problems (mostly, sandboxing entire pages or larger blobs of
text, with certain performance and usability trade-offs); lightweight
divs / spans solve another (easy and low-cost sandboxing of small
snippets of user input) in a conceptually similar way.

If we do not address that second need, we are bound to see completely
different mechanisms emerge (such as the toStaticHTML variants), with
different semantics, security controls, and filtering granularity,
which I think is suboptimal. And since these mechanisms are limited to
JS, we may eventually see a third class of solutions emerge at some
point, which is really, all too reminiscent of the misery with 5 or so
flavors of SOP. So my general concern is this; token-guarded tags may
not be the best way to do it, but still.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 11:02 AM, Michal Zalewski lcam...@coredump.cx wrote:
 More importantly, though, it puts a significant burden on authors to
 generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
 their iframes or similar.

 People already need to do this well for XSRF defenses to work, and
 I'd wager it's a much simpler and better-defined problem than
 real-world HTML parsing and escaping could realistically be. It is
 also very easy to delegate this task to existing functions in common
 web frameworks.

 Also, a single token on a returned page, as long as it's unpredictable
 across user sessions, should not be a significant issue.

People screw up CSRF tokens all the time.  The closing tag nonce
design has been floating around for years.  The earliest variant I
could find is Brendan's jail tag.

The @sandbox seems like a better fit for the advertising use case.  In
fact, many people have told me how happy they are that WebKit is
implementing @sandbox.  These folks tend to already be using iframes
to contain ads or gadgets and wish that they could turn off more
features, like frame-busting and plugins.  They're not worried about
the sandboxed content being loaded in the main frame because they're
interested in limiting the attacker's introduction to the user.  Once
the user has visited attacker.com, the issue is out of their hands.

I agree that we need something to help with content received by
cross-site XMLHttpRequest and postMessage.  For those use cases, we're
already running script, so a design like toStaticHTML seems better
than jail.

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 The @sandbox seems like a better fit for the advertising use case.

I am not contesting this, to be clear - I am aware of many cases where
it would be very useful - but gadgets are a fairly small part of the
Internet, and it seems like a unified solution would be more desirable
than several very different APIs with different granularity.

The toStaticHTML-alike will address other specific uses, but will
leave applications that can't rely on JS exclusively for their
rendering needs (which I'd wager is still a majority) out in the cold;
which would probably lead to yet another XSS prevention / HTML
sandboxing approach emerging later on.

I haven't really seen a compelling argument why all these can't be
unified without a significant increase in code or spec complexity -
maybe one exists.

More importantly, some of the features of @sandbox (e.g.,
allow-same-origin), as well as some of the examples in the spec, seem
to be explicitly targeted at other use cases, and the particular
same-origin user content example would promote highly unsafe coding
practices if ever followed. So it seems to me that such a narrow use
case is not even the consensus among the authors?

Cheers,
/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 1:30 PM, Michal Zalewski lcam...@coredump.cx wrote:
 I haven't really seen a compelling argument why all these can't be
 unified without a significant increase in code or spec complexity -
 maybe one exists.

That seems like a backwards way of proceeding.  Do you have a proposal
for unification besides the jail tag?

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
[...sorry for splitting the response...]

 People screw up CSRF tokens all the time.  The closing tag nonce
 design has been floating around for years.  The earliest variant I
 could find is Brendan's jail tag.

Sure, I hinted at it not as a brilliant new idea, but as a possibility.

I do think giving it - or just anything more flexible than frames - as
an option should be relatively simple when seamless sandbox frames are
implemented, and that it would make it infinitely more useful in
places where it would arguably do much more good.

If the authors wish to restrict this model to a specific ad / gadget
use case, and consciously decided the costs of extending it to a more
general sandboxing approach outweigh the benefits, that's definitely
fine; but this is not evident. If so, we need to revise the spec to
make this clear, perhaps nuke features such as allow-same-origin
altogether, and definitely scrap examples such as:

<p>We're not scared of you! Here is your content, unedited:</p>
<iframe sandbox src="getusercontent.cgi?id=12193"></iframe>

/mz



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 How do I use the jail tag to sandbox advertisements?

Huh? But that's not the point I am making... I am not arguing that
iframe sandbox should be abandoned as a bad idea - quite the opposite.

I was merely suggesting that we *expand* the same logic, and the same
excellent security control granularity, to span and div; this seems
like it would not increase the implementation complexity in any
significant way. We could then allow these to be populated with secure
contents in three ways:

1) Guarded closing tag - this is simple and bullet-proof; but may
conflict with XML serializations, and hence require some hacks,

2) CDATA or @doc-like approaches. Less secure because it does not
enforce a security control, but less contentious, and already being
considered for IFRAMEs.

3) .innerHTML, which would be then safe by default, without the need
for .innerSafeHTML (and the associated ambiguities) or explicit
.toStaticHTML calls.

This allows people to utilize the mechanism for so many more
additional use cases without the performance and usability cost of
IFRAMEs, and does not subvert the original ad / gadget use case in any
way.

*This* is what I find greatly preferred to having separate, completely
disjointed APIs with different semantics for ads / gadgets and other
full page contents, for small snippets of JS-inserted HTML, and for
server-returned data.

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

I am not suggesting this at all; extending the spec to cover, or at
least hint at, these cases would be a good idea. This is not to say the
functionality as currently specced out should be scrapped. My points
were:

1) If we want to keep it limited to the ads / gadget case, we should
make it clear in the spec, reconsider the applicability of
allow-same-origin in this context, and definitely revise the as of now
unsafe getusercontent example, etc. I am not entirely sold that this
is a beneficial strategy in the long run, but as long as the
alternatives were considered, so be it.

2) If we want to make the implementation useful for other scenarios as
well, and avoid the proliferation of HTML-sandboxing APIs with
different security controls, we should still keep the spec mostly as
is, and I have no objection to implementations incorporating it; BUT
it would be beneficial to allow it to be extended as outlined above,
or in a similar general way, specifically making it easy to sandbox
inline HTML, and to place thousands of sandboxed containers on a page
without making the renderer implode.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
On Sun, Dec 13, 2009 at 2:00 PM, Adam Barth wha...@adambarth.com wrote:

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

And Adam - as you know, I have deep respect for your expertise and
contributions in this area, so please do not take this personally...
but this really strikes me as throwing random ideas at the wall, and
seeing which ones stick.

This is sometimes beneficial - but we are dealing with possibly
revolutionary security mechanisms in the browser, meant to counter one
of the most significant unsolved security challenges on the web. And
this mode of engineering is probably why we have different
same-origin policies for XMLHttpRequest, DOM access, cookies,
third-party cookie setting, Flash, Java, Silverlight... plus assorted
ideas such as MSIE zones on top of it. It's the reason why their sum
is less secure than each of the mechanisms individually.

Still, this is not an attempt to dismiss the hard work: implementing
sandboxed IFRAMEs as-is and calling it a day *will* make the Internet
a safer place. But a collection of walled off, incompatible APIs with
different security switches and knobs, all of them to perform a
common task, does strike me as suboptimal - and I do think it's
avoidable. Especially since, I am guessing, some of the pragmatic
objections to guarded tags were probably due to implementation
complexity or dubious usability, all of which are probably moot with
@sandbox in place.

Furthermore, in this particular case, I am really concerned that the
spec is at odds with itself - you mention certain specific use cases,
but the spec seems to be after a broader goal: sandboxing
user-supplied content in general. In doing so, it gives some bad
advice (again, the user content example is exploitable, at least until
the arrival of some out-of-scope security mechanism to prevent it).

I think I stated the concerns reasonably well earlier in the thread;
but if they sound unwarranted or inflammatory, I can admit defeat.

Cheers,
/mz


Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Silvia Pfeiffer
There are many things that we would want to add to the source
element to allow for a better choice between the different source
files that are linked, but the biggest problem is that it is currently
only used to go through from top to bottom until the first file is
found that can be played back - then the source selection stops. Even
this is already quite a difficult algorithm.

Extending source to choose between alternative source files based on
other aspects such as quality, screen size, contains captions,
contains audio descriptions, etc. isn't going to work with the current
way that the source element is set up. This is why the @media
attribute hasn't been used/implemented anywhere yet: it contradicts
the current algorithm for source selection. And in discussion with the
browser developers who have implemented the element I hear that it's
complex enough as it is and overloading the algorithm further is
impossible.

I've been wondering if there is another way.

The analogy with the source selection algorithm for mime types on a
server doesn't work well, because there is only one dimension upon
which to choose a source file: mime type. Here, we have several
dimensions, making any automated choice a challenge.

Cheers,
Silvia.

On Mon, Dec 14, 2009 at 2:28 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Wasn't there talk of adding a @media attribute to video which could,
 among other things, hold bitrate information which would allow the UA
 to auto-determine whether it should play a file?

 This would require a change to the current selection algorithm, as the
 UA now has to make a 'best guess' of which file to use rather than
 just choosing the first which works, but it's probably worth it.
 (Plus the other benefits of @media, such as declaring that a
 particular source has subtitles burned in, etc.)

 ~TJ



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 2:13 PM, Michal Zalewski lcam...@coredump.cx wrote:
 How do I use the jail tag to sandbox advertisements?

 Huh? But that's not the point I am making... I am not arguing that
 iframe sandbox should be abandoned as a bad idea - quite the opposite.

 I was merely suggesting that we *expand* the same logic, and the same
 excellent security control granularity, to span and div; this seems
 like it would not increase the implementation complexity in any
 significant way.

Implementation complexity is not the gating factor.  Implementing
canvas is orders of magnitude more complex than any of the proposals
we've seen so far.  The gating factor is discovering simple, robust
mechanisms that provide security for the key use cases.

 We could then allow these to be populated with secure
 contents in three ways:

 1) Guarded closing tag - this is simple and bullet-proof; but may
 conflict with XML serializations, and hence require some hacks,

 2) CDATA or @doc-like approaches. Less secure because it does not
 enforce a security control, but less contentious, and already being
 considered for IFRAMEs.

 3) .innerHTML, which would be then safe by default, without the need
 for .innerSafeHTML (and the associated ambiguities) or explicit
 .toStaticHTML calls.

 This allows people to utilize the mechanism for so many more
 additional use cases without the performance and usability cost of
 IFRAMEs, and does not subvert the original ad / gadget use case in any
 way.

 *This* is what I find greatly preferred to having separate, completely
 disjointed APIs with different semantics for ads / gadgets and other
 full page contents, for small snippets of JS-inserted HTML, and for
 server-returned data.

It sounds like you think we should proceed with @sandbox and also do
something with inline HTML.  Ian has already asked browser vendors to
experiment in these areas and try to gain some implementation
experience.  I'd encourage you to write up your thoughts in a brief
spec along the lines of the DOM-based HTML Sanitizer document I sent
to this list a while back.

I'm very interested in a solution that works for the following use cases:

1) A web page wants to display untrusted (i.e., restricted) HTML
received via cross-site XMLHttpRequest or postMessage.

2) A blog wishes to display many comments containing untrusted (i.e.,
restricted) HTML.

I'm certainly not married to my proposal.  In fact, I'm planning to
update it based on the feedback I've received here and elsewhere.

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

 I am not suggesting this at all; extending the spec to cover, or at
 least hint these cases would be a good idea. This is not to say the
 functionality as currently specced out should be scrapped. My points
 were:

 1) If we want to keep it limited to the ads / gadget case, we should
 make it clear in the spec, reconsider the applicability of
 allow-same-origin in this context,

allow-same-origin is useful if the advertisement wishes to retrieve
additional information from its origin, e.g., via XMLHttpRequest or the
video tag.  For example, the ad might want to show a video from its
origin and be able to interact with the video without the cross-origin
restrictions.

 and definitely revise the as of now unsafe getusercontent example, etc.

I agree that we should revise that example.

 I am not entirely sold that this
 is a beneficial strategy in the long run, but as long as the
 alternatives were considered, so be it.

 2) If we want to make the implementation useful for other scenarios as
 well, and avoid the proliferation of HTML-sandboxing APIs with
 different security controls, we should still keep the spec mostly as
 is, and I have no objection to implementations incorporating it; BUT
 it would be beneficial to allow it to be extended as outlined above,
 or in a similar general way, specifically making it easy to sandbox
 inline HTML, and to place thousands of sandboxed containers on a page
 without making the renderer implode.

These concerns seem to be with the implementation and not the spec.
We certainly can expand the web platform after HTML5.  I must be
misunderstanding something.

On Sun, Dec 13, 2009 at 2:31 PM, Michal Zalewski lcam...@coredump.cx wrote:
 And Adam - as you know, I have deep respect for your expertise and
 contributions in this area, so please do not take this personally...
 but this really strikes me as throwing random ideas at the wall, and
 seeing which ones stick.

 This is sometimes beneficial - but we are dealing with possibly
 revolutionary security mechanisms in the browser, meant to counter one
 of the most significant unsolved security challenges on the web.

I'm not sure it's that revolutionary, but I'm glad you think it's important work.

 And this mode of  engineering is probably why we have a different
 same-origin policies for XMLHttpRequest, DOM  

Re: [whatwg] Uploading directories of files

2009-12-13 Thread イアンフェッティ
2009/12/11 Anne van Kesteren ann...@opera.com

 On Fri, 11 Dec 2009 15:24:37 +0100, Ian Fette (イアンフェッティ) 
 ife...@google.com wrote:

 Ok, I sense resistance to putting it in .name. What about .path, undefined
 in most cases except where there is an upload including files from
 multiple
 directories, in which case .path contains the path less any path
 components
 common to all 3 (sorry, it's early morning and I can't write well before
 having coffee).

 e.g.

 input.files[0].name=1.jpg
 input.files[0].path=a/b
 input.files[1].name=2.jpg
 input.files[1].path=a/b
 input.files[2].name=3.jpg
 input.files[2].path=a/c

 (Need to figure out the exact wording, as a is common to all 3 but if
 you're uploading the entire directory a, it may make sense to include
 that
 in the path -- but I don't feel quite as strongly about that -- subfolders
 are certainly more important IMO.)


 Note that extensions to File should be discussed on public-weba...@w3.org.
 At least, that's where they have been so far.



Anne -- happy to move the File related discussion to public-weba...@. For
the sake of wrapping up this thread though, are there any changes that
would need to be made to the HTML spec to allow this behavior (including
limited path information in the name that gets sent when the form is
posted?) I can't seem to find the actual part of the spec that defines or
restricts what characters are valid in context.



 --
 Anne van Kesteren
 http://annevankesteren.nl/



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Aryeh Gregor
On Fri, Dec 11, 2009 at 11:18 PM, Michal Zalewski lcam...@coredump.cx wrote:
 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox="random_token"></span sandbox="same_token">) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

Well, the problem this random token thing is trying to address is that
the untrusted content could just close the tag.  (I fondly remember my
days on Geocities, when we would add <noscript><noscript> to the end
of our pages to try to get rid of the auto-injected ads.)  But it's
kind of hacky and might be prone to failure, and the syntax is really
unpleasant (especially for XML compatibility).
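For readers who have not seen the token-guarded idea before, here is a rough sketch of it, written as plain JavaScript string handling. It is only an illustration under assumptions: the guarded closing-tag syntax is the proposal being discussed in this thread (no browser implements it), and the function names wrapWithToken and extractGuardedContent are made up for this example.

```javascript
// Sketch only: the guarded-close syntax is the proposal under discussion,
// not an implemented browser feature. All names here are hypothetical.
const { randomBytes } = require("node:crypto");

// Server side: wrap untrusted markup in a span whose closing tag must
// repeat a token that the untrusted author cannot predict.
function wrapWithToken(untrustedHtml, token = randomBytes(16).toString("hex")) {
  return `<span sandbox="${token}">${untrustedHtml}</span sandbox="${token}">`;
}

// Parser side (hypothetical): the sandboxed content ends only at the close
// tag carrying the opening token. A plain </span> inside the content does
// not terminate the sandbox; a missing guarded close yields null.
function extractGuardedContent(markup) {
  const open = /^<span sandbox="([0-9a-f]+)">/.exec(markup);
  if (!open) return null;
  const close = `</span sandbox="${open[1]}">`;
  const end = markup.indexOf(close, open[0].length);
  return end === -1 ? null : markup.slice(open[0].length, end);
}
```

The point of the token is visible in extractGuardedContent: untrusted content that tries to break out with its own closing tag stays inside the sandboxed region, because it cannot name the token.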

So instead, why not just use the standard escaping mechanisms we
already have?  Allow a sandbox attribute on all elements that can
contain phrasing or flow content.  Any such element with a sandbox
attribute will be required to contain no literal <, >, ", or ' before the
closing tag.  If any of those four characters is encountered, the
element is treated as having no contents.  Otherwise, the browser
unescapes all characters with special meanings (&lt; -> <, &gt;
-> >, &amp; -> &, etc.) and then treats the resulting string as
the inner HTML of the element, parsing it like regular HTML, but the
contents are sandboxed.

Examples:

<span sandbox>This span will work normally, except for being sandboxed.</span>

<span sandbox>This span will be <em>empty</em> in the DOM, even though
it contains no evil content, because otherwise authors will forget to
escape the contents of the sandbox.</span>

<span sandbox>&lt;span&gt;But this span will have another span as its
child, sandboxed.  The regular parser sees no entities here, only a
nested span!&lt;/span&gt;</span>

<span sandbox>It would be safe to allow this to work, since it only
contains an apostrophe, but let's not, so that lack of escaping is
easier to catch.  This span is therefore also empty.</span>
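The rule described above (reject any literal <, >, " or ', otherwise unescape and hand the result to the sandboxed parser) can be sketched as a small string function. This is a sketch under assumptions, not proposed spec text: the name sandboxedInnerHtml is invented, and the entity table is a guess at what "etc." would cover, since the message only names lt, gt and amp.

```javascript
// Hypothetical helper illustrating the proposed rule; not a real API.
// The entity table is an assumption beyond lt/gt/amp.
const ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', "#39": "'" };

// Takes the raw (still escaped) contents of an element carrying the
// sandbox attribute and returns the string that would then be parsed
// as the element's sandboxed inner HTML.
function sandboxedInnerHtml(rawContents) {
  // Any literal <, >, " or ' means the author forgot to escape,
  // so the element is treated as having no contents.
  if (/[<>"']/.test(rawContents)) return "";
  // Unescape in a single pass so "&amp;lt;" becomes "&lt;", not "<".
  return rawContents.replace(
    /&(lt|gt|amp|quot|#39);/g,
    (match, name) => ENTITIES[name]
  );
}
```

Applied to the four examples, the first passes through unchanged, the second and fourth come back empty, and the third yields a nested span for the sandboxed parser.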


I think this is easier to use than having to generate a random token,
and also more secure.  If your code isn't escaping things right,
you'll quickly notice when your blog comments all vanish.

This is even backward-compatible, in a certain sense.  jail would be
unsafe to serve with untrusted contents until all UAs reliably support
it.  This would be perfectly safe in all browsers, it would just
display poorly in old browsers if there's any HTML markup in the
content.

What do people think of this syntax?


Re: [whatwg] Uploading directories of files

2009-12-13 Thread Jonas Sicking
2009/12/13 Ian Fette (イアンフェッティ) ife...@google.com:
 2009/12/11 Anne van Kesteren ann...@opera.com

 On Fri, 11 Dec 2009 15:24:37 +0100, Ian Fette (イアンフェッティ)
 ife...@google.com wrote:

 Ok, I sense resistance to putting it in .name. What about .path,
 undefined
 in most cases except where there is an upload including files from
 multiple
 directories, in which case .path contains the path less any path
 components
 common to all 3 (sorry, it's early morning and I can't write well before
 having coffee).

 e.g.

 input.files[0].name=1.jpg
 input.files[0].path=a/b
 input.files[1].name=2.jpg
 input.files[1].path=a/b
 input.files[2].name=3.jpg
 input.files[2].path=a/c

 (Need to figure out the exact wording, as a is common to all 3 but if
 you're uploading the entire directory a, it may make sense to include
 that
 in the path -- but I don't feel quite as strongly about that --
 subfolders
 are certainly more important IMO.)

 Note that extensions to File should be discussed on public-weba...@w3.org.
 At least, that's where they have been so far.


 Anne -- happy to move the File related discussion to public-weba...@. For
 the sake of wrapping up this thread though, are there there any changes that
 would need to be made to the HTML spec to allow this behavior (including
 limited path information in the name that gets sent when the form is
 posted?)

The only change needed, as far as I can tell, is to say that *if* the
File objects contain any path information, that path information
is included as part of the filename when the data is submitted.

The other change is to convince Arun that the File API spec should add
a .path property that contains (possibly partial) path information
when that is safe. Or some such.
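To make the quoted .path idea concrete, here is a minimal sketch of how partial paths could be derived by dropping the directory components common to every file, and how the submitted filename could then be built. Everything in it is an assumption for illustration: toNameAndPath, submittedFilename and the keepUploadedDir switch are made-up names, the input is a list of slash-separated relative paths, and whether the shared top directory is kept is exactly the open question raised in the quoted message.

```javascript
// Hypothetical sketch; not part of the File API or any browser.
// filePaths: slash-separated relative paths of the selected files.
// keepUploadedDir: keep the deepest shared directory (the open question
// in the thread about whether "a" belongs in the path).
function toNameAndPath(filePaths, keepUploadedDir = false) {
  if (filePaths.length === 0) return [];
  const split = filePaths.map((p) => p.split("/"));
  const dirs = split.map((parts) => parts.slice(0, -1));

  // Count the leading directory components shared by every file.
  let common = 0;
  while (dirs.every((d) => common < d.length && d[common] === dirs[0][common])) {
    common++;
  }
  if (keepUploadedDir && common > 0) common--;

  return split.map((parts, i) => {
    const rest = dirs[i].slice(common);
    return {
      name: parts[parts.length - 1],
      // Undefined when nothing distinguishes the directories.
      path: rest.length > 0 ? rest.join("/") : undefined,
    };
  });
}

// Filename as it might be sent on form submission: path information is
// included only if the file object carries any.
function submittedFilename(file) {
  return file.path ? `${file.path}/${file.name}` : file.name;
}
```

With a/b/1.jpg, a/b/2.jpg and a/c/3.jpg as input, the default gives paths b, b and c; passing keepUploadedDir gives a/b, a/b and a/c, matching the listing in the quoted example.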

However I would probably in general recommend that you follow the
steps outlined in the FAQ for new features:
http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F

There the next step would be to just add an implementation to Chrome.

/ Jonas


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 <span sandbox>&lt;span&gt;But this span will have another span as its
 child, sandboxed.  The regular parser sees no entities here, only a
 nested span!&lt;/span&gt;</span>

That's a pretty reasonable variant for lightweight sandboxes, IMO. It
does not have the explicit assurance of a token-based approach (i.e.,
will not fail right away if the user gets it wrong), but it's better
than data: URLs or @doc in that - as you noted - it will fail quickly
if the encapsulated HTML is not escaped, while this may still go
unnoticed until abused:

<iframe sandbox doc="<h1>User input without escaping"></iframe>
<iframe sandbox src="data:text/html,<h1>User input without escaping"></iframe>

As a side note, the other benefit of sandboxed spans and divs in such
a design is that you can then have .innerHTML on sandbox-tagged
elements automagically conform to the sandboxing rules, without the
need for .toStaticHTML, .secureInnerHTML, or similar approaches (which
are error-prone by the virtue of tying sanitization to data access
method, rather than a particular element).

/mz


Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Eric Carlson

On Dec 13, 2009, at 8:12 PM, Silvia Pfeiffer wrote:

 Oh! What are you doing with it? I mean - have the values in the media
 attribute any effect on the video element?
 
  Certainly! WebKit evaluates the query in the 'media' attribute if it believes 
it can handle the MIME type. If the query evaluates to true, it uses that 
source element. If it evaluates to false it skips it, even though it could 
(in theory) open the movie. For example, one of our layout tests [1] has the 
following :

<video controls>
<source src="content/error.mpeg" media="print">
<source src="content/error2.mpeg" media="screen and (min-device-width: 8px)">
<source src="content/test.mp4" media="screen and (min-device-width: 100px)">
</video>

  The test fails if the video element is instantiated with anything but 
test.mp4.

  I have seen 'media' used on real-world pages with something like the 
following to select different movies for the iphone and desktop:

<video controls>
<source src='desktop-video.mp4' media="@media screen and (min-device-width: 481px)">
<source src='iphone-video.mp4' media="@media screen and (min-device-width: 480px)">
</video>

  This works because the source elements are evaluated in order, so the first 
one is selected on the desktop where both queries will evaluate to true.
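The in-order selection described above can be sketched as a short loop. This is an illustration under stated assumptions rather than WebKit's actual code: selectSource and toyMatcher are invented names, canPlay stands in for the browser's MIME type check, and the matcher understands only a bare media type or "screen and (min-device-width: Npx)".

```javascript
// Hypothetical sketch of first-match source selection; not WebKit code.
// canPlay(source) and matchesMedia(query) are injected stand-ins for the
// browser's own checks.
function selectSource(sources, canPlay, matchesMedia) {
  for (const source of sources) {
    // Skip sources whose type the browser believes it cannot handle.
    if (!canPlay(source)) continue;
    // Skip sources whose media query evaluates to false.
    if (source.media && !matchesMedia(source.media)) continue;
    // The first source passing both checks wins; later ones are ignored.
    return source;
  }
  return null;
}

// Toy media matcher for a screen device of the given width. It handles
// only "<type>" and "<type> and (min-device-width: Npx)".
function toyMatcher(deviceWidthPx) {
  return (query) => {
    const m = /^(\w+)(?:\s+and\s+\(min-device-width:\s*(\d+)px\))?$/.exec(query.trim());
    if (!m || m[1] !== "screen") return false;
    return m[2] === undefined || deviceWidthPx >= Number(m[2]);
  };
}
```

Because the loop returns on the first match, ordering the sources from most to least demanding is what makes a desktop/handheld split work when more than one query is true.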

eric

[1] 
http://trac.webkit.org/browser/trunk/LayoutTests/media/video-source-media.html?format=txt


 Thanks,
 Silvia.
 
 On Mon, Dec 14, 2009 at 2:43 PM, Eric Carlson eric.carl...@apple.com wrote:
 
 On Dec 13, 2009, at 2:35 PM, Silvia Pfeiffer wrote:
 
 This is why the @media attribute hasn't been used/implemented anywhere yet
 
  Are you saying that nobody has implemented the media attribute on 
 source? If so, you are incorrect as WebKit has had this for almost two 
 years.
 
 eric
 
 



Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Silvia Pfeiffer
Ah that's excellent. I was under the impression that all
implementations so far are ignoring the media attribute in the
selection algorithm. But it seems I am mistaken. Do all browsers
implement this support then? And can we put the examples below into
the specification?

Indeed it seems to me the solution to the quality problem should
then be done through the media attribute. I am not sure yet how to,
because we have no definition for what a low quality or high
quality video is other than some form of SD vs HD and lower
resolution vs higher resolution and lower bandwidth vs higher
bandwidth.

Regards,
Silvia.

On Mon, Dec 14, 2009 at 4:12 PM, Eric Carlson eric.carl...@apple.com wrote:

 On Dec 13, 2009, at 8:12 PM, Silvia Pfeiffer wrote:

 Oh! What are you doing with it? I mean - have the values in the media
 attribute any effect on the video element?

  Certainly! WebKit evaluates the query in the 'media' attribute if it 
 believes it can handle the MIME type. If the query evaluates to true, it uses 
 that source element. If it evaluates to false it skips it, even though it 
 could (in theory) open the movie. For example, one of our layout tests [1] 
 has the following :

 <video controls>
     <source src="content/error.mpeg" media="print">
     <source src="content/error2.mpeg" media="screen and (min-device-width: 8px)">
     <source src="content/test.mp4" media="screen and (min-device-width: 100px)">
 </video>

  The test fails if the video element is instantiated with anything but 
 test.mp4.

  I have seen 'media' used on real-world pages with something like the 
 following to select different movies for the iphone and desktop:

 <video controls>
     <source src='desktop-video.mp4' media="@media screen and (min-device-width: 481px)">
     <source src='iphone-video.mp4' media="@media screen and (min-device-width: 480px)">
 </video>

  This works because the source elements are evaluated in order, so the 
 first one is selected on the desktop where both queries will evaluate to true.

 eric

 [1] 
 http://trac.webkit.org/browser/trunk/LayoutTests/media/video-source-media.html?format=txt


 Thanks,
 Silvia.

 On Mon, Dec 14, 2009 at 2:43 PM, Eric Carlson eric.carl...@apple.com wrote:

 On Dec 13, 2009, at 2:35 PM, Silvia Pfeiffer wrote:

 This is why the @media attribute hasn't been used/implemented anywhere yet

  Are you saying that nobody has implemented the media attribute on 
 source? If so, you are incorrect as WebKit has had this for almost two 
 years.

 eric






Re: [whatwg] Quality Values for Media Source Elements

2009-12-13 Thread Hugh Guiney
On Sun, Dec 13, 2009 at 7:26 AM, Aryeh Gregor simetrical+...@gmail.com wrote:
 JavaScript is an integral part of HTML to all intents and purposes.
 HTML itself does not and should not try to cover use-cases that are
 already adequately covered by HTML+JavaScript -- there will always be
 things that are better handled by a general-purpose scripting
 language.  Of course, moving something into HTML might be valuable
 because it makes the feature easier for authors to use, but that needs
 to be weighed against the cost of browsers having to implement it
 rather than some other feature.

JavaScript is a crutch that far too many applications are relying on
for major functionality lately. JavaScript should enhance a Web
experience, not supplant it.

 Well, yes.  On the other hand, almost nobody actually uses content
 negotiation, so I don't think that supports your case.

If no one uses content negotiation then there is no need to have the
source element at all.

 Well, no, because there's almost no functional difference between
 XHTML and HTML except that the former is more likely to break due to
 typos or minor bugs.  Plus, virtually no site actually provides both
 XHTML and HTML.  Actually, virtually no site provides real XHTML at
 all.  So I don't bother specifying a preference for either.  If you
 do, I rather suspect that makes you one of a few hundred people at
 most, out of billions of web users.  So maybe you could pick an
 analogy that's more realistic?

XHTML and HTML are interchangeable with any other two technologies in
that example. PDF and Word, HTML and RSS, RSS and XHTML... the point
isn't whether most site authors are offering those two in particular;
the point is that on a platform that supports content negotiation, it
makes no sense to outsource it to another technology, making authors
reinvent the wheel simply because not enough people are using wheels.

 On the other hand, every single video site already does allow you to
 specify quality, and I've never had a problem with this.  It's a
 simple control that's only there when you want it, and you can easily
 figure out if you actually want higher or lower quality in any given
 case.

It's simple for an end-user; not necessarily so simple for authors to implement.

On Sun, Dec 13, 2009 at 5:35 PM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 The analogy with the source selection algorithm for mime types on a
 server doesn't work well, because there is only one dimension upon
 which to choose a source file: mime type. Here, we have several
 dimensions, making any automated choice a challenge.

I do agree with this.

On Mon, Dec 14, 2009 at 12:12 AM, Eric Carlson eric.carl...@apple.com wrote:
  Certainly! WebKit evaluates the query in the 'media' attribute if it 
 believes it can handle the MIME type. If the query evaluates to true, it uses 
 that source element. If it evaluates to false it skips it, even though it 
 could (in theory) open the movie. For example, one of our layout tests [1] 
 has the following :

 <video controls>
 <source src="content/error.mpeg" media="print">
 <source src="content/error2.mpeg" media="screen and (min-device-width: 8px)">
 <source src="content/test.mp4" media="screen and (min-device-width: 100px)">
 </video>

  The test fails if the video element is instantiated with anything but 
 test.mp4.

This seems extremely useful. How many media features are implemented?

Currently, though, the CSS3 Media Query spec doesn't cover enough
metadata to make this as useful as it could be.

On Mon, Dec 14, 2009 at 1:54 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 Indeed it seems to me the solution to the quality problem should
 then be done through the media attribute. I am not sure yet how to,
 because we have no definition for what a low quality or high
 quality video is other than some form of SD vs HD and lower
 resolution vs higher resolution and lower bandwidth vs higher
 bandwidth.

Well, we could certainly define them as they'd be defined *today*, but
as HD becomes more and more commonplace it will effectively stop being
high definition, even if the name sticks. And, what one person
considers low quality, another person may consider high quality,
and vice-versa, depending on the capabilities of their machine and the
type of content they're used to seeing. Which is why it doesn't make
sense to specify absolutes, and why I proposed using relative values.

If we're to be more granular though, the biggest barrier to
implementation is the fact that, as you said, video is
multi-dimensional: there are MANY different factors that can affect
quality, viewing preference, and/or playback compatibility. Here is a
non-comprehensive list off the top of my head:

* Aspect Ratio (or Width and Height)
* Pixel Aspect Ratio (or Relative Pixel Width and Relative Pixel Height)
* Display Aspect Ratio (or AR / W  H and PAR / PW  PH)
* Content Aspect Ratio (or Content Width and Content Height)
* Sample Rate (or Rate and Scale)
* Bit