Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-04-06 Thread Ian Hickson
On Thu, 4 Feb 2010, Tab Atkins Jr. wrote:
 On Thu, Feb 4, 2010 at 5:12 AM, Ian Hickson i...@hixie.ch wrote:
  On Mon, 25 Jan 2010, Tab Atkins Jr. wrote:
 
  Adam Barth rightfully points out that this only stops certain classes 
  of data exfiltration attacks, and so probably isn't worthwhile as a 
  solution to that matter.  However, I think this would also be very 
  useful for general comments, to prevent, for example, shock trolls 
  from putting goatse images in your comment threads.  It would also 
  prevent video and audio embeds from working.
 
  However, it would still allow the site owner to allow particular 
  files to be embedded with <img>, <audio>, or <video>, if they just 
  host them on their own origin and set allow-same-origin in the 
  sandbox flags. This is already a relatively normal practice, but it's 
  accomplished through attempts at filtering.
 
  Note that this would also prevent resource embeds using data urls, as 
  they have a unique origin.
 
  It seems like if you want to control what markup is shown, the way to 
  do that is to parse the input and remove the elements you want to 
  block. Just blocking off-domain images is a pretty poor way of 
  blocking images if that's what you want to do. Consider that the 
  commenter could just use <table> and <td bgcolor> to embed an image 
  if that's what he wants to do.
 
 Heh, if someone goes to the trouble of constructing a pixelmap out of a 
 table, they deserve to have it up until I find it and delete it. Note as 
 well that this sort of thing wouldn't be stopped by the suggested parse 
 and sanitize method either, unless you just want to strip *all* tables.  
 And pretty much all other HTML (using div/span with display:table-* 
 would accomplish the same thing, or just putting explicit heights/widths 
 on them to make the 'cells' line up.)

Indeed. If you want to block any of these things, you're much better off 
doing real filtering rather than relying on coarse feature blocking or 
cross-site blocking in the sandboxing feature, IMHO.


  On many large sites, users can upload images to one part of the site 
  -- those wouldn't be blocked either.
 
 That's the point - one wouldn't want those blocked (or if one did, one 
 could indeed filter all images out).  They can perhaps be more subject 
 to moderation (submit the image, and have to wait for it to be approved 
 before you can use it), or just be a built-in set of images that you're 
 allowed to use, like the large sets of smilies that most forums have.

This use case is seeming very obscure for a first version of the feature. 
Are there sites that try to do this today?


  Sorry, the title is unclear - I mainly intend this as preventing 
  audio autoplay and the like.  Any sort of action that could be 
  both annoying and would take place without the user's consent.  This 
  is inherently ill-defined, which may be a problem, but it could be 
  tightened up to say precisely which features should be shut down. 
  It might need to be revised as new features get added, though.
 
  Yeah, maybe we should do this. Are there any other than autoplay, 
  autofocus, meta refresh, and script?
 
 That's all I can think of from a quick scan of the list of elements.

I've blocked those.


  Are there other reasonable improvements that could be made to iframe 
  sandbox to make it more suitable for wrapping things such as blog 
  comments?  Ideally, production-level sites with relatively normal 
  requirements should be able to use *solely* iframe sandbox to 
  protect their users from untrusted content.  (Though, of course, it 
  would be only a part of the site's defenses until the userbase with 
  non-supporting browsers drops low enough to ignore.)  Do others 
  believe this is an achievable goal, or conversely believe it is not?
 
  sandbox= is only meant as an extra defence-in-depth, it's really not 
  meant as a self-contained comprehensive security mechanism.
 
 Eh, once we can rely on it being implemented, it seems like it *could* 
 be a fairly self-contained security mechanism.  At the very least, it 
 could shut down the most worrying of attacks, and allow manual 
 moderation to take care of the rest.  Filling in a last few holes would 
 finish this out.

You're always going to need things like the text/html-sandboxed type, 
etc, as far as I can tell.


  Shelley Powers states that she disallows SVG in the comments on her 
  blog because of the risk of someone DOSing her users by writing 
  highly resource-intensive SVG.  This could be fixed in a general 
  sense by having the ability to opt into very strict resource limits 
  per iframe - if the limit is exceeded, the browser would simply bail 
  and end processing in that iframe.  I'm not certain how practical 
  this is from an engineering standpoint, however.  There's no need to 
  set precise limits on this - each browser should understand the 
  platform it's running on well enough to know what an 'appropriate' 
  resource 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-05 Thread Kornel Lesinski
On 4 Feb 2010, at 17:44, Michal Zalewski wrote:
 
 If there's no HTML, there's no need for a sandbox, so the simplest
 solution is just to escape the <s and &s.
 
 Which people fail at, big time. There are 50,000+ entries on
 xssed.com, many of them against big sites presumably developed by
 skilled developers with the help of sophisticated frameworks -
 microsoft.com, google.com, amazon.com, ebay.com, etc. It is a really
 low-effort tweak to accommodate it here, and it may offer a very
 significant security benefit, so...?

The problem comes from lack of escaping of any kind, so change in escaping 
method will not fix the problem, i.e.,

Hello $unescaped_name

is as vulnerable as:

Hello <iframe sandbox srcdoc="$unescaped_name"></iframe>

 I think the difference is huge; in a typical web framework, you need
 to explicitly escape every single piece of potentially dangerous
 attacker-controlled input to stay safe - and because people tend to do
 some validation at input stage, it's very difficult to audit for it.
 Escaping also needs to be very different depending on the context
 (URLs need to be encoded differently than HTML parameters, and
 differently than standalone text).
 
 So even though your framework may provide several escape() functions,
 it's very easy to get it wrong, and people constantly do. OTOH, if
 your framework provides a get_token() function, there's really not
 that much to using it properly.

That's a problem with the frameworks (a big one, admittedly). However, there are 
templating engines that escape all variables, everywhere, by default, and this 
solves the problem very well.
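A minimal sketch of the escape-by-default idea, in Python. `render_autoescaped` is a hypothetical helper, not any particular templating engine's API; the point is only that every substituted value goes through HTML escaping without the author having to ask for it.

```python
import html
from string import Template


def render_autoescaped(template: str, **values: str) -> str:
    """Substitute every $name placeholder, HTML-escaping each value.

    Escaping is unconditional, so an author cannot "forget" it for one
    variable; emitting raw markup would have to be a deliberate choice.
    """
    escaped = {name: html.escape(value, quote=True) for name, value in values.items()}
    return Template(template).substitute(escaped)


page = render_autoescaped("<p>Hello $name</p>", name="<script>alert(1)</script>")
print(page)  # <p>Hello &lt;script&gt;alert(1)&lt;/script&gt;</p>
```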

Adding a token-based sandbox won't improve anything in cases where authors 
forget to escape or wrongly assume that input is already filtered/escaped or 
harmless. If someone forgets to add escape(), why would they remember to add 
sandbox? Additionally, sandbox will cause a new security problem in all 
current UAs, so for plain text I don't see any benefit at all.

However, if we're going to introduce a token-based sandbox anyway, I suggest 
putting the token in the tag name:

<sandbox-$token>...</sandbox-$token>

where $token is the random part. This avoids the oddity of attributes in a 
closing tag, and is compatible with XML. In XML you could also use:

<$token:sandbox xmlns:$token="…"></$token:sandbox>
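A sketch of how a server might emit the tag-name form, assuming a 128-bit token from Python's `secrets` module. The `<sandbox-$token>` element is only the proposal under discussion (no browser implements it), and `wrap_in_token_sandbox` is a hypothetical name.

```python
import secrets


def wrap_in_token_sandbox(untrusted_html: str) -> str:
    """Wrap untrusted markup in the proposed <sandbox-$token> element.

    The token is 128 bits from the OS CSPRNG, so an attacker cannot guess
    it and therefore cannot write a matching </sandbox-$token> to break out.
    """
    token = secrets.token_hex(16)
    # Regenerate in the (astronomically unlikely) case the input already
    # contains the chosen token.
    while token in untrusted_html:
        token = secrets.token_hex(16)
    return f"<sandbox-{token}>{untrusted_html}</sandbox-{token}>"
```

A fresh token per response keeps it unpredictable, at the cost of making the output different on every request.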

-- 
regards, Kornel Lesiński



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-05 Thread Lachlan Hunt

Kornel Lesinski wrote:

However, if we're going to introduce a token-based sandbox anyway, I
suggest putting the token in the tag name:

<sandbox-$token>...</sandbox-$token>

where $token is the random part. This avoids the oddity of attributes in a
closing tag, and is compatible with XML. In XML you could also use:

<$token:sandbox xmlns:$token="…"></$token:sandbox>


No, you couldn't use a namespace like that, because then the sandbox 
element would not be in the HTML namespace, and thus would not have any 
known semantics.


--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-05 Thread Lachlan Hunt

Lachlan Hunt wrote:

Kornel Lesinski wrote:

However, if we're going to introduce a token-based sandbox anyway, I
suggest putting the token in the tag name:

<sandbox-$token>...</sandbox-$token>

where $token is the random part. This avoids the oddity of attributes in a
closing tag, and is compatible with XML. In XML you could also use:

<$token:sandbox xmlns:$token="…"></$token:sandbox>


No, you couldn't use a namespace like that, because then the sandbox
element would not be in the HTML namespace, and thus would not have any
known semantics.


Um, ignore me.  I'm just not thinking properly today.

--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-05 Thread Kornel Lesinski
On 5 Feb 2010, at 14:19, Lachlan Hunt wrote:

 where $token is the random part. This avoids the oddity of attributes in a
 closing tag, and is compatible with XML. In XML you could also use:
 
 <$token:sandbox xmlns:$token="…"></$token:sandbox>
 
 No, you couldn't use a namespace like that, because then the sandbox element 
 would not be in the HTML namespace, and thus would not have any known 
 semantics.


Eh, I left out the namespace URI because I didn't want to type it. Here's a 
complete example that applies HTML semantics:

<div xmlns="http://www.w3.org/1999/xhtml">
  <dd02c7c2232759874e1c205587017bed:sandbox 
xmlns:dd02c7c2232759874e1c205587017bed="http://www.w3.org/1999/xhtml">
  <h1>HTML</h1>
 </dd02c7c2232759874e1c205587017bed:sandbox>
</div>
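To check Kornel's point that the random prefix disappears after namespace processing, here is a quick test with Python's standard `xml.etree.ElementTree`, which reports element names as `{namespace-uri}local-name`. It only shows what a namespace-aware XML parser sees; it says nothing about how a browser would treat a `sandbox` element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = f"""<div xmlns="{XHTML_NS}">
  <dd02c7c2232759874e1c205587017bed:sandbox
      xmlns:dd02c7c2232759874e1c205587017bed="{XHTML_NS}">
    <h1>HTML</h1>
  </dd02c7c2232759874e1c205587017bed:sandbox>
</div>"""

root = ET.fromstring(doc)
sandbox = root[0]
# The prefix is resolved away, leaving an XHTML-namespaced "sandbox" element.
print(sandbox.tag)     # {http://www.w3.org/1999/xhtml}sandbox
print(sandbox[0].tag)  # {http://www.w3.org/1999/xhtml}h1
```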

-- 
regards, Kornel



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-05 Thread Philip Taylor
On Thu, Feb 4, 2010 at 11:12 AM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 25 Jan 2010, Alex Russell wrote:

 AFAICT, the objections fall into several buckets:

   1.) Users might pick badly or may re-use nonces when they shouldn't.
   2.) Escaping " is believed to be more secure because it's likely to
 break more often, raising developer awareness
   3.) The fix to correct escaping problems is believed to be more reliable

 I'm interested in 2 and 3. Users will do dumb things, and both 2 and 3
 assume the same baseline scenario as 1: a developer did something
 dumb. Nonces need not be cryptographically strong for most apps, so
 the big problem is re-use. UAs have broad leeway here to prevent
 re-use on origins and deny sandboxing to containers that re-use the
 same nonces on a single page. They can even help by keeping a list of
 recently used nonces and denying reuse.

 Could you elaborate on how one could avoid reuse? That seems like a bad
 idea, since it would prevent any non-client caching mechanism from
 working. The problem is not nonce re-use, it's that the token has to be
 either unpredictable or unspoofable. (It could be predictable and
 unspoofable if it was constructed using a diagonal of the user's text.)

Seems like it should be easy to get secure tokens by doing:

  $token = sha512_hex($input);
  print "<sandbox token=$token>$input</sandbox token=$token>";

(or whatever the sandbox syntax is), so there's no need to worry about
cryptographically secure RNGs or nonces or reuse or caching problems.
Is this what you meant by a diagonal of the user's text?

(I'm assuming here that the UA treats the token as an opaque blob, it
doesn't try to recompute the hash itself, so it's robust to changes in
character encoding etc. People could still choose insecure tokens
instead, but it's pretty trivial to use the hash solution correctly in
most programming environments (easier than good random numbers). To
attack it, you'd have to pick two strings X and Y and a hash H such
that hash(X+"</sandbox token="+H+">"+Y) = H, which for a good hash
function should be hard, I think.)
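Philip's scheme, sketched in Python with `hashlib.sha512(...).hexdigest()` standing in for his `sha512_hex`. The `<sandbox token=...>` syntax is only the proposal being discussed, and `sandbox_with_hash_token` is a hypothetical name. Because the token depends only on the input, identical input always produces identical markup, which is what keeps caching working.

```python
import hashlib


def sandbox_with_hash_token(untrusted_html: str) -> str:
    """Wrap input in the proposed <sandbox token=...> syntax.

    The token is the SHA-512 of the input itself: no RNG, no nonce
    bookkeeping, and deterministic output. Breaking out would require input
    whose own hash appears inside it as a closing tag, i.e. a hash fixed point.
    """
    token = hashlib.sha512(untrusted_html.encode("utf-8")).hexdigest()
    return f"<sandbox token={token}>{untrusted_html}</sandbox token={token}>"
```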

-- 
Philip Taylor
exc...@gmail.com


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-04 Thread Ian Hickson

On Mon, 25 Jan 2010, Michal Zalewski wrote:
  
  This has been proposed before. The concern is that many authors would 
  be likely to make mistakes in their selection of random tokens that 
  would lead to significant flaws in the deployment of the feature.
 
  srcdoc= is less prone to errors. Only " and & characters need to be 
  escaped. If the " character is not escaped, then a single " character 
  in the input will cause the comment to break.
 
 My counterargument, as stated later in the thread, is quite simple: the 
 former *forces* you to implement a security mechanism, else the 
 functionality will break. You can still use a bad token, but you are 
 required to make the effort.
 
 In that regard, the comparison to XSRF is probably not valid; a vast 
 majority of XSRF bugs occur not because people pick poor tokens (in 
 fact, that's really a minority), but because they don't use them at all. 
 From that perspective, srcdoc=... is very similar to XSRF - people will 
 mess it up simply by not thinking about the correct escaping.

Not escaping " is so easily and quickly discovered that I really don't 
think that's a problem. The difference between that and XSRF is that in 
the XSRF case, things usually work pretty well (better than well, in fact, 
even HTML is supported!). It's only if an attacker makes use of the hole 
that the side-effect is highlighted.

The idea here is to align what is needed for correctness and what is 
needed for security, rather than having them be separate. In that way it's 
quite similar to the token idea, except I think it's far more likely to be 
done securely -- actually picking a truly unpredictable token is a 
non-trivial exercise.


 That said, I am not really arguing against srcdoc=...; I think it's an 
 OK feature. My point is simply that I would love to see less 
 fragmentation when it comes to XSS defenses and the granularity of 
 security controls. The initial proposal of iframe sandboxes solved a 
 very narrow use case, and other, unrelated designs started to spring up 
 elsewhere. This wouldn't be bad by itself, but while the security 
 controls on iframes were pretty great (with some tweaks, such as 
 text/html-sandboxed), they would not be reflected in other APIs, which I 
 thought is unfortunate.
 
 If we extend sandboxed iframes with srcdoc, seamless frames, 
 text/html-sandboxed, and iframe rendering performance improvements, it 
 actually becomes close to a comprehensive solution, and I am happy with 
 this (other than a vague feeling that we just repurposed iframe to be 
 some sort of a span ;-).

Well, it's different from span because it has its own browsing context 
-- which is basically exactly what iframe is.


  I've introduced text/html-sandboxed for this purpose.
 
 1) Some other security mechanisms (CORS, anti-clickjacking controls, XSS 
 filter controls) rely on separate HTTP headers instead. Is there a 
 compelling reason not to follow that lead - or better yet, to unify all 
 security headers to conserve space?

We need something that breaks legacy UAs.


 2) People may conceivably want to sandbox other document types (e.g., 
 SVG, RSS, or other XML-based formats rendered natively, and offering 
 scripting capabilities). Do we want to create -sandboxed MIME types 
 for each? The header approach would fix this, too.

If people really use XML, we can add an equivalent MIME type for XML. 
However, we should only do so if that's really required.


  2.1) The ability to disable loading of external resources (images, 
  scripts, etc) in the sandboxed document. The common usage scenario is 
  when you do not want the displayed document to phone home for 
  privacy reasons, for example in a web mail system.
 
  Good point. Should we make sandbox= disable off-origin network 
  requests?
 
 That would be great. I think Adam proposed we have a separate 
 sandbox=... toggle for this. Whether it's on or off by default 
 probably doesn't matter much.

Adam's feedback (not quoted here, but in the same thread as the e-mail to 
which this is a reply) suggests that this is actually a bad idea, so I've 
not changed this.


  2.2) The ability to disable HTML parsing. [...]
 
 One use case is a web forum or a web mail interface where you want to 
 display a message, but specifically don't want HTML formatting. Or, 
 performance permitting, the same could be used for any text-only entry 
 fields displayed on a page.

If there's no HTML, there's no need for a sandbox, so the simplest 
solution is just to escape the <s and &s. That's even easier than 
srcdoc= (since there you have to have the iframe, and also have to 
escape "s).
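A sketch of the two escaping rules being compared, assuming a double-quoted srcdoc attribute: text content only needs & and < replaced, while srcdoc="..." needs & and ". `escape_text` and `escape_srcdoc` are hypothetical helper names. In both, & is replaced first; otherwise the & introduced by the second replacement would itself get escaped.

```python
def escape_text(s: str) -> str:
    """Escape for an HTML text node: only & and < need replacing."""
    return s.replace("&", "&amp;").replace("<", "&lt;")


def escape_srcdoc(html_fragment: str) -> str:
    """Escape markup for a double-quoted srcdoc="..." attribute value."""
    return html_fragment.replace("&", "&amp;").replace('"', "&quot;")


comment = 'I <3 "quotes" & <b>bold</b>'
print(escape_text(comment))  # I &lt;3 "quotes" &amp; &lt;b>bold&lt;/b>
print(f'<iframe sandbox srcdoc="{escape_srcdoc(comment)}"></iframe>')
```

Forgetting the " replacement in the second case cuts the attribute short at the first quote in the comment, which is the visible breakage Ian is relying on.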


  Do people get CSRF right more often than simply escaping characters? 
  It seems implausible that authors get complex cryptographic properties 
  right more often than a simple set of substitutions, but I suppose 
  stranger things are true on the Web.
 
 Keep in mind that pretty much every web application already needs to 
 safely generate 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-04 Thread Tab Atkins Jr.
On Thu, Feb 4, 2010 at 5:12 AM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 25 Jan 2010, Tab Atkins Jr. wrote:

 Adam Barth rightfully points out that this only stops certain classes
 of data exfiltration attacks, and so probably isn't worthwhile as a
 solution to that matter.  However, I think this would also be very
 useful for general comments, to prevent, for example, shock trolls
 from putting goatse images in your comment threads.  It would also
 prevent video and audio embeds from working.

 However, it would still allow the site owner to allow particular files
 to be embedded with <img>, <audio>, or <video>, if they just host them
 on their own origin and set allow-same-origin in the sandbox flags.
 This is already a relatively normal practice, but it's accomplished
 through attempts at filtering.

 Note that this would also prevent resource embeds using data urls, as
 they have a unique origin.

 It seems like if you want to control what markup is shown, the way to do
 that is to parse the input and remove the elements you want to block.
 Just blocking off-domain images is a pretty poor way of blocking images if
 that's what you want to do. Consider that the commenter could just use
 <table> and <td bgcolor> to embed an image if that's what he wants to
 do.

Heh, if someone goes to the trouble of constructing a pixelmap out of
a table, they deserve to have it up until I find it and delete it.
Note as well that this sort of thing wouldn't be stopped by the
suggested parse and sanitize method either, unless you just want to
strip *all* tables.  And pretty much all other HTML (using div/span
with display:table-* would accomplish the same thing, or just putting
explicit heights/widths on them to make the 'cells' line up.)

 On many large sites, users can upload images to one part of the site
 -- those wouldn't be blocked either.

That's the point - one wouldn't want those blocked (or if one did, one
could indeed filter all images out).  They can perhaps be more subject
to moderation (submit the image, and have to wait for it to be
approved before you can use it), or just be a built-in set of images
that you're allowed to use, like the large sets of smilies that most
forums have.

 Sorry, the title is unclear - I mainly intend this as preventing
 audio autoplay and the like.  Any sort of action that could be both
 annoying and would take place without the user's consent.  This is
 inherently ill-defined, which may be a problem, but it could be
 tightened up to say precisely which features should be shut down.  It
 might need to be revised as new features get added, though.

 Yeah, maybe we should do this. Are there any other than autoplay,
 autofocus, meta refresh, and script?

That's all I can think of from a quick scan of the list of elements.

 Are there other reasonable improvements that could be made to iframe
 sandbox to make it more suitable for wrapping things such as blog
 comments?  Ideally, production-level sites with relatively normal
 requirements should be able to use *solely* iframe sandbox to protect
 their users from untrusted content.  (Though, of course, it would be
 only a part of the site's defenses until the userbase with
 non-supporting browsers drops low enough to ignore.)  Do others believe
 this is an achievable goal, or conversely believe it is not?

 sandbox= is only meant as an extra defence-in-depth, it's really not
 meant as a self-contained comprehensive security mechanism.

Eh, once we can rely on it being implemented, it seems like it *could*
be a fairly self-contained security mechanism.  At the very least, it
could shut down the most worrying of attacks, and allow manual
moderation to take care of the rest.  Filling in a last few holes
would finish this out.

 Shelley Powers states that she disallows SVG in the comments on her
 blog because of the risk of someone DOSing her users by writing highly
 resource-intensive SVG.  This could be fixed in a general sense by
 having the ability to opt into very strict resource limits per iframe
 - if the limit is exceeded, the browser would simply bail and end
 processing in that iframe.  I'm not certain how practical this is from
 an engineering standpoint, however.  There's no need to set precise
 limits on this - each browser should understand the platform it's
 running on well enough to know what an 'appropriate' resource amount
 is for this sort of thing.  Phones would cut off iframes much sooner
 than a desktop, a browser might take advantage of system load
 information to dynamically alter its cutoff point, etc.

 The spec already allows arbitrary limits. I dunno what else we could
 really do.

Allows them, yes, but browsers often don't cut things off as quickly
as one would like, likely out of a reasonable thought that authors
know roughly what they're doing, and blocking something big would stop
too many legitimate resource-heavy usecases.

This is intended to give browsers an indication that, yes, they really
*should* cut 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-04 Thread Michal Zalewski
 Not escaping " is so easily and quickly discovered that I really don't
 think that's a problem.

The same argument could be made for not escaping <, but I don't think
it's valid in practice - particularly for (hypothetically) constrained
input fields.

 That would be great. I think Adam proposed we have a separate
 sandbox=... toggle for this. Whether it's on or off by default
 probably doesn't matter much.
 Adam's feedback (not quoted here, but in the same thread as the e-mail to
 which this is a reply) suggests that this is actually a bad idea, so I've
 not changed this.

There are obvious, existing usage cases where sites struggle to
prevent automated resource loading across domains - e.g., almost every
HTML-supporting mail client; so it strikes me that if we go along with
this reasoning because a perfect solution may not exist, we're also
effectively saying that what they are doing should not be attempted at
all (then what's the alternative? one should probably be a part of
HTML5).

 If there's no HTML, there's no need for a sandbox, so the simplest
 solution is just to escape the <s and &s.

Which people fail at, big time. There are 50,000+ entries on
xssed.com, many of them against big sites presumably developed by
skilled developers with the help of sophisticated frameworks -
microsoft.com, google.com, amazon.com, ebay.com, etc. It is a really
low-effort tweak to accommodate it here, and it may offer a very
significant security benefit, so...?

 Keep in mind that pretty much every web application already needs to
 safely generate unique, unpredictable tokens - for session identifiers
 that guard authenticated sessions. If they can't get it right, they are
 hosed anyway - but problems here are not horribly common, in my
 experience at least, and web app frameworks do a decent job of helping
 developers by providing token-generating facilities.

 Pretty much the same can be said of escaping text.

 Also, based on Adam's comments, it seems that things aren't really as rosy
 as all that for token generators.

I think the difference is huge; in a typical web framework, you need
to explicitly escape every single piece of potentially dangerous
attacker-controlled input to stay safe - and because people tend to do
some validation at input stage, it's very difficult to audit for it.
Escaping also needs to be very different depending on the context
(URLs need to be encoded differently than HTML parameters, and
differently than standalone text).
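To illustrate that the same input needs different treatment depending on where it lands, here is a short example using Python's standard `html.escape` and `urllib.parse.quote`; the input string is made up for the example.

```python
import html
from urllib.parse import quote

user_input = 'a&b "c" <d>'

# Standalone text or a quoted HTML attribute value: entity-encode.
as_html = html.escape(user_input, quote=True)
# Inside a URL query parameter: percent-encode (safe="" also encodes "/").
as_url_param = quote(user_input, safe="")

print(as_html)       # a&amp;b &quot;c&quot; &lt;d&gt;
print(as_url_param)  # a%26b%20%22c%22%20%3Cd%3E
```

Neither output is safe in the other context, which is why a single escape() call applied everywhere does not work.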

So even though your framework may provide several escape() functions,
it's very easy to get it wrong, and people constantly do. OTOH, if
your framework provides a get_token() function, there's really not
that much to using it properly.

I'm coming from a background of doing lots of security reviews for
existing applications, so while I appreciate that the difference may
be subtle, the real-world error patterns speak to me pretty strongly;
and I do think that insufficient escaping is drastically more common
than XSRF tokens that are used but are insufficiently unpredictable.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-02-04 Thread Aryeh Gregor
On Thu, Feb 4, 2010 at 12:44 PM, Michal Zalewski lcam...@coredump.cx wrote:
 The same argument could be made for not escaping <, but I don't think
 it's valid in practice - particularly for (hypothetically) constrained
 input fields.

The use-cases for srcdoc are only where you expect HTML input.  HTML
input is very likely to contain " or '.  By contrast, ordinary XSS
usually occurs when < is unlikely to occur in legitimate input, so you
won't spot it right away -- as you say, constrained input fields.  Why
would anyone, even someone who's extremely confused and/or ignorant,
even *attempt* to use srcdoc to contain anything other than HTML?


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Aryeh Gregor
On Mon, Jan 25, 2010 at 1:29 AM, Adam Barth wha...@adambarth.com wrote:
 That depends what information the attacker encodes in the host name.
 Recall that we're imagining the attacker gets to run JavaScript within
 the sandbox

If we're assuming that, then yes, it's probably hopeless.  But are we
assuming that?  The given use-case was webmail -- that would be
expected to disable scripts in the sandbox, no?

 The point is that stopping exfiltration is a losing battle that we
 shouldn't bother to play.

Even if scripting is disabled?


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Adam Barth
On Mon, Jan 25, 2010 at 5:39 PM, Aryeh Gregor simetrical+...@gmail.com wrote:
 On Mon, Jan 25, 2010 at 1:29 AM, Adam Barth wha...@adambarth.com wrote:
 That depends what information the attacker encodes in the host name.
 Recall that we're imagining the attacker gets to run JavaScript within
 the sandbox

 If we're assuming that, then yes, it's probably hopeless.  But are we
 assuming that?  The given use-case was webmail -- that would be
 expected to disable scripts in the sandbox, no?

 The point is that stopping exfiltration is a losing battle that we
 shouldn't bother to play.

 Even if scripting is disabled?

Blocking exfiltration has a long history on the web.  In fact, the
first security model for the web, before the same-origin policy, was
based on stopping exfiltration.  Ultimately, Netscape gave up on that
approach and tried the same-origin policy, which is what we're still
using today.  More recently, there have been some academic papers
studying the idea of preventing exfiltration after XSS attacks,
including some prototype implementations.  None of the implementations
I'm aware of have had their security claims stand up to close
scrutiny.

Of course, none of that means it would be impossible to add a security
feature to the web based on blocking exfiltration.  If that's
something you're passionate about, I'd encourage you to build a
prototype system by modifying one of the open source browsers.  If you
find a clever way of doing that, there are a number of folks in the
academic world who would love to hear how.  In particular, the Web 2.0
Security & Privacy Workshop might be a good venue to share your
findings:

http://w2spconf.com/2010/

That venue is particularly inviting to papers written by
non-academics.  You can see some of the papers from previous years to
get a feel for the style, etc.

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Michal Zalewski
 I've introduced srcdoc= to largely handle this. There is an example in
 the spec showing how it can be used.

Yup, sounds good.

 This has been proposed before. The concern is that many authors would be
 likely to make mistakes in their selection of random tokens that would
 lead to significant flaws in the deployment of the feature.

 srcdoc= is less prone to errors. Only " and & characters need to be
 escaped. If the " character is not escaped, then a single " character in
 the input will cause the comment to break.

My counterargument, as stated later in the thread, is quite simple:
the former *forces* you to implement a security mechanism, else the
functionality will break. You can still use a bad token, but you are
required to make the effort.

In that regard, the comparison to XSRF is probably not valid; a vast
majority of XSRF bugs occur not because people pick poor tokens (in
fact, that's really a minority), but because they don't use them at
all. From that perspective, srcdoc=... is very similar to XSRF -
people will mess it up simply by not thinking about the correct
escaping.

That said, I am not really arguing against srcdoc=...; I think it's
an OK feature. My point is simply that I would love to see less
fragmentation when it comes to XSS defenses and the granularity of
security controls. The initial proposal of iframe sandboxes solved a
very narrow use case, and other, unrelated designs started to spring
up elsewhere. This wouldn't be bad by itself, but while the security
controls on iframes were pretty great (with some tweaks, such as
text/html-sandboxed), they would not be reflected in other APIs, which
I thought is unfortunate.

If we extend sandboxed iframes with srcdoc, seamless frames,
text/html-sandboxed, and iframe rendering performance improvements,
it actually becomes close to a comprehensive solution, and I am happy
with this (other than a vague feeling that we just repurposed iframe
to be some sort of a span ;-).

 I've introduced text/html-sandboxed for this purpose.

Yup, I noticed. Looks great. It does make me wonder about two things, though:

1) Some other security mechanisms (CORS, anti-clickjacking controls,
XSS filter controls) rely on separate HTTP headers instead. Is there a
compelling reason not to follow that lead - or better yet, to unify
all security headers to conserve space?

2) People may conceivably want to sandbox other document types (e.g.,
SVG, RSS, or other XML-based formats rendered natively, and offering
scripting capabilities). Do we want to create -sandboxed MIME types
for each? The header approach would fix this, too.

 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for privacy
 reasons, for example in a web mail system.

 Good point. Should we make sandbox= disable off-origin network requests?

That would be great. I think Adam proposed we have a separate
sandbox=... toggle for this. Whether it's on or off by default
probably doesn't matter much.

 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually
 be approximated with the excommunicated plaintext tag, or with
 Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or
 DIVs, however, it would be pretty damn useful for displaying text
 content without the need to escape <, >, &, etc. Pure security benefit
 is limited, but as a phishing prevention and display correctness
 measure, it makes sense.

 I don't really understand the use case here; could you elaborate?

One use case is a web forum or a web mail interface where you want to
display a message, but specifically don't want HTML formatting. Or,
performance permitting, the same could be used for any text-only entry
fields displayed on a page. These are common XSS vectors, and they are
not readily addressed by sandboxed iframe + srcdoc=..., because
this will not render as expected:

User's favorite smiley face is: <iframe srcdoc="&lt;O&gt;_&lt;O&gt;"></iframe>

Having a drop-in solution for this would be pretty nice, and very easy
to implement, too: just force text/plain, do not sniff.
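For the iframe case, the "force text/plain, do not sniff" behaviour can already be approximated from the server side. Below is a minimal sketch. It assumes a Python WSGI handler and a hypothetical hard-coded message. X-Content-Type-Options: nosniff is a real header that asks browsers not to reinterpret the declared type. Nothing here helps the inline span/div case.

```python
def plain_text_app(environ, start_response):
    """Serve a user-supplied message as inert text (hypothetical handler)."""
    # Hypothetical stored message; a real app would look it up per request.
    message = "User's favorite smiley face is: <O>_<O>"
    body = message.encode("utf-8")
    start_response("200 OK", [
        # Declare plain text so markup-like characters are shown, not parsed.
        ("Content-Type", "text/plain; charset=utf-8"),
        # Ask sniffing browsers not to reinterpret the body as HTML.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Framed with <iframe src=...>, this shows the literal <O>_<O> rather than parsing it as tags.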

 Do people get CSRF right more often than simply escaping characters? It
 seems implausible that authors get complex cryptographic properties right
 more often than a simple set of substitutions, but I suppose stranger
 things are true on the Web.

Keep in mind that pretty much every web application already needs to
safely generate unique, unpredictable tokens - for session identifiers
that guard authenticated sessions. If they can't get it right, they
are hosed anyway - but problems here are not horribly common, in my
experience at least, and web app frameworks do a decent job of helping
developers by providing token-generating facilities.

As noted earlier, the vast majority of issues with XSS and XSRF
defenses is that you explicitly need to think about them, and a
failure to do so has no obvious side 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Michal Zalewski
 The reason to use a MIME type here is to trick legacy browsers into
 not rendering the response as HTML.

Legacy browsers probably will, anyway :-(

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Tab Atkins Jr.
On Mon, Jan 25, 2010 at 1:51 PM, Michal Zalewski lcam...@coredump.cx wrote:
 This has been proposed before. The concern is that many authors would be
 likely to make mistakes in their selection of random tokens that would
 lead to significant flaws in the deployment of the feature.

 srcdoc= is less prone to errors. Only " and & characters need to be
 escaped. If the " character is not escaped, then a single " character in
 the input will cause the comment to break.

 My counterargument, as stated later in the thread, is quite simple:
 the former *forces* you to implement a security mechanism, else the
 functionality will break. You can still use a bad token, but you are
 required to make the effort.

Ah, but the devil is in the details.

Of the two escaping requirements for @srcdoc, only escaping " is
required for security reasons. (Forgetting to escape & will just
result in spurious entities sometimes, but no security issues.)
However, use of " in comments should be reasonably common, and if it
is left unescaped, it will immediately truncate the content there (the
rest of the comment will attempt to be interpreted as attributes or
other elements).

Thus, if you fail to escape your "s, it should fail *quickly* and
*obviously* on *innocuous* content.  As well, as soon as it happens,
the obvious fix is the correct one.

On the other hand, getting your token-generation wrong will only fail
when someone guesses your guard token and attacks your site.  Ordinary
comments will still work just fine.
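Tab's point about failure modes can be made concrete. Below is a minimal sketch of the two substitutions that srcdoc= needs inside a double-quoted attribute. It assumes the comment is already an HTML string, and both helper names are hypothetical.

```python
def escape_for_srcdoc(html_fragment: str) -> str:
    """Escape an HTML fragment for use in a double-quoted srcdoc attribute."""
    # Escape & first, so the &quot; entities produced below are not re-escaped.
    escaped = html_fragment.replace("&", "&amp;")
    # Escape " so the fragment cannot close the attribute early.
    return escaped.replace('"', "&quot;")


def sandboxed_comment(comment_html: str) -> str:
    """Wrap one comment in a sandboxed iframe (hypothetical helper)."""
    return '<iframe sandbox srcdoc="' + escape_for_srcdoc(comment_html) + '"></iframe>'


print(sandboxed_comment('I said "hi" & left <b>early</b>'))
# prints: <iframe sandbox srcdoc="I said &quot;hi&quot; &amp; left <b>early</b>"></iframe>
```

Drop the second replace and the same input produces srcdoc="I said " followed by stray attributes, so the comment visibly breaks. Drop the first and an input containing &lt; reaches the framed document as <. That is a fidelity bug rather than an injection.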

 If we extend sandboxed iframes with srcdoc, seamless frames,
 text/html-sandboxed, and iframe rendering performance improvements,
 it actually becomes close to a comprehensive solution, and I am happy
 with this (other than a vague feeling that we just repurposed iframe
 to be some sort of a span ;-).

In the end that's sorta what we're doing.  ^_^

 1) Some other security mechanisms (CORS, anti-clickjacking controls,
 XSS filter controls) rely on separate HTTP headers instead. Is there a
 compelling reason not to follow that lead - or better yet, to unify
 all security headers to conserve space?

HTTP headers won't cause the content to fail in browsers that don't
understand them.  Mimetypes will in at least *some* legacy browsers.
I know that some versions of IE do content-sniffing for HTML-like
content.

 2) People may conceivably want to sandbox other document types (e.g.,
 SVG, RSS, or other XML-based formats rendered natively, and offering
 scripting capabilities). Do we want to create -sandboxed MIME types
 for each? The header approach would fix this, too.

Possibly.  Are those document types going to be rendered in any way?
SVG can be sent with an HTML mimetype now, at least.

 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for privacy
 reasons, for example in a web mail system.

 Good point. Should we make sandbox= disable off-origin network requests?

 That would be great. I think Adam proposed we have a separate
 sandbox=... toggle for this. Whether it's on or off by default
 probably doesn't matter much.

I think this is a good idea.  Adam argues against it being effective
for preventing exfiltration, but it's also useful for the common
use-case of disabling images in comments.  This would also prevent
people using video or audio in comments.  It would still allow the
site author to allow self-hosted files to be used with any of these
tags, but it would protect from, say, goatse trolls.

 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually
 be approximated with the excommunicated <plaintext> tag, or with
 Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or
 DIVs, however, it would be pretty damn useful for displaying text
 content without the need to escape <, >, &, etc. Pure security benefit
 is limited, but as a phishing prevention and display correctness
 measure, it makes sense.

 I don't really understand the use case here; could you elaborate?

 One use case is a web forum or a web mail interface where you want to
 display a message, but specifically don't want HTML formatting. Or,
 performance permitting, the same could be used for any text-only entry
 fields displayed on a page. These are common XSS vectors, and they are
 not readily addressed by sandboxed iframe + srcdoc=..., because
 this will not render as expected:

 User's favorite smiley face is: <iframe srcdoc="&lt;O&gt;_&lt;O&gt;"></iframe>

 Having a drop-in solution for this would be pretty nice, and very easy
 to implement, too: just force text/plain, do not sniff.

I agree with this as well.  I very commonly have inputs that should
take plaintext only, not html.

 <span>[server-sanitized string]</span>
 <iframe srcdoc="[server-escaped string]"></iframe>
 <span guard="[token]">[any string]</span guard="[token]">

 The first two options will not immediately fail if you forget about or
 mess up 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Tab Atkins Jr.
Michal Zalewski brings up several good suggestions for improvements to
@sandbox that would make it more useful for embedding general
untrusted user content.  As well, Shelley Powers brought up a few
common uses that I think could fit into this model and prove useful.

1) Prevent cross-origin resource loads
--------------------------------------
Adam Barth rightfully points out that this only stops certain classes
of data exfiltration attacks, and so probably isn't worthwhile as a
solution to that matter.  However, I think this would also be very
useful for general comments, to prevent, for example, shock trolls
from putting goatse images in your comment threads.  It would also
prevent video and audio embeds from working.

However, it would still allow the site owner to allow particular files
to be embedded with img, audio, or video, if they just host them
on their own origin and set allow-same-origin in the sandbox flags.
This is already a relatively normal practice, but it's accomplished
through attempts at filtering.

Note that this would also prevent resource embeds using data urls, as
they have a unique origin.

2) Prevent all HTML parsing (rendering as text/plain)
-----------------------------------------------------
I think it's pretty common for certain areas of a comment form, such
as username, email, or title, to be meant as ordinary plaintext
without any special formatting allowed.  Right now that means you have
to run html escapes over the content, which isn't difficult.  Would it
be appropriate to move this into sandbox as well, though, to make it
even easier?

3) Prevent no-input actions
---------------------------
Sorry, the title is unclear - I mainly intend this as preventing
audio autoplay and the like.  Any sort of action that could be both
annoying and would take place without the user's consent.  This is
inherently ill-defined, which may be a problem, but it could be
tightened up to say precisely which features should be shut down.  It
might need to be revised as new features get added, though.

4) Stricter resource limits
---------------------------
Shelley Powers states that she disallows SVG in the comments on her
blog because of the risk of someone DOSing her users by writing highly
resource-intensive SVG.  This could be fixed in a general sense by
having the ability to opt into very strict resource limits per iframe
- if the limit is exceeded, the browser would simply bail and end
processing in that iframe.  I'm not certain how practical this is from
an engineering standpoint, however.  There's no need to set precise
limits on this - each browser should understand the platform it's
running on well enough to know what an 'appropriate' resource amount
is for this sort of thing.  Phones would cut off iframes much sooner
than a desktop, a browser might take advantage of system load
information to dynamically alter its cutoff point, etc.


Are there other reasonable improvements that could be made to iframe
sandbox to make it more suitable for wrapping things such as blog
comments?  Ideally, production-level sites with relatively normal
requirements should be able to use *solely* iframe sandbox to
protect their users from untrusted content.  (Though, of course, it
would be only a part of the site's defenses until the userbase with
non-supporting browsers drops low enough to ignore.)  Do others
believe this is an achievable goal, or conversely believe it is not?

~TJ


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Alex Russell
On Sun, Jan 24, 2010 at 2:52 AM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 11 Dec 2009, Michal Zalewski wrote:

 1) IFRAME semantics make it exceedingly cumbersome to sandbox short
 snippets of text, and this task is perhaps the most common and pressing
 XSS-related challenge. Unless the document is constructed on client side
 by JavaScript, sites would need to use opaque data: URLs, or put up with
 a lot of additional HTTP roundtrips, to utilize sandboxed IFRAMEs for
 this purpose. [ There is also the problem of formatting and positioning
 IFRAME content, although the seamless attribute would fix this. ]

 I've introduced srcdoc= to largely handle this. There is an example in
 the spec showing how it can be used.


 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox="random_token"></span sandbox="same_token">) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

 This has been proposed before. The concern is that many authors would be
 likely to make mistakes in their selection of random tokens that would
 lead to significant flaws in the deployment of the feature.

 srcdoc= is less prone to errors. Only " and & characters need to be
 escaped. If the " character is not escaped, then a single " character in
 the input will cause the comment to break. This is likely to be caught
 early. If the & character is not escaped, correctness and fidelity will
 suffer, but it will not lead to security errors.

Sorry I'm late to this discussion. Would like to add my objection to
using attribute string escaping as a security feature in any way. I
strongly prefer required nonces attached to opening and closing of
sections.

 2) Renderers suck dealing with IFRAMEs, and will probably continue to
 do so for time being. This means that a typical, moderately complex
 application (say, as a discussion forum or a social site), where
 hundreds of user-controlled strings may need to be present to display
 user content - the mechanism would have an unacceptable load time and
 memory footprint. In fact, people are already coming up with
 lightweight alternatives with a significant functionality overlap (and
 different security controls). Microsoft has toStaticHTML(), while a
 standardized implementation is being discussed here right now in a
 separate thread.

 I agree that we should investigate other options too (iframe boxes
 aren't suitable for everything), but I don't think that current
 implementation problems with iframe should necessarily prevent us from
 investigating sandboxed iframes too.

 In certain contexts, e.g. reddit comments, it may be the case that instead
 of one sandboxed iframe per comment, the best way to do things is
 instead one sandboxed iframe for all the comments, with scripts disabled
 and allow-same-origin enabled, so that scripts can poke into the page and
 set event handlers on all the relevant links.


 Isn't the benefit of keeping the design slightly simpler (and
 realistically, limited to relatively few usage scenarios) negated by the
 fact that alternative solutions to other narrow problems would need to
 emerge elsewhere? The browser coming with several different script
 sanitizers with completely different APIs and security controls does not
 strike me as a desirable outcome (all the flavors of SOP are a testament
 to this). If the answer is not a strong no, maybe the token-guarded DIV
 / SPAN approach is a better alternative?

 I agree in principle that fewer features are better than more features,
 but we have to take into account that many of the people deploying these
 features know nothing about security. We have to ensure that the security
 aspects of features like this (like what to escape, what security tokens
 need to be generated) are aligned with the practical aspects of features
 like this (like what results in the page appearing to work, regardless of
 the state of security).


 Now, that aside - on a more pragmatic level, I have two extra comments:

 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter?

 I've introduced text/html-sandboxed for this purpose.


 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for privacy
 reasons, for example in a web mail system.

 Good point. Should we make sandbox= disable off-origin network requests?


 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually
 be approximated with the excommunicated <plaintext> tag, or with
 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Tab Atkins Jr.
On Mon, Jan 25, 2010 at 5:45 PM, Alex Russell slightly...@google.com wrote:
 Sorry I'm late to this discussion. Would like to add my objection to
 using attribute string escaping as a security feature in any way. I
 strongly prefer required nonces attached to opening and closing of
 sections.

Do you have any suggestions on how to fix the issues that have already
been raised against that?

~TJ


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-25 Thread Alex Russell
AFAICT, the objections fall into several buckets:

  1.) Users might pick badly or may re-use nonces when they shouldn't.
  2.) Escaping " is believed to be more secure because it's likely to
break more often, raising developer awareness
  3.) The fix to correct escaping problems is believed to be more reliable

I'm interested in 2 and 3. Users will do dumb things, and both 2 and 3
assume a similar baseline scenario as 1; a developer did something
dumb. Nonces need not be cryptographically strong for most apps, so
the big problem is re-use. UAs have broad leeway here to prevent
re-use on origins and deny sandboxing to containers that re-use the
same nonces on a single page. They can even help by keeping a list of
recently used nonces and denying reuse.
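On the UA side, Alex's suggestion to refuse reused nonces could look roughly like the following. This is a hypothetical policy sketch, not anything a browser implements. It only computes which nonces are suspect: those that guard more than one container on the page, or that were seen on a recent load. What the UA then does with those containers is left open.

```python
from collections import Counter


def nonces_to_deny(page_nonces: list[str], recently_seen: set[str]) -> set[str]:
    """Return the nonces whose sandbox containers a UA should refuse to honour.

    Hypothetical policy: deny a nonce that guards more than one container on
    this page, or that already appeared on a recent page load.
    """
    counts = Counter(page_nonces)
    reused_on_page = {nonce for nonce, n in counts.items() if n > 1}
    reused_across_loads = set(page_nonces) & recently_seen
    return reused_on_page | reused_across_loads


print(nonces_to_deny(["a1", "b2", "a1"], recently_seen={"c3"}))  # prints: {'a1'}
```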

What concerns me about the " escaping option is that it's harder to
implement by default. Perhaps you see that as a benefit, but if part
of the goal is to raise the average, then allowing markup that can
surround existing DOM structures and secure them easily surely beats
trying to help every web developer understand that stuffing semantic
content into an attribute sausage casing (but don't do it wrong!) is a
good thing.

I'm also looking at this problem from the perspective of ways to help
speed up pages. Iframes have the useful and desirable quality that
they don't load and execute resources synchronously with the parent
document's DOM. For most sorts of ads, widgets, and embedded x-domain
3rd party content, the synchronous nature of mashup content creates
enormous performance problems. Google ads, Facebook Connect badges and
login containers...it's all much slower than it needs to be.

Along with Steve Souders, Adam Barth, and a few others I've been
discussing options for retrofitting content to make an iframe-like
container that participates inline in the current document but which
loads and executes content asynchronously from the perspective of the
main document's content. Getting users to use -- and then secure --
such a container seems to me to be a significantly easier sell if the
opening discussion doesn't begin with "first, take your document
fragment and do the moral equivalent of base64 encoding."

In fact, I'd argue that base64 with a length header might be a simpler
and easier way to handle arbitrary content in attributes. Violations
of the length would make parsing problems even more visible than in
the escaped " case while the default amount of work to do it right
would remain unchanged.
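Alex's base64-with-length idea might look like the following. This is a sketch under assumptions: the thread does not specify an encoding, so the "<byte length>:<base64>" format and both function names are invented for illustration. The point is that a wrong length or a stray character raises an error instead of rendering partially.

```python
import base64
import binascii


def encode_payload(fragment: str) -> str:
    """Encode a fragment as '<byte length>:<base64>' (hypothetical format)."""
    raw = fragment.encode("utf-8")
    # The base64 alphabet contains no quotes or ampersands, so the result can
    # sit inside a double-quoted attribute without further escaping.
    return f"{len(raw)}:{base64.b64encode(raw).decode('ascii')}"


def decode_payload(value: str) -> str:
    """Decode the format above, failing loudly on any mismatch."""
    length_text, _, encoded = value.partition(":")
    try:
        raw = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64 payload") from exc
    if not length_text.isdigit() or int(length_text) != len(raw):
        raise ValueError("declared length does not match payload")
    return raw.decode("utf-8")


print(decode_payload(encode_payload('say "hi" & <b>bye</b>')))
# prints: say "hi" & <b>bye</b>
```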

If the security of the system depends on users correctly
pre-processing their content, then I'd like to suggest that we should
be more explicit about it and not accept the "-escaping half measure.
The other option (which I favor) is to not pretend that we can have it
both ways and give users who explicitly opt into security features
enough credit to have thought about their use for even a moment.

Regards

On Mon, Jan 25, 2010 at 3:47 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 On Mon, Jan 25, 2010 at 5:45 PM, Alex Russell slightly...@google.com wrote:
 Sorry I'm late to this discussion. Would like to add my objection to
 using attribute string escaping as a security feature in any way. I
 strongly prefer required nonces attached to opening and closing of
 sections.

 Do you have any suggestions on how to fix the issues that have already
 been raised against that?

 ~TJ



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-24 Thread Ian Hickson
On Fri, 11 Dec 2009, Michal Zalewski wrote:
 
 1) IFRAME semantics make it exceedingly cumbersome to sandbox short 
 snippets of text, and this task is perhaps the most common and pressing 
 XSS-related challenge. Unless the document is constructed on client side 
 by JavaScript, sites would need to use opaque data: URLs, or put up with 
 a lot of additional HTTP roundtrips, to utilize sandboxed IFRAMEs for 
 this purpose. [ There is also the problem of formatting and positioning 
 IFRAME content, although the seamless attribute would fix this. ]

I've introduced srcdoc= to largely handle this. There is an example in 
the spec showing how it can be used.


 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox="random_token"></span sandbox="same_token">) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

This has been proposed before. The concern is that many authors would be 
likely to make mistakes in their selection of random tokens that would 
lead to significant flaws in the deployment of the feature.

srcdoc= is less prone to errors. Only " and & characters need to be 
escaped. If the " character is not escaped, then a single " character in 
the input will cause the comment to break. This is likely to be caught 
early. If the & character is not escaped, correctness and fidelity will 
suffer, but it will not lead to security errors.


 2) Renderers suck dealing with IFRAMEs, and will probably continue to
 do so for time being. This means that a typical, moderately complex
 application (say, as a discussion forum or a social site), where
 hundreds of user-controlled strings may need to be present to display
 user content - the mechanism would have an unacceptable load time and
 memory footprint. In fact, people are already coming up with
 lightweight alternatives with a significant functionality overlap (and
 different security controls). Microsoft has toStaticHTML(), while a
 standardized implementation is being discussed here right now in a
 separate thread.

I agree that we should investigate other options too (iframe boxes 
aren't suitable for everything), but I don't think that current 
implementation problems with iframe should necessarily prevent us from 
investigating sandboxed iframes too.

In certain contexts, e.g. reddit comments, it may be the case that instead 
of one sandboxed iframe per comment, the best way to do things is 
instead one sandboxed iframe for all the comments, with scripts disabled 
and allow-same-origin enabled, so that scripts can poke into the page and 
set event handlers on all the relevant links.
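Below is a minimal sketch of the single-frame layout Ian describes. It assumes comment bodies arrive as untrusted HTML strings. The function and class names are hypothetical. The page's own script would later attach handlers through the iframe's contentDocument, which is not shown. The sandbox value omits allow-scripts, so script inside the comments stays disabled, while allow-same-origin lets the parent reach in.

```python
def comments_frame(comment_bodies: list[str]) -> str:
    """Put every comment into one sandboxed iframe (hypothetical helper)."""
    # Comment bodies are untrusted HTML; the sandbox, not this code, contains them.
    document = "".join(
        f'<div class="comment">{body}</div>' for body in comment_bodies
    )
    # Escape the document for a double-quoted srcdoc attribute: & first, then ".
    srcdoc = document.replace("&", "&amp;").replace('"', "&quot;")
    # No allow-scripts: script inside the comments stays disabled.
    # allow-same-origin: the parent page's scripts can still reach into the frame.
    return f'<iframe sandbox="allow-same-origin" srcdoc="{srcdoc}"></iframe>'


print(comments_frame(['<a href="/u/1">first</a>', "Tom & Jerry"]))
```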


 Isn't the benefit of keeping the design slightly simpler (and 
 realistically, limited to relatively few usage scenarios) negated by the 
 fact that alternative solutions to other narrow problems would need to 
 emerge elsewhere? The browser coming with several different script 
 sanitizers with completely different APIs and security controls does not 
 strike me as a desirable outcome (all the flavors of SOP are a testament 
 to this). If the answer is not a strong no, maybe the token-guarded DIV 
 / SPAN approach is a better alternative?

I agree in principle that fewer features are better than more features, 
but we have to take into account that many of the people deploying these 
features know nothing about security. We have to ensure that the security 
aspects of features like this (like what to escape, what security tokens 
need to be generated) are aligned with the practical aspects of features 
like this (like what results in the page appearing to work, regardless of 
the state of security).


 Now, that aside - on a more pragmatic level, I have two extra comments:
 
 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter?

I've introduced text/html-sandboxed for this purpose.
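On the serving side, the proposal amounts to changing one header on the handler that returns raw user HTML. Below is a minimal sketch. It assumes a WSGI version of the show.cgi endpoint quoted above, backed by a hypothetical in-memory store. text/html-sandboxed is the MIME type proposed in this thread. How a browser treats it when the URL is opened directly depends on the spec draft and the implementation. Because some IE versions sniff HTML-like content, the sketch also sends X-Content-Type-Options: nosniff as an assumed extra precaution.

```python
from urllib.parse import parse_qs

# Hypothetical store of raw, unsanitized user submissions.
STORE = {"1234": '<p onclick="alert(1)">untrusted</p>'}


def show_app(environ, start_response):
    """Hypothetical show.cgi: return stored user HTML under the sandboxed type."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    user_html = STORE.get(query.get("id", [""])[0])
    if user_html is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = user_html.encode("utf-8")
    start_response("200 OK", [
        # MIME type proposed in this thread for content meant only for sandboxes.
        ("Content-Type", "text/html-sandboxed; charset=utf-8"),
        # Assumed precaution against browsers that sniff the body as text/html.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```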


 2.1) The ability to disable loading of external resources (images, 
 scripts, etc) in the sandboxed document. The common usage scenario is 
 when you do not want the displayed document to phone home for privacy 
 reasons, for example in a web mail system.

Good point. Should we make sandbox= disable off-origin network requests?


 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually 
 be approximated with the excommunicated <plaintext> tag, or with 
 Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or 
 DIVs, however, it would be pretty damn useful for displaying text 
 content without the need to escape <, >, &, etc. Pure security benefit 
 is limited, but as a phishing prevention and display 

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-24 Thread Adam Barth
On Sun, Jan 24, 2010 at 11:52 AM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 11 Dec 2009, Michal Zalewski wrote:
 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for privacy
 reasons, for example in a web mail system.

 Good point. Should we make sandbox= disable off-origin network requests?

In general, stopping malicious content from exfiltrating data isn't
practical.  For example, even including a single hyperlink is often
sufficient to exfiltrate a large amount of data.  In user agents that
prefetch DNS, the user doesn't even need to click on the link.

 On Sun, 13 Dec 2009, Adam Barth wrote:
 I'm very interested in a solution that works for the following use
 cases:

 1) A web page wants to display untrusted (i.e., restricted) HTML
 received via cross-site XMLHttpRequest or postMessage.

 Do you have a concrete use case for which iframe doesn't work?

iframe sandbox srcdoc might work nicely for this use case, actually,
especially because setting srcdoc from the DOM removes the need to
escape ".

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-24 Thread Ian Hickson
On Sun, 24 Jan 2010, Adam Barth wrote:
 On Sun, Jan 24, 2010 at 11:52 AM, Ian Hickson i...@hixie.ch wrote:
  On Fri, 11 Dec 2009, Michal Zalewski wrote:
  2.1) The ability to disable loading of external resources (images, 
  scripts, etc) in the sandboxed document. The common usage scenario is 
  when you do not want the displayed document to phone home for 
  privacy reasons, for example in a web mail system.
 
  Good point. Should we make sandbox= disable off-origin network 
  requests?
 
 In general, stopping malicious content from exfiltrating data isn't 
 practical.  For example, even including a single hyperlink is often 
 sufficient to exfiltrate a large amount of data.  In user agents that 
 prefetch DNS, the user doesn't even need to click on the link.

Ok. Then I won't add it.


  On Sun, 13 Dec 2009, Adam Barth wrote:
  I'm very interested in a solution that works for the following use
  cases:
 
  1) A web page wants to display untrusted (i.e., restricted) HTML
  received via cross-site XMLHttpRequest or postMessage.
 
  Do you have a concrete use case for which iframe doesn't work?
 
 iframe sandbox srcdoc might work nicely for this use case, actually,
 especially because setting srcdoc from the DOM removes the need to
 escape ".

Cool.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2010-01-24 Thread Aryeh Gregor
On Sun, Jan 24, 2010 at 5:52 AM, Ian Hickson i...@hixie.ch wrote:
 What would the sandbox do, other than require one level of escaping?
 i.e. what is it protecting against?

<span sandbox>$something</sandbox> was meant to be more or less the
same as <iframe sandbox srcdoc="$something">.  The latter achieves the
same effect but is cleaner and makes more sense.  I must not have
known about the doc= proposal at that point, but I can't remember
what I was thinking more than a month ago.

On Sun, Jan 24, 2010 at 6:19 AM, Adam Barth wha...@adambarth.com wrote:
 In general, stopping malicious content from exfiltrating data isn't
 practical.  For example, even including a single hyperlink is often
 sufficient to exfiltrate a large amount of data.  In user agents that
 prefetch DNS, the user doesn't even need to click on the link.

DNS prefetching doesn't tell you anything except that someone viewed
the link, right?  And maybe what their ISP is, in a typical case.
Including an image tells you their IP address, User-Agent, and so on.

How can you get any data out of a link with no DNS prefetching?  Some
users will click the link, but not all.  Maybe quite a lot if you
allow arbitrary CSS, of course . . . you could easily make the whole
post a link.  But everyone who clicks on a given post for some
reason is still a lot less than all viewers, which is what image
inclusions will do.


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Tab Atkins Jr.
On Fri, Dec 11, 2009 at 10:18 PM, Michal Zalewski lcam...@coredump.cx wrote:
 1) IFRAME semantics make it exceedingly cumbersome to sandbox short
 snippets of text, and this task is perhaps the most common and
 pressing XSS-related challenge. Unless the document is constructed on
 client side by JavaScript, sites would need to use opaque data: URLs,
 or put up with a lot of additional HTTP roundtrips, to utilize
 sandboxed IFRAMEs for this purpose. [ There is also the problem of
 formatting and positioning IFRAME content, although the seamless
 attribute would fix this. ]

I believe that the @doc attribute, discussed in the original threads
about @sandbox, will be introduced to deal with that.  It'll take
plain html as a string, avoiding the opaqueness and larger escaping
requirements of a data:// url, as the only thing you'll have to escape
is whichever quote you're using to surround the value.


 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox="random_token"></span sandbox="same_token">) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

Nah, token-guarding is no good.  For one it's completely unusable in
the XML serialization without edits to the XML spec.  More
importantly, though, it puts a significant burden on authors to
generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
their iframes or similar.  It's pretty much guaranteed that this
will happen, and as it has no visible bad effects until an attacker
gets through, I think it'll be reasonably common.
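The failure Tab predicts is a single hard-coded token reused everywhere. For contrast, below is a minimal sketch that mints a fresh token for every container using Python's secrets module. The guard= syntax is the hypothetical one proposed earlier in this thread and was never part of HTML, so the output is illustrative only.

```python
import secrets


def guarded_span(content: str) -> str:
    """Wrap content in the guard= syntax proposed in this thread (hypothetical).

    A fresh 128-bit token is drawn for every container, so tokens are not
    shared between comments (barring a negligible-probability collision).
    """
    while True:
        token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
        # The content must not already contain the token of its closing marker.
        if token not in content:
            break
    return f'<span guard="{token}">{content}</span guard="{token}">'
```

The anti-pattern is the same wrapper with a module-level constant such as TOKEN = "f00d" in place of the secrets call. Every page still renders correctly, which is exactly why nobody notices.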


 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter? There are two cases where the mechanism still
 offers some protection:

You mean, if the attacker controls their own website on the same
origin and iframes it for themselves?  Sure, that's no protection.
Do you *need* protection then?  They're not on your site; if they can
get visitors onto their own site, they already have tons of more
effective ways to screw with users.  Unless I'm missing something
about this attack scenario, there's really nothing here.

Do you perhaps mean that the attacker puts an iframe in their own
comment or whatever, producing an iframe inside of an iframe
sandbox?  The outermost @sandbox should subdue the inner iframe's
power in the same way.


 It strikes me that this mechanism would make a whole lot more sense if
 supported on HTTP header level, instead: X-SOP-Sandbox: 1; in its
 current shape, it is defensible perhaps if aided by Mozilla's CSP.
 Otherwise, it's an error-prone detail, and we should at the very least
 outline why it's very difficult to get it right in the spec.

Again, I must admit some ignorance of the significance of this attack
scenario.  Surely if the attacker is pointing to an iframe in their
own code, they are either doing so within an iframe sandbox or are
doing so on their own site.  The former shouldn't be a problem, the
latter means that the attacker has full control over the contents
anyway, and can strip the header if they so choose.


 2.1) The ability to disable loading of external resources (images,
 scripts, etc) in the sandboxed document. The common usage scenario is
 when you do not want the displayed document to phone home for
 privacy reasons, for example in a web mail system.

I agree that this would be useful, especially for images.


 2.2) The ability to disable HTML parsing. On IFRAMEs, this can
 actually be approximated with the excommunicated plaintext tag, or
 with Content-Type: text/plain / data:text/plain,. On token-guarded
 SPANs or DIVs, however, it would be pretty damn useful for displaying
 text content without the need to escape <, >, &, etc. Pure security
 benefit is limited, but as a phishing prevention and display
 correctness measure, it makes sense.

Why not just run an escape function over the content before sending
it?  All web languages have one specifically for escaping the
html-significant characters.  There's only five of them, after all.
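For illustration, a minimal Python sketch of such an escape function, covering the five HTML-significant characters (&, <, >, " and '). The name escape_html is hypothetical; Python's standard html.escape(s, quote=True) already does the same job.

```python
# Map each HTML-significant character to its entity. str.translate handles
# every input character exactly once, so "&" is never double-escaped.
_HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
})


def escape_html(text: str) -> str:
    """Escape text for use in HTML element content or quoted attributes."""
    return text.translate(_HTML_ESCAPES)
```

Because translation is per character, ordering problems (escaping "&" after "<" and getting "&amp;lt;") cannot occur.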

~TJ


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 I believe that the @doc attribute, discussed in the original threads
 about @sandbox, will be introduced to deal with that.  It'll take
 plain html as a string, avoiding the opaqueness and larger escaping
 requirements of a data:// url, as the only thing you'll have to escape
 is whichever quote you're using to surround the value.

That doesn't strike me as a robust way to prevent XSS - the primary
reason why we need sandboxing to begin with is that people have
difficulty properly parsing, serializing, or escaping HTML; so
replacing this with a mechanism that still requires escaping is
perhaps suboptimal.
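To make the point concrete: with @doc, the server still has to escape the value correctly. A minimal Python sketch, assuming the doc attribute name used in this thread (doc_iframe is a hypothetical helper). It escapes & as well as the quote, since character references are decoded inside attribute values.

```python
import html


def doc_iframe(user_html: str) -> str:
    """Embed untrusted markup in a sandboxed iframe via a doc-style attribute.

    html.escape(..., quote=True) escapes &, <, >, " and ', so the value
    cannot terminate the double-quoted attribute early.
    """
    return f'<iframe sandbox doc="{html.escape(user_html, quote=True)}"></iframe>'
```

Forgetting the html.escape call here produces markup that still renders for benign input, which is exactly why such a bug can go unnoticed until abused.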

 Nah, token-guarding is no good.  For one it's completely unusable in
 the XML serialization without edits to the XML spec.

This seems valid.

 More importantly, though, it puts a significant burden on authors to
 generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
 their iframes or similar.

People already need to do this well for XSRF defenses to work, and
I'd wager it's a much simpler and better-defined problem than
real-world HTML parsing and escaping could realistically be. It is
also very easy to delegate this task to existing functions in common
web frameworks.

Also, a single token on a returned page, as long as it's unpredictable
across user sessions, should not be a significant issue.
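A minimal Python sketch of the single-token-per-page idea, assuming the token-guarded <span sandbox=...></span sandbox=...> syntax proposed in this thread (this is not valid HTML today, and render_guarded is a hypothetical helper). The token comes from the standard secrets module, so it is unpredictable across responses.

```python
import secrets


def render_guarded(snippets: list[str]) -> str:
    """Wrap each untrusted snippet in a token-guarded sandboxed span.

    One token is generated per response and shared by every container on
    the page; what matters is that the snippet authors cannot predict it.
    """
    # token_urlsafe uses [A-Za-z0-9_-], so it never contains quotes or brackets.
    token = secrets.token_urlsafe(16)
    return "\n".join(
        f'<span sandbox="{token}">{s}</span sandbox="{token}">' for s in snippets
    )
```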

 1) The utility of the SOP sandboxing behavior outlined in the spec is
 diminished if we have no way to actually *enforce* that the IFRAMEd
 resource would only be rendered in such a context. If I am serving
 user-supplied, unsanitized HTML, it is obviously safe to do <iframe
 sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
 attacker from calling http://my_site/show.cgi?id=1234 directly, and
 bypassing the filter? There are two cases where the mechanism still
 offers some protection:

 You mean, if the attacker controls their own website on the same
 origin and iframes it for themselves?

The specific scenario given in the spec is:

<p>We're not scared of you! Here is your content, unedited:</p>
<iframe sandbox src="getusercontent.cgi?id=12193"></iframe>

Let's say this is on example.com. What prevents evil.com from calling
http://example.com/getusercontent.cgi?id=12193 in an IFRAME? Assuming
that the author of evil.com is also the author of example.com user
content 12193, this renders all the benefits of using sandboxed
frames on example.com moot.

The only two cases where this threat is mitigated are when non-SOP
domains are used to serve user content (but in this case, if you're
doing it right, you don't really need iframe sandboxes that much); or
if id= is unpredictable (which in your own words, people are going to
mess up). And neither of these seem to be the case for the example
given.
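One way to make id= unpredictable in the sense of 1.1 is to sign it per session, much like an XSRF token. A minimal Python sketch under that assumption; SERVER_KEY, content_url and is_authorized are hypothetical names. Binding the signature to the viewer's session is the important part: the author of content 12193 can always read a URL signed for their own session, but when evil.com frames that URL in a victim's browser, it will not validate against the victim's session.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load a stable key.
SERVER_KEY = secrets.token_bytes(32)


def _signature(content_id: int, session_id: str) -> str:
    msg = f"{content_id}:{session_id}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def content_url(content_id: int, session_id: str) -> str:
    """URL to place in the sandboxed iframe for this viewer's session."""
    return f"getusercontent.cgi?id={content_id}&sig={_signature(content_id, session_id)}"


def is_authorized(content_id: int, sig: str, session_id: str) -> bool:
    """Check, in constant time, that sig was issued for this session."""
    return hmac.compare_digest(sig, _signature(content_id, session_id))
```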

 2.2) The ability to disable HTML parsing. On IFRAMEs, this can
 actually be approximated with the excommunicated plaintext tag, or
 with Content-Type: text/plain / data:text/plain,. On token-guarded
 SPANs or DIVs, however, it would be pretty damn useful for displaying
 text content without the need to escape <, >, &, etc. Pure security
 benefit is limited, but as a phishing prevention and display
 correctness measure, it makes sense.

 Why not just run an escape function over the content before sending
 it?  All web languages have one specifically for escaping the
 html-significant characters.  There's only five of them, after all.

Well, indeed =) But xssed.com has 61,000 data points to the contrary.
The easier we make it for people to achieve exactly the result they
want, whilst avoiding security issues, the better. One of the common
tasks they fail at is rendering limited (neutered) HTML without any JS
in. This is not an unsolved task - quite a few implementations exist
to do it pretty well - but they still fail. The other, arguably more
common task - and the most common source of XSS flaws - is to display
user input without any HTML at all. So this fits in nicely, even if
simple to implement by other means.
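For a sense of what rendering limited (neutered) HTML without any JS involves, here is a minimal allowlist sanitizer in Python built on the standard html.parser module. The allowlist is an assumption for illustration: it drops all attributes and the contents of script/style, and re-escapes text. It is not toStaticHTML's actual behaviour, and it is not hardened against every parser differential.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong"}  # assumed allowlist
DROP_CONTENT = {"script", "style"}  # elements whose text is discarded too


class _Sanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are dropped entirely

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is always re-escaped


def sanitize(fragment: str) -> str:
    parser = _Sanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Even a sketch this small shows how much policy (which tags, which attributes, what to do with script bodies) each site ends up reinventing.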

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 Nah, token-guarding is no good. [...] More importantly, though,
 it puts a significant burden on authors to generate unpredictable
 tokens.

Btw, just to clarify - I am not proposing this instead of the current
method; we could very well allow token-guarded sandboxing on divs /
spans, and sandboxing sans tokens on iframes, without making the
mechanism much more complicated or unintuitive. Iframes solve one
class of problems (mostly, sandboxing entire pages or larger blobs of
text, with certain performance and usability trade-offs); lightweight
divs / spans solve another (easy and low-cost sandboxing of small
snippets of user input) in a conceptually similar way.

If we do not address that second need, we are bound to see completely
different mechanisms emerge (such as the toStaticHTML variants), with
different semantics, security controls, and filtering granularity,
which I think is suboptimal. And since these mechanisms are limited to
JS, we may eventually see a third class of solutions emerge at some
point, which is really all too reminiscent of the misery with 5 or so
flavors of SOP. So my general concern is this; token-guarded tags may
not be the best way to do it, but still.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 11:02 AM, Michal Zalewski lcam...@coredump.cx wrote:
 More importantly, though, it puts a significant burden on authors to
 generate unpredictable tokens.  Is this difficult?  No, of course not.
 But people *will* do it badly, copypasting a single token in all
 their iframes or similar.

 People already  need to do this well for XSRF defenses to work, and
 I'd wager it's a much simpler and better-defined problem than
 real-world HTML parsing and escaping could realistically be. It is
 also very easy to delegate this task to existing functions in common
 web frameworks.

 Also, a single token on a returned page, as long as it's unpredictable
 across user sessions, should not be a significant issue.

People screw up CSRF tokens all the time.  The closing tag nonce
design has been floating around for years.  The earliest variant I
could find is Brendan's jail tag.

The @sandbox seems like a better fit for the advertising use case.  In
fact, many people have told me how happy they are that WebKit is
implementing @sandbox.  These folks tend to already be using iframes
to contain ads or gadgets and wish that they could turn off more
features, like frame-busting and plugins.  They're not worried about
the sandboxed content being loaded in the main frame because they're
interested in limiting the attacker's introduction to the user.  Once
the user has visited attacker.com, the issue is out of their hands.

I agree that we need something to help with content received by
cross-site XMLHttpRequest and postMessage.  For those use cases, we're
already running script, so a design like toStaticHTML seems better
than jail.

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 The @sandbox seems like a better fit for the advertising use case.

I am not contesting this, to be clear - I am aware of many cases where
it would be very useful - but gadgets are a fairly small part of the
Internet, and it seems like a unified solution would be more desirable
than several very different APIs with different granularity.

The toStaticHTML-alike will address other specific uses, but will
leave applications that can't rely on JS exclusively for their
rendering needs (which I'd wager is still a majority) out in the cold;
which would probably lead to yet another XSS prevention / HTML
sandboxing approach emerging later on.

I haven't really seen a compelling argument why all these can't be
unified without a significant increase in code or spec complexity -
maybe one exists.

More importantly, some of the features of @sandbox (e.g.,
allow-same-origin), as well as some of the examples in the spec, seem
to be explicitly targeted for other use cases, which makes me think
this is not the consensus between the authors; and the particular
same-origin user content example would promote highly unsafe coding
practices if ever followed. So it seems to me like such a narrow use
case is not even the consensus between authors?

Cheers,
/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 1:30 PM, Michal Zalewski lcam...@coredump.cx wrote:
 I haven't really seen a compelling argument why all these can't be
 unified without a significant increase in code or spec complexity -
 maybe one exists.

That seems like a backwards way of proceeding.  Do you have a proposal
for unification besides the jail tag?

Adam


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
[...sorry for splitting the response...]

 People screw up CSRF tokens all the time.  The closing tag nonce
 design has been floating around for years.  The earliest variant I
 could find is Brendan's jail tag.

Sure, I hinted it not as a brilliant new idea, but as a possibility.

I do think giving it - or just anything more flexible than frames - as
an option should be relatively simple when seamless sandbox frames are
implemented, and that it would make it infinitely more useful in
places where it would arguably do much more good.

If the authors wish to restrict this model to a specific ad / gadget
use case, and consciously decided the costs of extending it to a more
general sandboxing approach outweigh the benefits, that's definitely
fine; but this is not evident. If so, we need to revise the spec to
make this clear, perhaps nuke features such as allow-same-origin
altogether, and definitely scrap examples such as:

<p>We're not scared of you! Here is your content, unedited:</p>
<iframe sandbox src="getusercontent.cgi?id=12193"></iframe>

/mz



Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 How do I use the jail tag to sandbox advertisements?

Huh? But that's not the point I am making... I am not arguing that
iframe sandbox should be abandoned as a bad idea - quite the opposite.

I was merely suggesting that we *expand* the same logic, and the same
excellent security control granularity, to span and div; this seems
like it would not increase the implementation complexity in any
significant way. We could then allow these to be populated with secure
contents in three ways:

1) Guarded closing tag - this is simple and bullet-proof; but may
conflict with XML serializations, and hence require some hacks,

2) CDATA or @doc-like approaches. Less secure because it does not
enforce a security control, but less contentious, and already being
considered for IFRAMEs.

3) .innerHTML, which would be then safe by default, without the need
for .innerSafeHTML (and the associated ambiguities) or explicit
.toStaticHTML calls.

This allows people to utilize the mechanism for so many more
additional use cases without the performance and usability cost of
IFRAMEs, and does not subvert the original ad / gadget use case in any
way.

*This* is what I find greatly preferred to having separate, completely
disjointed APIs with different semantics for ads / gadgets and other
full page contents, for small snippets of JS-inserted HTML, and for
server-returned data.

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

I am not suggesting this at all; extending the spec to cover, or at
least hint these cases would be a good idea. This is not to say the
functionality as currently speced out should be scrapped. My points
were:

1) If we want to keep it limited to the ads / gadget case, we should
make it clear in the spec, reconsider the applicability of
allow-same-origin in this context, and definitely revise the as of now
unsafe getusercontent example, etc. I am not entirely sold that this
is a beneficial strategy in the long run, but as long as the
alternatives were considered, so be it.

2) If we want to make the implementation useful for other scenarios as
well, and avoid the proliferation of HTML-sandboxing APIs with
different security controls, we should still keep the spec mostly as
is, and I have no objection to implementations incorporating it; BUT
it would be beneficial to allow it to be extended as outlined above,
or in a similar general way, specifically making it easy to sandbox
inline HTML, and to place thousands of sandboxed containers on a page
without making the renderer implode.

/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
On Sun, Dec 13, 2009 at 2:00 PM, Adam Barth wha...@adambarth.com wrote:

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

And Adam - as you know, I have deep respect for your expertise and
contributions in this area, so please do not take this personally...
but this really strikes me as throwing random ideas at the wall, and
seeing which ones stick.

This is sometimes beneficial - but we are dealing with possibly
revolutionary security mechanisms in the browser, meant to counter one
of the most significant unsolved security challenges on the web. And
this mode of engineering is probably why we have different
same-origin policies for XMLHttpRequest, DOM access, cookies,
third-party cookie setting, Flash, Java, Silverlight... plus assorted
ideas such as MSIE zones on top of it. It's the reason why their sum
is less secure than each of the mechanisms individually.

Still, this is not an attempt to dismiss the hard work: implementing
sandboxed IFRAMEs as-is and calling it a day *will* make the Internet
a safer place. But a collection of walled off, incompatible APIs with
different security switches and knobs, all of them meant to perform a
common task, does strike me as suboptimal - and I do think it's
avoidable. Especially since, I am guessing, some of the pragmatic
objections to guarded tags were probably due to implementation
complexity or dubious usability, all of which are probably moot with
@sandbox in place.

Furthermore, in this particular case, I am really concerned that the
spec is at odds with itself - you mention certain specific use cases,
but the spec seems to be after a broader goal: sandboxing
user-supplied content in general. In doing so, it gives some bad
advice (again, the user content example is exploitable, at least until
the arrival of some out-of-scope security mechanism to prevent it).

I think I stated the concerns reasonably well earlier in the thread;
but if they sound unwarranted or inflammatory, I can admit defeat.

Cheers,
/mz


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Adam Barth
On Sun, Dec 13, 2009 at 2:13 PM, Michal Zalewski lcam...@coredump.cx wrote:
 How do I use the jail tag to sandbox advertisements?

 Huh? But that's not the point I am making... I am not arguing that
 iframe sandbox should be abandoned as a bad idea - quite the opposite.

 I was merely suggesting that we *expand* the same logic, and the same
 excellent security control granularity, to span and div; this seems
 like it would not increase the implementation complexity in any
 significant way.

Implementation complexity is not the gating factor.  Implementing
canvas is orders of magnitude more complex than any of the proposals
we've seen so far.  The gating factor is discovering simple, robust
mechanisms that provide security for the key use cases.

 We could then allow these to be populated with secure
 contents in three ways:

 1) Guarded closing tag - this is simple and bullet-proof; but may
 conflict with XML serializations, and hence require some hacks,

 2) CDATA or @doc-like approaches. Less secure because it does not
 enforce a security control, but less contentious, and already being
 considered for IFRAMEs.

 3) .innerHTML, which would be then safe by default, without the need
 for .innerSafeHTML (and the associated ambiguities) or explicit
 .toStaticHTML calls.

 This allows people to utilize the mechanism for so many more
 additional use cases without the performance and usability cost of
 IFRAMEs, and does not subvert the original ad / gadget use case in any
 way.

 *This* is what I find greatly preferred to having separate, completely
 disjointed APIs with different semantics for ads / gadgets and other
 full page contents, for small snippets of JS-inserted HTML, and for
 server-returned data.

It sounds like you think we should proceed with @sandbox and also do
something with inline HTML.  Ian has already asked browser vendors to
experiment in these areas and try to gain some implementation
experience.  I'd encourage you to write up your thoughts in a brief
spec along the lines of the DOM-based HTML Sanitizer document I sent
to this list a while back.

I'm very interested in a solution that works for the following use cases:

1) A web page wants to display untrusted (i.e., restricted) HTML
received via cross-site XMLHttpRequest or postMessage.

2) A blog wishes to display many comments containing untrusted (i.e.,
restricted) HTML.

I'm certainly not married to my proposal.  In fact, I'm planning to
update it based on the feedback I've received here and elsewhere.

 The sandbox tag is great at addressing that use case.  I don't see why
 we should delay it in the hopes that the jail tag comes back to
 life.

 I am not suggesting this at all; extending the spec to cover, or at
 least hint these cases would be a good idea. This is not to say the
 functionality as currently speced out should be scrapped. My points
 were:

 1) If we want to keep it limited to the ads / gadget case, we should
 make it clear in the spec, reconsider the applicability of
 allow-same-origin in this context,

allow-same-origin is useful if the advertisement wishes to retrieve
additional information from its origin, e.g., via XMLHttpRequest or the
video tag.  For example, the ad might want to show a video from its
origin and be able to interact with the video without the cross-origin
restrictions.

 and definitely revise the as of now unsafe getusercontent example, etc.

I agree that we should revise that example.

 I am not entirely sold that this
 is a beneficial strategy in the long run, but as long as the
 alternatives were considered, so be it.

 2) If we want to make the implementation useful for other scenarios as
 well, and avoid the proliferation of HTML-sandboxing APIs with
 different security controls, we should still keep the spec mostly as
 is, and I have no objection to implementations incorporating it; BUT
 it would be beneficial to allow it to be extended as outlined above,
 or in a similar general way, specifically making it easy to sandbox
 inline HTML, and to place thousands of sandboxed containers on a page
 without making the renderer implode.

These concerns seem to be with the implementation and not the spec.
We certainly can expand the web platform after HTML5.  I must be
misunderstanding something.

On Sun, Dec 13, 2009 at 2:31 PM, Michal Zalewski lcam...@coredump.cx wrote:
 And Adam - as you know, I have deep respect for your expertise and
 contributions in this area, so please do not take this personally...
 but this really strikes me as throwing random ideas at the wall, and
 seeing which ones stick.

 This is sometimes beneficial - but we are dealing with possibly
 revolutionary security mechanisms in the browser, meant to counter one
 of the most significant unsolved security challenges on the web.

I'm not sure it's that revolutionary, but I'm glad you think it's important work.

 And this mode of engineering is probably why we have different
 same-origin policies for XMLHttpRequest, DOM  

Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Aryeh Gregor
On Fri, Dec 11, 2009 at 11:18 PM, Michal Zalewski lcam...@coredump.cx wrote:
 The ability to sandbox SPANs or DIVs using a token-guarded approach
 (<span sandbox="random_token"></span sandbox="same_token">) is, on the
 other hand, considerably easier on the developer, and probably has a
 very similar implementation complexity.

Well, the problem this random token thing is trying to address is that
the untrusted content could just close the tag.  (I fondly remember my
days on Geocities, when we would add noscriptnoscript to the end
of our pages to try to get rid of the auto-injected ads.)  But it's
kind of hacky and might be prone to failure, and the syntax is really
unpleasant (especially for XML compatibility).

So instead, why not just use the standard escaping mechanisms we
already have?  Allow a sandbox attribute on all elements that can
contain phrasing or flow content.  Any such element with a sandbox
attribute will be required to contain no literal <, >, ", or ' before the
closing tag.  If any of those four characters is encountered, the
element is treated as having no contents.  Otherwise, the browser
unescapes all characters with special meanings (&lt; -> <, &gt; -> >,
&amp; -> &, etc.) and then treats the resulting string as
the inner HTML of the element, parsing it like regular HTML, but the
contents are sandboxed.

Examples:

<span sandbox>This span will work normally, except for being sandboxed.</span>

<span sandbox>This span will be <em>empty</em> in the DOM, even though
it contains no evil content, because otherwise authors will forget to
escape the contents of the sandbox.</span>

<span sandbox>&lt;span&gt;But this span will have another span as its
child, sandboxed.  The regular parser sees no entities here, only a
nested span!&lt;/span&gt;</span>

<span sandbox>It would be safe to allow this to work, since it only
contains an apostrophe, but let's not, so that lack of escaping is
easier to catch.  This span is therefore also empty.</span>
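The content rule in this proposal can be sketched in Python as below. This is a hypothetical illustration of the proposal, not actual browser behaviour; it assumes raw_contents is the source text between the opening tag and its closing tag, and leaves locating that boundary out of scope.

```python
import html

# Per the proposal, none of these may appear literally inside a sandboxed element.
FORBIDDEN = frozenset("<>\"'")


def sandboxed_inner_html(raw_contents: str) -> str:
    """Return the markup to be parsed, sandboxed, as the element's children."""
    if any(ch in FORBIDDEN for ch in raw_contents):
        return ""  # unescaped content: the element is treated as empty
    # Decode &lt; &gt; &amp; etc.; the result is then parsed as ordinary HTML.
    return html.unescape(raw_contents)
```

Applied to the four examples above, the first and third yield content (the third a nested span), while the second and fourth come out empty.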


I think this is easier to use than having to generate a random token,
and also more secure.  If your code isn't escaping things right,
you'll quickly notice when your blog comments all vanish.

This is even backward-compatible, in a certain sense.  jail would be
unsafe to serve with untrusted contents until all UAs reliably support
it.  This would be perfectly safe in all browsers, it would just
display poorly in old browsers if there's any HTML markup in the
content.

What do people think of this syntax?


Re: [whatwg] some thoughts on sandboxed IFRAMEs

2009-12-13 Thread Michal Zalewski
 <span sandbox>&lt;span&gt;But this span will have another span as its
 child, sandboxed.  The regular parser sees no entities here, only a
 nested span!&lt;/span&gt;</span>

That's a pretty reasonable variant for lightweight sandboxes, IMO. It
does not have the explicit assurance of a token-based approach (i.e.,
will not fail right away if the user gets it wrong), but it's better
than data: URLs or @doc in that - as you noted - it will fail quickly
if the encapsulated HTML is not escaped, while this may still go
unnoticed until abused:

<iframe sandbox doc="<h1>User input without escaping"></iframe>
<iframe sandbox src="data:text/html,<h1>User input without escaping"></iframe>

As a side note, the other benefit of sandboxed spans and divs in such
a design is that you can then have .innerHTML on sandbox-tagged
elements automagically conform to the sandboxing rules, without the
need for .toStaticHTML, .secureInnerHTML, or similar approaches (which
are error-prone by the virtue of tying sanitization to data access
method, rather than a particular element).

/mz


[whatwg] some thoughts on sandboxed IFRAMEs

2009-12-11 Thread Michal Zalewski
Hi folks,

So, we were having some internal discussions about the IFRAME sandbox
attribute; Adam Barth suggested it would be more productive to bring
some of the points I was making on the mailing list instead.

I think the attribute is an excellent idea, and close to the dream
design we talked about internally for a while. I do have some
peripheral concerns, though, and it seems like now is the time to bring
them up!

Starting with two high-level comments: although I understand the
simplicity, and hence the appeal, of sandboxed IFRAMEs, I do fear that
they will be very hard on web developers - and hence of limited
utility. In particular:

1) IFRAME semantics make it exceedingly cumbersome to sandbox short
snippets of text, and this task is perhaps the most common and
pressing XSS-related challenge. Unless the document is constructed on
client side by JavaScript, sites would need to use opaque data: URLs,
or put up with a lot of additional HTTP roundtrips, to utilize
sandboxed IFRAMEs for this purpose. [ There is also the problem of
formatting and positioning IFRAME content, although the seamless
attribute would fix this. ]

The ability to sandbox SPANs or DIVs using a token-guarded approach
(<span sandbox="random_token"></span sandbox="same_token">) is, on the
other hand, considerably easier on the developer, and probably has a
very similar implementation complexity.

2) Renderers suck at dealing with IFRAMEs, and will probably continue to
do so for the time being. This means that in a typical, moderately complex
application (say, a discussion forum or a social site), where
hundreds of user-controlled strings may need to be present to display
user content, the mechanism would have an unacceptable load time and
memory footprint. In fact, people are already coming up with
lightweight alternatives with a significant functionality overlap (and
different security controls). Microsoft has toStaticHTML(), while a
standardized implementation is being discussed here right now in a
separate thread.

Isn't the benefit of keeping the design slightly simpler (and
realistically, limited to relatively few usage scenarios) negated by
the fact that alternative solutions to other narrow problems would
need to emerge elsewhere? The browser coming with several different
script sanitizers with completely different APIs and security controls
does not strike me as a desirable outcome (all the flavors of SOP are
a testament to this). If the answer is not a strong no, maybe the
token-guarded DIV / SPAN approach is a better alternative?

Now, that aside - on a more pragmatic level, I have two extra comments:

1) The utility of the SOP sandboxing behavior outlined in the spec is
diminished if we have no way to actually *enforce* that the IFRAMEd
resource would only be rendered in such a context. If I am serving
user-supplied, unsanitized HTML, it is obviously safe to do <iframe
sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
attacker from calling http://my_site/show.cgi?id=1234 directly, and
bypassing the filter? There are two cases where the mechanism still
offers some protection:

1.1) If I make IFRAMEd URLs unpredictable with the use of security
tokens - but if people were likely to get this right, we wouldn't have
XSRF and related issues on the web,

1.2) If I point the IFRAME to a non-same-origin domain - but if I can
do this, and work out the non-trivial authentication challenges in
such a case, I largely don't need a SOP sandbox to begin with: I can
just use unique_id.sandboxdomain.com. In fact, many sites I know of
do this right now.

It strikes me that this mechanism would make a whole lot more sense if
supported on HTTP header level, instead: X-SOP-Sandbox: 1; in its
current shape, it is defensible perhaps if aided by Mozilla's CSP.
Otherwise, it's an error-prone detail, and we should at the very least
outline why it's very difficult to get it right in the spec.
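A minimal WSGI sketch in Python of what serving user content with such a header could look like. X-SOP-Sandbox is the header name floated in this message, not a standardized header, and the response body here is a placeholder for stored user content.

```python
def serve_user_content(environ, start_response):
    """WSGI app returning user content marked for sandboxed rendering."""
    # Placeholder body; a real app would look up the stored content by id.
    body = b"<h1>user-supplied content</h1>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        # Header proposed in this thread; browsers do not act on it.
        ("X-SOP-Sandbox", "1"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be tried locally with wsgiref.simple_server.make_server("", 8000, serve_user_content).serve_forever(). The appeal of the header approach is that the restriction travels with the response, so loading the URL directly or from another site would no longer bypass it.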

2) The utility of the no form submission mode is limited to certain
very specific anti-phishing uses. While this does not invalidate it,
it makes it tempting to mention two other modes we discussed
internally, and that probably fall into the same bucket:

2.1) The ability to disable loading of external resources (images,
scripts, etc) in the sandboxed document. The common usage scenario is
when you do not want the displayed document to phone home for
privacy reasons, for example in a web mail system.

2.2) The ability to disable HTML parsing. On IFRAMEs, this can
actually be approximated with the excommunicated plaintext tag, or
with Content-Type: text/plain / data:text/plain,. On token-guarded
SPANs or DIVs, however, it would be pretty damn useful for displaying
text content without the need to escape <, >, &, etc. Pure security
benefit is limited, but as a phishing prevention and display
correctness measure, it makes sense.
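The data:text/plain approximation mentioned above can be built with the standard library. A minimal Python sketch (plaintext_data_url is a hypothetical helper): the text/plain media type keeps the browser from parsing the text as HTML, and percent-encoding every reserved character keeps the URL itself intact.

```python
from urllib.parse import quote


def plaintext_data_url(text: str) -> str:
    """Build a data: URL that a browser renders as plain text, not HTML."""
    # safe="" percent-encodes every reserved character, including "/" and "#".
    return "data:text/plain;charset=utf-8," + quote(text, safe="")
```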

Well, that's it. Hope this does not come off as a complete rant :P

Cheers,
/mz