Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-10-13 Thread Geoffrey Sneddon

Ian Hickson wrote:

On Mon, 7 Sep 2009, Aryeh Gregor wrote:

On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
foolistbar at googlemail.com wrote:

Apparently Hixie had previously said he didn't want to change this as it
will become a non-issue over time. I think it does matter due to the
security issues it presents in existing UAs. Conforming markup (using
elements/attributes allowed in HTML 4.01) should not cause JS to execute in
one browser but not in another.
I agree with you as an author.  I wrote an HTML output function in 
MediaWiki assuming that what the standard says is known to be 
interoperable, which is apparently wrong.  If I hadn't been keeping up 
with HTML 5, I would have introduced an XSS vulnerability because of 
some browsers' handling of `.


If the problem will go away with time, then perhaps a later version of 
the standard could make such unquoted attributes conforming, once 
there's no more problem with them.


As far as I can tell, this is an IE bug; treating ` as an attribute 
quoting character is non-conforming in any version of HTML so far, it 
seems. I'm certainly not going to make it non-conforming to stumble into 
any IE bug or difference in parsing between IE and previous specs or other 
browsers; we'd just end up with an asinine set of conformance 
requirements.


I agree that it's pointless to make it non-conforming to hit any parsing 
bug, but I would argue that we should make non-conforming as many cases 
as is sensible when they open up security holes on legacy UAs, even for 
a website that uses an HTML 5 parser/sanitizer/serializer.



For example, should this be non-conforming?

   <!DOCTYPE html>
   <title>Test</title>
   <form>
    <label>Search: <input type=text></label>
    <input type=submit>
   </form>

This perfectly innocent piece of HTML content (HTML2-compliant except for 
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
non-conforming?


No, it opens up no security hole if that is done.

Similarly, IE conditional comments make it trivial to trigger scripts in 
IE but not another UA; indeed people do this on purpose. Should we make 
those non-conforming also?


They are a harder issue, but I think it is probably fair enough to 
assume that most sanitizers drop comments for such reasons, hence making 
them fine to leave as conforming also.


As I understand it, the attack here is a site that allows the user to 
input text that is used verbatim in two attributes, such that the user can 
set the first attribute's value to:


   `

...and the second to:

   ` onload='...payload...' end=x

...with the assumption that the site is going to not quote the first one, 
and quote the second one with double quotes:


(This is the default behaviour of Python html5lib, FWIW: the first is 
not quoted as it does not contain any whitespace characters or U+003E 
(>), whereas the latter is quoted because it contains whitespace.)



   <body title=` class="` onload='...payload...' end=x">

...which in IE, for some reason, gets treated as:

   <body title=' class="'
         onload='...payload...'
         end='x"'>


Indeed, this is the attack I (and others) am concerned about.
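
To make the serializer side of this concrete, here is a minimal sketch in 
Python. It is a simplified model of the quoting rule described above (quote 
only when the value contains whitespace or U+003E), not html5lib's actual 
code; the function names serialize_attr and serialize_start_tag are 
hypothetical.

```python
def serialize_attr(name, value):
    """Serialize one attribute, quoting only when the simplified rule requires it.

    Assumption: the rule modelled here is "quote if the value is empty or
    contains whitespace or '>'". Backtick is deliberately NOT checked, which
    is exactly the gap discussed in this thread.
    """
    needs_quotes = value == "" or any(ch.isspace() or ch == ">" for ch in value)
    if not needs_quotes:
        return f"{name}={value}"
    # Inside double quotes only '&' and '"' need escaping.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


def serialize_start_tag(tag, attrs):
    """Serialize a start tag from a list of (name, value) pairs."""
    parts = [tag] + [serialize_attr(name, value) for name, value in attrs]
    return "<" + " ".join(parts) + ">"


# The two attacker-controlled values from the example above.
markup = serialize_start_tag("body", [
    ("title", "`"),
    ("class", "` onload='...payload...' end=x"),
])
print(markup)
```

Under this model the first value is emitted unquoted and the second in 
double quotes, which is the combination the attack relies on.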

I've disallowed ` in unquoted attribute values for now, but I think we 
should revert this once IE has fixed this bug for a few years.


Right, once versions of IE with this bug have faded out of existence I 
think this will become a non-issue. I also expect that'll be a while 
yet, though, and I highly doubt that time will have come even by the 
time when HTML 5 goes to REC. Furthermore, if there are similar attacks 
to this, I think they should similarly be made non-conforming.


--
Geoffrey Sneddon — Opera Software
http://gsnedders.com/
http://www.opera.com/


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-10-04 Thread Ian Hickson
On Mon, 7 Sep 2009, Aryeh Gregor wrote:
 On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
 foolist...@googlemail.com wrote:
  Apparently Hixie had previously said he didn't want to change this as it
  will become a non-issue over time. I think it does matter due to the
  security issues it presents in existing UAs. Conforming markup (using
  elements/attributes allowed in HTML 4.01) should not cause JS to execute in
  one browser but not in another.
 
 I agree with you as an author.  I wrote an HTML output function in 
 MediaWiki assuming that what the standard says is known to be 
 interoperable, which is apparently wrong.  If I hadn't been keeping up 
 with HTML 5, I would have introduced an XSS vulnerability because of 
 some browsers' handling of `.
 
 If the problem will go away with time, then perhaps a later version of 
 the standard could make such unquoted attributes conforming, once 
 there's no more problem with them.

As far as I can tell, this is an IE bug; treating ` as an attribute 
quoting character is non-conforming in any version of HTML so far, it 
seems. I'm certainly not going to make it non-conforming to stumble into 
any IE bug or difference in parsing between IE and previous specs or other 
browsers; we'd just end up with an asinine set of conformance 
requirements. For example, should this be non-conforming?

   <!DOCTYPE html>
   <title>Test</title>
   <form>
    <label>Search: <input type=text></label>
    <input type=submit>
   </form>

This perfectly innocent piece of HTML content (HTML2-compliant except for 
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
non-conforming?

Similarly, IE conditional comments make it trivial to trigger scripts in 
IE but not another UA; indeed people do this on purpose. Should we make 
those non-conforming also?


As I understand it, the attack here is a site that allows the user to 
input text that is used verbatim in two attributes, such that the user can 
set the first attribute's value to:

   `

...and the second to:

   ` onload='...payload...' end=x

...with the assumption that the site is going to not quote the first one, 
and quote the second one with double quotes:

   <body title=` class="` onload='...payload...' end=x">

...which in IE, for some reason, gets treated as:

   <body title=' class="'
         onload='...payload...'
         end='x"'>


I've disallowed ` in unquoted attribute values for now, but I think we 
should revert this once IE has fixed this bug for a few years.
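
The two readings of that start tag can be illustrated with a toy attribute 
tokenizer. This is only a sketch: it is neither the HTML 5 parsing algorithm 
nor IE's real implementation, and parse_attributes and its quote_chars 
parameter are hypothetical names. The only difference between the two runs 
is whether the backtick counts as a quoting character.

```python
def parse_attributes(tag_body, quote_chars):
    """Split the attribute part of a start tag into (name, value) pairs.

    quote_chars is the set of characters this toy parser accepts as
    attribute-value delimiters.
    """
    attrs = []
    i, n = 0, len(tag_body)
    while i < n:
        # Skip whitespace between attributes.
        while i < n and tag_body[i].isspace():
            i += 1
        if i >= n:
            break
        # Read the attribute name up to '=' or whitespace.
        start = i
        while i < n and tag_body[i] != "=" and not tag_body[i].isspace():
            i += 1
        name = tag_body[start:i]
        value = ""
        if i < n and tag_body[i] == "=":
            i += 1
            if i < n and tag_body[i] in quote_chars:
                # Quoted value: runs to the next occurrence of the same quote.
                quote = tag_body[i]
                end = tag_body.find(quote, i + 1)
                if end == -1:
                    end = n
                value = tag_body[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: runs to the next whitespace.
                start = i
                while i < n and not tag_body[i].isspace():
                    i += 1
                value = tag_body[start:i]
        attrs.append((name, value))
    return attrs


# Attribute part of the body tag, with the second value in double quotes.
TAG_BODY = "title=` class=\"` onload='...payload...' end=x\""
STANDARD_QUOTES = "\"'"
LEGACY_QUOTES = "\"'`"  # assumption: models a parser that also accepts backtick

standard = parse_attributes(TAG_BODY, STANDARD_QUOTES)
legacy = parse_attributes(TAG_BODY, LEGACY_QUOTES)
print(standard)
print(legacy)
```

With the standard quote set the tokenizer sees two attributes, title and 
class; once the backtick is accepted as a quote it sees title, onload and 
end, so the payload becomes an event handler.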

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-14 Thread Ian Hickson
On Sun, 6 Sep 2009, Aryeh Gregor wrote:

 See some research here:
 
 http://code.google.com/p/html5lib/issues/detail?id=93
 
 It seems like in addition to whitespace and "'=<>, the characters 
 U+0000 through U+0020 should be banned from unquoted attribute values, 
 as well as U+0060 (backtick `), for the sake of compatibility.

On Mon, 7 Sep 2009, Geoffrey Sneddon wrote:
 
 Apparently Hixie had previously said he didn't want to change this as it 
 will become a non-issue over time. I think it does matter due to the 
 security issues it presents in existing UAs. Conforming markup (using 
 elements/attributes allowed in HTML 4.01) should not cause JS to execute 
 in one browser but not in another.

The right fix here is to have the browsers all implement the same parser 
algorithm.

Validators are welcome to warn about this case, though.
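
A validator warning of that kind could look roughly like the following 
Python sketch. It takes the proposed range as every code point up to and 
including U+0020, and adds the quote, equals, angle-bracket and backtick 
characters named in the proposal. PROHIBITED and unquoted_value_warnings 
are hypothetical names, not part of any existing validator.

```python
# Characters treated as unsafe in an unquoted attribute value.
# Assumption: the control range runs from U+0000 through U+0020 inclusive.
PROHIBITED = {chr(cp) for cp in range(0x00, 0x21)} | set("\"'=<>`")


def unquoted_value_warnings(value):
    """Return a list of warning strings for writing value without quotes."""
    if value == "":
        return ["an empty value cannot be written unquoted"]
    return [
        f"U+{ord(ch):04X} at offset {i} is not allowed in an unquoted value"
        for i, ch in enumerate(value)
        if ch in PROHIBITED
    ]


for candidate in ["text", "`", "a b"]:
    print(repr(candidate), unquoted_value_warnings(candidate))
```

An empty list means the value would be accepted unquoted under this rule; 
the check only warns and leaves the parsing rules themselves untouched.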

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-07 Thread Aryeh Gregor
On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
foolist...@googlemail.com wrote:
 Apparently Hixie had previously said he didn't want to change this as it
 will become a non-issue over time. I think it does matter due to the
 security issues it presents in existing UAs. Conforming markup (using
 elements/attributes allowed in HTML 4.01) should not cause JS to execute in
 one browser but not in another.

I agree with you as an author.  I wrote an HTML output function in
MediaWiki assuming that what the standard says is known to be
interoperable, which is apparently wrong.  If I hadn't been keeping up
with HTML 5, I would have introduced an XSS vulnerability because of
some browsers' handling of `.

If the problem will go away with time, then perhaps a later version of
the standard could make such unquoted attributes conforming, once
there's no more problem with them.
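
For an output function like the one described above, one defensive option 
is to never emit unquoted values at all. The sketch below is an assumption 
about what such a function might look like, written in Python rather than 
MediaWiki's PHP; attr and start_tag are hypothetical names. It relies only 
on html.escape from the Python standard library.

```python
import html


def attr(name, value):
    """Serialize one attribute, always double-quoted and escaped."""
    # html.escape with quote=True escapes &, <, >, " and '.
    return f'{name}="{html.escape(value, quote=True)}"'


def start_tag(tag, attrs):
    """Serialize a start tag from a dict of attribute names to values."""
    parts = [tag] + [attr(name, value) for name, value in attrs.items()]
    return "<" + " ".join(parts) + ">"


safe = start_tag("body", {"title": "`", "class": "` onload='x'"})
print(safe)
```

Because every value is wrapped in double quotes, a parser that treats the 
backtick as a quoting character never sees it in a position where it could 
open a value. The cost is a few extra bytes per attribute.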


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-07 Thread Geoffrey Sneddon


On 6 Sep 2009, at 12:35, Aryeh Gregor wrote:


See some research here:

http://code.google.com/p/html5lib/issues/detail?id=93

It seems like in addition to whitespace and "'=<>, the characters
U+0000 through U+0020 should be banned from unquoted attribute values,
as well as U+0060 (backtick `), for the sake of compatibility.


Apparently Hixie had previously said he didn't want to change this as  
it will become a non-issue over time. I think it does matter due to  
the security issues it presents in existing UAs. Conforming markup  
(using elements/attributes allowed in HTML 4.01) should not cause JS  
to execute in one browser but not in another.



--
Geoffrey Sneddon
http://gsnedders.com/