Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-10-13 Thread Geoffrey Sneddon

Ian Hickson wrote:

On Mon, 7 Sep 2009, Aryeh Gregor wrote:

On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon wrote:

Apparently Hixie had previously said he didn't want to change this as it
will become a non-issue over time. I think it does matter due to the
security issues it presents in existing UAs. Conforming markup (using
elements/attributes allowed in HTML 4.01) should not cause JS to execute in
one browser but not in another.
I agree with you as an author.  I wrote an HTML output function in 
MediaWiki assuming that what the standard says is known to be 
interoperable, which is apparently wrong.  If I hadn't been keeping up 
with HTML 5, I would have introduced an XSS vulnerability because of 
some browsers' handling of `.


If the problem will go away with time, then perhaps a later version of 
the standard could make such unquoted attributes conforming, once 
there's no more problem with them.


As far as I can tell, this is an IE bug; treating "`" as an attribute 
quoting character is non-conforming in any version of HTML so far, it 
seems. I'm certainly not going to make it non-conforming to stumble into 
any IE bug or difference in parsing between IE and previous specs or other 
browsers; we'd just end up with an asinine set of conformance 
requirements.


I agree that it's pointless to make it non-conforming to hit any parsing 
bug, but I would argue that we should make non-conforming as many cases 
as is sensible if they open up security holes in websites on legacy UAs, 
even when the website uses an HTML 5 parser/sanitizer/serializer.



For example, should this be non-conforming?

   
   Test
   
Search: 

   

This perfectly innocent piece of HTML content (HTML2-compliant except for 
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
non-conforming?


No, it opens up no security hole if that is done.

Similarly, IE conditional comments make it trivial to trigger scripts in 
IE but not another UA; indeed people do this on purpose. Should we make 
those non-conforming also?


They are a harder issue, but I think it is probably fair enough to 
assume that most sanitizers drop comments for such reasons, hence making 
them fine to leave as conforming also.
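
As a rough illustration of the "sanitizers drop comments" assumption, 
here is a minimal sketch built on Python's standard html.parser module. 
The class and function names are hypothetical, and it is not a real 
sanitizer: it re-emits whatever it parses and only shows that a 
conditional comment disappears along with every other comment.

```python
from html.parser import HTMLParser


class CommentDropper(HTMLParser):
    """Re-emit markup as parsed, dropping every comment (hypothetical helper).

    Doctype declarations and processing instructions are dropped as well,
    because their handlers are not overridden.
    """

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        # Conditional comments are ordinary comments to a non-IE parser,
        # so dropping all comments drops them too.
        pass


def drop_comments(markup: str) -> str:
    parser = CommentDropper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


sample = "<p>a</p><!--[if IE]><script>alert(1)</script><![endif]--><p>b</p>"
print(drop_comments(sample))  # prints <p>a</p><p>b</p>
```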


As I understand it, the attack here is a site that allows the user to 
input text that is used verbatim in two attributes, such that the user can 
set the first attribute's value to:


   `

...and the second to:

   ` onload='...payload...' end=x

...with the assumption that the site is going to not quote the first one, 
and quote the second one with double quotes:


(This is the default behaviour of Python html5lib, FWIW: the first is 
not quoted as it does not contain any whitespace characters or U+003E 
(>), the latter is quoted for that reason.)



   

...which in IE, for some reason, gets treated as:

   


Indeed, this is the attack I (and others) am concerned about.
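
To make the scenario concrete, here is a small sketch of a serializer 
that follows the quoting rule described above (quote only when the value 
contains whitespace or U+003E). It is not html5lib's actual code, and the 
element and attribute names (img, alt, title) are placeholders chosen for 
the illustration.

```python
import re

# Assumed rule: an attribute value is quoted only if it is empty or
# contains whitespace or ">". The backtick is NOT treated as special.
NEEDS_QUOTES = re.compile(r"[\t\n\x0c\r >]")


def serialize_attr(name: str, value: str) -> str:
    """Serialize one attribute using the naive quoting rule above."""
    if value and not NEEDS_QUOTES.search(value):
        return f"{name}={value}"
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


# The two attacker-controlled values from the description above.
first = "`"
second = "` onload='...payload...' end=x"

# Hypothetical element and attribute names, for illustration only.
tag = "<img " + serialize_attr("alt", first) + " " + serialize_attr("title", second) + ">"
print(tag)
# prints: <img alt=` title="` onload='...payload...' end=x">
```

A parser that treats ` as a quoting character reads everything between 
the two backticks as the first attribute's value, which leaves the onload 
text outside any quoted value.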

I've disallowed ` in unquoted attribute values for now, but I think we 
should revert this once IE has fixed this bug for a few years.


Right, once versions of IE with this bug have faded out of existence I 
think this will become a non-issue. I also expect that'll be a while 
yet, though, and I highly doubt that time will have come even by the 
time HTML 5 goes to REC. Furthermore, if there are similar attacks 
to this, I think they should similarly be made non-conforming.


--
Geoffrey Sneddon — Opera Software




Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-10-04 Thread Ian Hickson
On Mon, 7 Sep 2009, Aryeh Gregor wrote:
> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon wrote:
> > Apparently Hixie had previously said he didn't want to change this as it
> > will become a non-issue over time. I think it does matter due to the
> > security issues it presents in existing UAs. Conforming markup (using
> > elements/attributes allowed in HTML 4.01) should not cause JS to execute in
> > one browser but not in another.
> 
> I agree with you as an author.  I wrote an HTML output function in 
> MediaWiki assuming that what the standard says is known to be 
> interoperable, which is apparently wrong.  If I hadn't been keeping up 
> with HTML 5, I would have introduced an XSS vulnerability because of 
> some browsers' handling of `.
> 
> If the problem will go away with time, then perhaps a later version of 
> the standard could make such unquoted attributes conforming, once 
> there's no more problem with them.

As far as I can tell, this is an IE bug; treating "`" as an attribute 
quoting character is non-conforming in any version of HTML so far, it 
seems. I'm certainly not going to make it non-conforming to stumble into 
any IE bug or difference in parsing between IE and previous specs or other 
browsers; we'd just end up with an asinine set of conformance 
requirements. For example, should this be non-conforming?

   
   Test
   
Search: 

   

This perfectly innocent piece of HTML content (HTML2-compliant except for 
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
non-conforming?

Similarly, IE conditional comments make it trivial to trigger scripts in 
IE but not another UA; indeed people do this on purpose. Should we make 
those non-conforming also?


As I understand it, the attack here is a site that allows the user to 
input text that is used verbatim in two attributes, such that the user can 
set the first attribute's value to:

   `

...and the second to:

   ` onload='...payload...' end=x

...with the assumption that the site is going to not quote the first one, 
and quote the second one with double quotes:

   

...which in IE, for some reason, gets treated as:

   


I've disallowed ` in unquoted attribute values for now, but I think we 
should revert this once IE has fixed this bug for a few years.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-14 Thread Ian Hickson
On Sun, 6 Sep 2009, Aryeh Gregor wrote:
>
> See some research here:
> 
> http://code.google.com/p/html5lib/issues/detail?id=93
> 
> It seems like in addition to whitespace and "'=<> , the characters 
> U+0000 through U+0020 should be banned from unquoted attribute values, 
> as well as U+0060 (backtick `), for the sake of compatibility.

On Mon, 7 Sep 2009, Geoffrey Sneddon wrote:
> 
> Apparently Hixie had previously said he didn't want to change this as it 
> will become a non-issue over time. I think it does matter due to the 
> security issues it presents in existing UAs. Conforming markup (using 
> elements/attributes allowed in HTML 4.01) should not cause JS to execute 
> in one browser but not in another.

The right fix here is to have the browsers all implement the same parser 
algorithm.

Validators are welcome to warn about this case, though.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-07 Thread Aryeh Gregor
On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon wrote:
> Apparently Hixie had previously said he didn't want to change this as it
> will become a non-issue over time. I think it does matter due to the
> security issues it presents in existing UAs. Conforming markup (using
> elements/attributes allowed in HTML 4.01) should not cause JS to execute in
> one browser but not in another.

I agree with you as an author.  I wrote an HTML output function in
MediaWiki assuming that what the standard says is known to be
interoperable, which is apparently wrong.  If I hadn't been keeping up
with HTML 5, I would have introduced an XSS vulnerability because of
some browsers' handling of `.

If the problem will go away with time, then perhaps a later version of
the standard could make such unquoted attributes conforming, once
there's no more problem with them.


Re: [whatwg] More prohibited characters for unquoted attributes are needed

2009-09-07 Thread Geoffrey Sneddon


On 6 Sep 2009, at 12:35, Aryeh Gregor wrote:


See some research here:

http://code.google.com/p/html5lib/issues/detail?id=93

It seems like in addition to whitespace and "'=<> , the characters
U+0000 through U+0020 should be banned from unquoted attribute values,
as well as U+0060 (backtick `), for the sake of compatibility.


Apparently Hixie had previously said he didn't want to change this as  
it will become a non-issue over time. I think it does matter due to  
the security issues it presents in existing UAs. Conforming markup  
(using elements/attributes allowed in HTML 4.01) should not cause JS  
to execute in one browser but not in another.



--
Geoffrey Sneddon




[whatwg] More prohibited characters for unquoted attributes are needed

2009-09-06 Thread Aryeh Gregor
See some research here:

http://code.google.com/p/html5lib/issues/detail?id=93

It seems like in addition to whitespace and "'=<> , the characters
U+0000 through U+0020 should be banned from unquoted attribute values,
as well as U+0060 (backtick `), for the sake of compatibility.
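
For serializer authors, the proposed rule might look roughly like the 
sketch below. It assumes the range above starts at U+0000, and the 
function names are hypothetical rather than taken from html5lib or any 
other library.

```python
# Characters assumed unsafe in an unquoted attribute value under the
# proposal: U+0000 through U+0020, the characters " ' = < >, and the
# backtick (U+0060).
UNSAFE_UNQUOTED = frozenset(chr(c) for c in range(0x00, 0x21)) | frozenset("\"'=<>`")


def can_leave_unquoted(value: str) -> bool:
    """Return True if the value may be emitted without quotes."""
    return bool(value) and not any(ch in UNSAFE_UNQUOTED for ch in value)


def serialize_attr_strict(name: str, value: str) -> str:
    """Serialize one attribute, quoting whenever the value is not safe."""
    if can_leave_unquoted(value):
        return f"{name}={value}"
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


print(serialize_attr_strict("alt", "`"))      # prints alt="`"
print(serialize_attr_strict("width", "100"))  # prints width=100
```

Always quoting every attribute value would be simpler still; the 
predicate only matters for serializers that want minimal output.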