Ian Hickson wrote:
On Mon, 7 Sep 2009, Aryeh Gregor wrote:
On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
<foolistbar at googlemail.com> wrote:
Apparently Hixie had previously said he didn't want to change this as it
will become a non-issue over time. I think it does matter due to the
security issues it presents in existing UAs. Conforming markup (using
elements/attributes allowed in HTML 4.01) should not cause JS to execute in
one browser but not in another.
I agree with you as an author. I wrote an HTML output function in
MediaWiki assuming that what the standard says is known to be
interoperable, which is apparently wrong. If I hadn't been keeping up
with HTML 5, I would have introduced an XSS vulnerability because of
some browsers' handling of `.
If the problem will go away with time, then perhaps a later version of
the standard could make such unquoted attributes conforming, once
there's no more problem with them.
As far as I can tell, this is an IE bug; treating "`" as an attribute
quoting character is non-conforming in any version of HTML so far, it
seems. I'm certainly not going to make it non-conforming to stumble into
any IE bug or difference in parsing between IE and previous specs or other
browsers; we'd just end up with an asinine set of conformance
requirements.
I agree that it's pointless to make it non-conforming to hit any parsing
bug, but I would argue that we should make non-conforming as many cases
as is sensible where they open up security holes in websites on legacy
UAs, given that the website uses an HTML 5 parser/sanitizer/serializer.
For example, should this be non-conforming?
<!DOCTYPE html>
<title>Test</title>
<form>
<label>Search: <input type=text></label>
<input type=submit>
</form>
This perfectly innocent piece of HTML content (HTML2-compliant except for
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it
non-conforming?
No, that markup opens up no security hole.
Similarly, IE conditional comments make it trivial to trigger scripts in
IE but not another UA; indeed people do this on purpose. Should we make
those non-conforming also?
They are a harder issue, but I think it is probably fair enough to
assume that most sanitizers drop comments for such reasons, hence making
them fine to leave as conforming also.
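To illustrate what "dropping comments" could look like in a sanitizer, here is a minimal sketch using Python's standard html.parser module. The names CommentStripper and strip_comments are made up for this example; it is not how any particular sanitizer works, and it does not try to cover every parsing edge case.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emit markup roughly as written, but with comments dropped."""

    def __init__(self):
        # convert_charrefs=False so character references reach the
        # handlers below and are re-emitted as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reuse the original source text of the start tag.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so </P> comes out as </p>.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_comment(self, data):
        # Dropped on purpose; this includes <!--[if IE]>...<![endif]-->.
        pass


def strip_comments(markup):
    parser = CommentStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


sample = "<p>Hi<!--[if IE]><script>alert(1)</script><![endif]--> there</p>"
print(strip_comments(sample))
```

Processing instructions and unknown declarations are not re-emitted either, since their handlers are left at the do-nothing defaults.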
As I understand it, the attack here is a site that allows the user to
input text that is used verbatim in two attributes, such that the user can
set the first attribute's value to:
`
...and the second to:
` onload='...payload...' end=x
...with the assumption that the site is going to not quote the first one,
and quote the second one with double quotes:
(This is the default behaviour of Python html5lib, FWIW: the first is
not quoted as it does not contain any whitespace characters or U+003E
(>), the latter is quoted for that reason.)
<body title=` class="` onload='...payload...' end=x">
...which in IE, for some reason, gets treated as:
<body title=' class="'
onload='...payload...'
end='x"'>
Indeed, this is the attack I (and others) am concerned about.
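The serialization above can be reproduced with a small sketch. This is not html5lib's code; serialize_attr and serialize_start_tag are hypothetical helpers, and the "minimal" policy only approximates the behaviour described above (quote on whitespace or U+003E). The "strict" policy is one possible mitigation, assumed here for illustration: also quote values containing `, ", ', = or <.

```python
import re

# "Minimal" policy: quote only when the value contains whitespace or ">".
_NEEDS_QUOTES_MINIMAL = re.compile(r"[\t\n\x0c\r >]")

# "Strict" policy (an assumption for this sketch): additionally quote
# when the value contains `, ", ', = or <.
_NEEDS_QUOTES_STRICT = re.compile(r"[\t\n\x0c\r >`\"'=<]")


def serialize_attr(name, value, strict=True):
    """Serialize one attribute, quoting only when the policy requires it."""
    pattern = _NEEDS_QUOTES_STRICT if strict else _NEEDS_QUOTES_MINIMAL
    escaped = value.replace("&", "&amp;")
    # Empty values are always quoted, under either policy.
    if value == "" or pattern.search(value):
        return '%s="%s"' % (name, escaped.replace('"', "&quot;"))
    return "%s=%s" % (name, escaped)


def serialize_start_tag(tag, attrs, strict=True):
    """Serialize a start tag from a list of (name, value) pairs."""
    parts = [tag] + [serialize_attr(n, v, strict) for n, v in attrs]
    return "<%s>" % " ".join(parts)


# The two user-controlled values from the attack described above.
attrs = [("title", "`"), ("class", "` onload='...payload...' end=x")]

# Minimal quoting leaves title unquoted, giving the vulnerable markup.
print(serialize_start_tag("body", attrs, strict=False))

# Strict quoting wraps the lone ` in double quotes.
print(serialize_start_tag("body", attrs, strict=True))
```

With strict=False this prints the <body title=` class="..."> markup shown above; with strict=True the title value is emitted as title="`", so a UA that treats ` as a quoting character never sees it at the start of an unquoted value.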
I've disallowed ` in unquoted attribute values for now, but I think we
should revert this once IE has fixed this bug for a few years.
Right, once versions of IE with this bug have faded out of existence I
think this will become a non-issue. I also expect that'll be a while
yet, though, and I highly doubt that time will have come even by the
time HTML 5 goes to REC. Furthermore, if there are similar attacks
to this, I think they should similarly be made non-conforming.
--
Geoffrey Sneddon — Opera Software
<http://gsnedders.com/>
<http://www.opera.com/>