Re: [Standards] Security issues with XHTML-IM (again)

Jonas Wielicki Wed, 11 Oct 2017 23:28:30 -0700

Hi Sam,

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> I recently tried out another service that supported XEP-0071: XHTML-IM
> [1]. Like all other web-based services with XHTML-IM support that I've
> tried, it was vulnerable to a trivial script injection. When I say
> "all", I really do mean "all" (though I'm not sure how many; it's more
> than a few at this point). […]

Ouch.

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> In one sense, the spec is not to blame for this. It does things
> correctly, utilizing a white list of attributes and elements, and if you
> follow it you are likely to be safe from injection. 

I had a look at the spec, and I think it is in fact to blame here to some
extent. The security considerations do not mention that an implementation has
to safeguard against an attacker injecting malicious XHTML. Instead, there is:

XEP-0071 (XHTML-IM) 1.5.1. §7.8 Summary of Recommendations states:
> Any other elements and attributes defined in the XHTML 1.0 modules that are 
> included in the XHTML-IM Integration Set SHOULD NOT be generated by a 
> compliant implementation, and SHOULD be ignored if received (where the 
> meaning of "ignore" is defined by the conformance requirements of 
> Modularization of XHTML, as summarized in the User Agent Conformance section 
> of this document).

XEP-0071 (XHTML-IM) 1.5.1. §11 Security Considerations states:
> The exclusion of scripts, applets, binary objects, and other potentially 
> executable code from XHTML-IM reduces the risk of exposure to harmful or 
> malicious objects caused by inclusion of XHTML content. To further reduce 
> the risk of such exposure, an implementation MAY choose to:
> 
> * Not make hyperlinks clickable
> * Not fetch or present images but instead show only the 'alt' text.
> 
> In addition, an implementation MUST make it possible for a user to prevent 
> the automatic fetching and presentation of images (rather than leave it up 
> to the implementation).
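Both optional mitigations are cheap to apply as a last pass over a tree that has already been reduced to the XHTML-IM whitelist. The following is a minimal sketch under that assumption, using Python's xml.etree.ElementTree. The function name `degrade_media` and its flags are hypothetical, not from the XEP or any library. Images become spans that hold only their 'alt' text, and links become spans without 'href'.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def degrade_media(root: ET.Element, show_images: bool = False,
                  clickable_links: bool = False) -> ET.Element:
    """Hypothetical helper: apply the optional XEP-0071 mitigations in place.

    With the defaults, every <img/> becomes a <span/> holding only its 'alt'
    text (nothing is fetched) and every <a/> becomes a <span/> without
    'href' (nothing is clickable). Assumes the tree was already reduced to
    the XHTML-IM whitelist.
    """
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # comments / processing instructions
        ns, _, name = el.tag.rpartition("}")
        prefix = ns + "}" if ns else ""
        if name == "img" and not show_images:
            alt = el.get("alt", "")
            el.attrib.clear()          # drops src, width, height, ...
            el.tag = prefix + "span"
            el.text = alt              # <img/> is empty, so no children are lost
        elif name == "a" and not clickable_links:
            style = el.get("style")
            el.attrib.clear()          # drops href and type
            if style is not None:
                el.set("style", style)
            el.tag = prefix + "span"
    return root
```

A client would run this whenever the user has switched off automatic image fetching, which is the one MUST in the quoted section. Renaming elements in place preserves the surrounding text and tail content without having to splice the parent's children.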

It is only a "SHOULD NOT", and the security considerations do not explain why
it would be dangerous to allow more elements or attributes. I think this should
be spelt out much more clearly. I have prepared a pull request to clarify the
wording [1].

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> In the real world
> though, this is not enough. 

I agree, even if the spec were perfect.

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> Using HTML in a browser, even when done
> right, ties you to an environment that can execute scripts meaning that
> small bugs have a higher potential to end up causing a security issue

Well, yes, but that’s the issue with doing webapps, right? There are lots of
points where things can go wrong. I’m just going to say foo.innerHTML =
message_body; and let you shiver in fear :-). But see below for a more
nuanced idea.
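The safe counterpart to foo.innerHTML = message_body is to treat the plain-text body strictly as text. In a browser that means assigning to element.textContent. For a client that assembles an HTML string for a web view, it means escaping before anything else happens. The following is a minimal Python sketch of the latter. The function name is hypothetical.

```python
import html


def plain_body_to_html(message_body: str) -> str:
    """Render an XMPP <body/> as HTML that can only ever be text.

    Every '<', '>', '&' and quote character is escaped first, so markup
    inside the body is displayed rather than interpreted. Line breaks are
    the only structure added back, as <br/> elements.
    """
    escaped = html.escape(message_body, quote=True)
    return "<br/>".join(escaped.splitlines())
```

For example, a body of `<img src=x onerror="alert(1)">` comes out as visible text instead of an image element with an event handler.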

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> Furthermore, the "path of least resistance"
> for developers is just to dump HTML into the DOM, which I have observed
> on a number of web clients, including the one I ran into recently that
> prompted me to write this.
> 
> I'd like to suggest (again) that we obsolete XHTML-IM. If the easy way
> to implement a spec is insecure, you can be sure users will do that. We
> can't guarantee security in a spec, but we can certainly make something
> that's harder than XHTML-IM to implement incorrectly, which would be a
> huge gain.

We should first discuss what an alternative would look like (I read the 
council meeting backlog and I see that this is a security issue, but please 
bear with me!). There are legitimate use-cases for markup, at least quotes are 
rather common in our circles, I feel. Other markup may be more common in other 
circles, I don’t know that (I block inbound markup for readability reasons). 
And we need to make sure that we don’t trade one vulnerability for another.

There are two large categories of alternatives which are possible (and I’m 
exploring those so that we don’t make a stupid mistake here which we would 
have to roll back later):

1. invent our own custom XML markup:

   upsides: we can make it well-defined and include exactly the features
   which seem sensible. Extension is easily possible.

   downsides: attackers can still try to embed HTML into that markup, again
   a possible vulnerability (I can see people simply rewriting element names
   and attaching the tree to an HTML-DOM without looking closely, just like
   they do now for XHTML-IM).

2. use e.g. markdown (in a separate <message/> child or directly in <body/>)
   or any other "text-based" markup.

   upsides: often looks reasonable even in text-based clients, and saves the
   markup-to-plaintext conversion.

   downsides: markdown specifically is not very well defined: there are
   numerous flavours, many of which allow embedding of HTML by default! (This
   also holds for many other markups, like reStructuredText.) Not made to be
   extensible (aside from embedding HTML).

   somewhat-either-way-sides: does not allow changing of fonts and colors. The 
   ability to do that (and some weird clients setting the local GUI font) is 
   the main reason I often block inbound XHTML-IM altogether.
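Whatever text-based markup we pick, the defensive rule is the same: escape the raw text before applying any formatting, so embedded HTML can never pass through. The following is a minimal sketch with a deliberately tiny, hypothetical syntax: lines starting with "> " are quotes and `*text*` is emphasis. It is not any existing Markdown flavour, and the function name is made up.

```python
import html
import re

# Hypothetical mini-syntax: *text* becomes <em>. It is applied only to text
# that has already been escaped, so it can only ever emit the tags we add.
_EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def render_text_markup(body: str) -> str:
    """Render a tiny text markup to HTML without ever passing HTML through."""
    out = []
    quoted = []

    def flush_quote():
        if quoted:
            out.append("<blockquote>" + "<br/>".join(quoted) + "</blockquote>")
            quoted.clear()

    for line in body.splitlines():
        # Escape first: from here on, '<' and '&' from the sender are inert.
        escaped = html.escape(line, quote=True)
        escaped = _EMPHASIS.sub(r"<em>\1</em>", escaped)
        if line.startswith("> "):
            quoted.append(escaped[len("&gt; "):])  # strip the escaped "> "
        else:
            flush_quote()
            out.append(escaped)
    flush_quote()
    return "<br/>".join(out)
```

A renderer for a real flavour would need its HTML passthrough switched off in the same spirit.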

On Wednesday, 11 October 2017 15:42:45 CEST Sam Whited wrote:
> There is an argument to be made that we should create a replacement
> first, but since this is a security issue and not just a sub-optimal
> spec, I believe that it would be irresponsible for the XSF to continue
> to recommend this spec given the number of bad implementations we've
> seen over the years and we should obsolete as quickly as possible.

I see an alternative path here (if we want to keep XHTML-IM or something 
similar). I think we can categorize the potentially vulnerable software in two 
classes:

1. pure JavaScript clients running in browsers: those are obviously the most
   vulnerable, because remote code execution in the browser == game over for 
   those.

   For these, the XSF or somebody close to it could provide a sensible
   reference implementation in JavaScript which sanitizes the input properly
   (we could even try to get it audited) and release it under a liberal
   license (public domain, MIT, or something similar; make sure to waive
   warranty ;-)). Then
   modify the XEP to make the security issues very clear and link to the 
   JavaScript reference implementation. Make a blog post, raise awareness. 
   This reduces the "resistance" of the "proper" path significantly. And it 
   also makes it easier to fix those bugs in existing implementations: "Hey, 
   there’s our reference implementation: just hand it your DOM tree and it’ll 
   patch up everything to be secure.".

   I feel that if people are smart/informed enough not to use .innerHTML for
   setting the plain text body of a <div/> or something, they’ll gladly use a
   reference implementation that sanitizes the input. If they are not, then
   there will be enough XSS regardless of whether XHTML-IM is used or not.

2. "Normal" clients using a browser-based view for message display, for 
   whatever reasons (I think there are compelling reasons to do so, I’m going
   to do that myself).

   Those can either re-use the JavaScript reference implementation, or the XSF
   could provide a reference XSL transformation which applies the rules of 
   XHTML-IM. I wrote a draft for that [2] which I think would do the right
   thing. I am convinced that it is possible to apply the XHTML-IM 
   structural specification via XSLT (this notably excludes the value of 
   @style and @href attributes, see below).

(There is the third category you mentioned, e.g. using GTK’s or Qt’s text
views, which support a subset of HTML. Those are safe-ish even when they
embed XHTML-IM without any filtering.)
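The core of such a reference sanitizer is small. It walks the received tree and builds a fresh one from a whitelist, instead of copying everything and deleting the parts known to be bad. The following is a minimal Python/ElementTree sketch, not the aioxmpp code. The element/attribute table follows the XEP-0071 recommendations as I remember them, so check it against the spec. Unknown elements are "ignored" in the Modularization sense: they are unwrapped and their content is kept. Dropping script-like elements together with their content is this sketch's own choice. The sketch does not yet check the values of @style, @href or @src.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Element -> permitted attributes, following the XEP-0071 recommendations
# table as remembered here; verify against the current XEP.
ALLOWED = {
    "a": {"href", "style", "type"},
    "blockquote": {"style"},
    "body": {"style"},
    "br": set(),
    "cite": {"style"},
    "em": set(),
    "img": {"alt", "height", "src", "style", "width"},
    "li": {"style"},
    "ol": {"style"},
    "p": {"style"},
    "span": {"style"},
    "strong": set(),
    "ul": {"style"},
}
# Our own conservative choice: drop these together with their content.
DROP_WITH_CONTENT = {"script", "style", "object", "applet", "iframe"}


def _append_text(parent: ET.Element, text: Optional[str]) -> None:
    """Append text after the last child of parent (or as parent.text)."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_children(src: ET.Element, dst: ET.Element) -> None:
    # Recursive; a production version should also cap the nesting depth.
    _append_text(dst, src.text)
    for child in src:
        if isinstance(child.tag, str):
            name = child.tag.rpartition("}")[2]
            if child.tag == XHTML_NS + name and name in ALLOWED:
                new = ET.SubElement(dst, child.tag)
                for attr, value in child.attrib.items():
                    if attr in ALLOWED[name]:
                        new.set(attr, value)
                _copy_children(child, new)
            elif name not in DROP_WITH_CONTENT:
                # Unknown element: "ignore" it, i.e. keep only its content.
                _copy_children(child, dst)
        # Comments and dropped elements vanish, but their tail text belongs
        # to the parent's content and is kept.
        _append_text(dst, child.tail)


def sanitize(received_body: ET.Element) -> ET.Element:
    """Build a fresh XHTML-IM <body/> from whitelisted parts only."""
    clean = ET.Element(XHTML_NS + "body")
    for attr, value in received_body.attrib.items():
        if attr in ALLOWED["body"]:
            clean.set(attr, value)
    _copy_children(received_body, clean)
    return clean
```

Building a new tree means anything the whitelist does not mention (event handlers, unknown namespaces, comments) simply never reaches the output.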

Neither of these solutions addresses the handling of the @style attribute. I
am not sure how easy it is to filter that in JavaScript; I haven’t looked
into it. The potential attacks via @style should be less grave than the
attacks possible by injecting JavaScript, though.
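One conservative way to filter @style is to parse it into declarations and keep only the whitelisted properties whose values contain nothing but a small set of characters. The property list below is the XEP-0071 set of recommended style properties as I recall it, so verify it against the spec. The strict value pattern and the function name are assumptions of this sketch. Forbidding parentheses also rules out rgb(), which is the price for excluding url() and expression().

```python
import re

# Recommended XEP-0071 style properties (recalled; verify against the XEP).
ALLOWED_PROPERTIES = {
    "background-color", "color", "font-family", "font-size", "font-style",
    "font-weight", "margin-left", "margin-right", "text-align",
    "text-decoration",
}
# Deliberately strict: no parentheses (url(), expression(), rgb()), no quotes,
# no backslash escapes, no comments, no '!important', no at-rules.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9#%., -]+")


def filter_style(style: str) -> str:
    """Keep only whitelisted declarations with conservatively safe values."""
    kept = []
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        if sep and prop in ALLOWED_PROPERTIES and _SAFE_VALUE.fullmatch(value):
            kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

A sanitizer would run this on every surviving @style and drop the attribute entirely when the result is empty.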

   [1]: https://github.com/xsf/xeps/pull/528
   [2]: https://github.com/horazont/aioxmpp/commit/a9999b1cce3a089792768b63ec738fc3d2e8ca47

        This lacks a white-list for URI schemes. Such a white-list would be
        required to ensure that no javascript: (or similar) URIs can be used
        in @href or similar attributes.
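A URI scheme white-list as mentioned in that note could look roughly like this minimal Python sketch. The scheme sets are assumptions that a client would tune. Browsers, following the WHATWG URL Standard, strip leading and trailing control characters and spaces and remove tabs and newlines anywhere in the URL, so "java\tscript:" still means javascript:. The check therefore normalises first and rejects anything without an explicitly allowed scheme, including relative references.

```python
from urllib.parse import urlsplit

# Assumed policy; adjust per client.
ALLOWED_HREF_SCHEMES = {"http", "https", "mailto", "xmpp"}
ALLOWED_SRC_SCHEMES = {"http", "https"}


def is_safe_uri(uri: str, allowed_schemes: set) -> bool:
    """Accept a URI only if it has an explicitly whitelisted scheme."""
    # Remove every control character and space (even inside the string);
    # this is stricter than what browsers strip, which errs on the safe side.
    cleaned = "".join(ch for ch in uri if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme  # urlsplit lower-cases the scheme
    except ValueError:
        return False
    # An empty scheme (relative reference) is rejected as well: there is no
    # meaningful base URI for markup received from someone else.
    return scheme in allowed_schemes
```

The sanitizer would then drop @href on <a/> and @src on <img/> whenever this returns False.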


_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________
