Re: [Standards] Security issues with XHTML-IM (again)

Dave Cridland Thu, 12 Oct 2017 09:09:12 -0700

On 12 October 2017 at 16:32, Jonas Wielicki <[email protected]> wrote:
> On Thursday, 12 October 2017 15:58:02 CEST Dave Cridland wrote:
>> On 12 October 2017 at 15:19, Sam Whited <[email protected]> wrote:
>> > On Thu, Oct 12, 2017, at 03:09, Dave Cridland wrote:
>> >> I would note that in principle, a content security policy ought to
>> >> prevent such attacks outright.
>> >>
>> >> But there would, probably, remain several other innovative attacks,
>> >> such as passing client-specific markup intended to duplicate existing
>> >> UI elements.
>> >
>> > Indeed. Using a restricted subset of a complicated system always
>> > introduces the risk that some part of that complexity will not be
>> > understood and will leak out, possibly causing security issues. We see
>> > that on the web fairly regularly.
>> >
>> > It's my belief that it's always better to use a simple, complete system
>> > instead of a restricted, complex system. We see the same thing with
>> > XMPP's use of XML: we may use a sane subset of it, but since the
>> > underlying libraries still handle things like proc insts and whatever
>> > the ampersand escape thing is called you still get attacks based on
>> > those every so often (even though they're forbidden in XMPP).
>> >
>> > I didn't bring this up in the original mail because it tends to get a
>> > bit abstract, but it's worth discussing if we move to make a
>> > replacement.
>>
>> I think the problem isn't simply a subset of a complex system, it's
>> that sanitizing HTML is a difficult and largely error prone problem
>> which has repeatedly been the cause of a number of security problems.
>>
>> I appreciate it's entirely possible, but even a simplified ruleset is
>> something like:
>>
>> 1) For each child element:
>> a) Discard if this is an unsupported element.
>> b) Remove any unsupported attributes.
>> c) For the style attribute, parse the CSS and:
>>     i) Remove any unsupported properties.
>>     ii) For properties which (might) contain a URL, ensure the URL is
>> of a scheme we think might be OK, although we won't tell you which
>> those are.
>> d) For each remaining HTML attribute which (might) contain a URL,
>> ensure that any URL is of a scheme we think might be OK, although we
>> won't tell you which those are.
>> e) Recurse for each child element.
>
> Note that all except (c) and (d) are trivial to implement in XSL. We should
> burn @style with fire.
>


So I need an XSLT engine now.
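
For comparison, steps (a), (b), (d) and (e) of the quoted ruleset can also be
sketched without XSLT. The following is a minimal Python sketch, not a
reference implementation: the allowlists (ALLOWED_ELEMENTS,
ALLOWED_ATTRIBUTES, ALLOWED_SCHEMES) and the function names are assumptions
made up for illustration, not values taken from XEP-0071. Step (c) is
sidestepped by never allowlisting @style, so it is always removed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical allowlists for illustration only; the XEP does not fix these.
ALLOWED_ELEMENTS = {"body", "p", "span", "a", "img", "em", "strong", "br"}
ALLOWED_ATTRIBUTES = {"a": {"href"}, "img": {"src", "alt"}}
URL_ATTRIBUTES = {"href", "src"}
ALLOWED_SCHEMES = {"http", "https"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute."""
    return tag.rsplit("}", 1)[-1]


def url_is_allowed(value):
    """Accept only URLs whose scheme is explicitly allowlisted."""
    return urlsplit(value.strip()).scheme.lower() in ALLOWED_SCHEMES


def remove_keeping_tail(parent, child):
    """Drop `child` but keep the text that followed it in the parent."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def sanitize(element):
    """Sanitize the children of `element` in place and return it."""
    for child in list(element):
        name = local_name(child.tag)
        # (a) Discard unsupported elements, including their content.
        if name not in ALLOWED_ELEMENTS:
            remove_keeping_tail(element, child)
            continue
        allowed = ALLOWED_ATTRIBUTES.get(name, set())
        for attr in list(child.attrib):
            # (b) Remove unsupported attributes (this includes style).
            if attr not in allowed:
                del child.attrib[attr]
            # (d) Remove URL attributes whose scheme is not allowlisted.
            elif attr in URL_ATTRIBUTES and not url_is_allowed(child.attrib[attr]):
                del child.attrib[attr]
        # (e) Recurse into the surviving child.
        sanitize(child)
    return element


if __name__ == "__main__":
    doc = ET.fromstring(
        '<body><p style="color:red">hi <script>evil()</script>there '
        '<a href="javascript:alert(1)">x</a></p></body>'
    )
    print(ET.tostring(sanitize(doc), encoding="unicode"))
```

Note that the sketch rejects relative URLs as well, since they carry no
scheme, and that it does nothing about the privacy concerns of fetching
http or https resources.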

> For URLs (only in @href and @src), I’d suggest to update the security
> considerations again, to only allow the schemes http and https, but also allow
> extension by additional XEPs. HTTP and HTTPS will be sufficient for 99% of the
> use-cases. We have other XEPs which define e.g. cid:, which is also fine.
>

http and https are privacy leaks.

You forgot xmpp:, and data:, as well.

Your move.

> In this light, the fact that the sanitisation is difficult is mainly due to
> the subset we chose. If we chose to exclude @style and pose clear restrictions
> on img@src and a@href, the sanitisation becomes orders of magnitude easier.
>
> (By the way: the javascript:-url-in-@style example from before doesn’t
> actually work. There have been features which could (in-)directly execute
> javascript from within CSS, but they have been disabled since 2016 (Firefox)
> and IE10 (see [1]). I’m also not sure that they would work cross-domain in any
> case, but yeah, let’s get rid of @style.)
>

Pretty sure there are some still around, if only because CSP exists to
prevent them from hurting too badly.

>
>> >> So overall, I think we should move rich IM formatting to Markdown and
>> >> call it done.
>> >
>> > Let's discuss this in a separate thread. I'd really like to try and keep
>> > this about deprecating XHTML-IM, which I think is an orthogonal track of
>> > work (unless you disagree, in which case, please voice that here!).
>>
>> It's clearly not orthogonal, since simply getting rid of XHTML-IM is
>> not deprecating it in favour of anything else.
>>
>> But several clients have supported a basic Markdown-like syntax for
>> emphasis for years - Gajim, for example, supports both *bold* and
>> /italic/ at a quick test, and I think it has for years.
>
> I think you’re mixing the composing and the publishing phase, which Goffi
> helpfully separated. For composing, I’m absolutely with you that Markdown or
> something similar is a good "input method" which should optionally be
> supported by clients.
>

I am absolutely not doing that. Gajim interprets bold and italic
(possibly others) in the Markdown sense on plaintext bodies on both
input and output.
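
To make that concrete, here is a rough Python sketch of interpreting *bold*
and /italic/ on a plaintext body at display time. It is not Gajim's actual
code: the regular expression, the choice to keep the marker characters
visible, and the name render_emphasis are all assumptions for illustration.

```python
import html
import re

# Hypothetical rule: a marker counts only when it is not glued to a word
# character on the outside and the emphasized text has no outer whitespace.
EMPHASIS = re.compile(r"(?<!\w)([*/])(\S(?:.*?\S)?)\1(?!\w)")
TAGS = {"*": "strong", "/": "em"}


def render_emphasis(body):
    """Render a plaintext body as HTML with *bold* and /italic/ emphasized."""
    # Escape first so nothing in the body is ever treated as markup.
    escaped = html.escape(body, quote=False)

    def wrap(match):
        marker, text = match.group(1), match.group(2)
        tag = TAGS[marker]
        # Keep the markers visible so the plaintext is never altered.
        return f"<{tag}>{marker}{text}{marker}</{tag}>"

    # A single pass, so the tags inserted for one marker are never rescanned.
    return EMPHASIS.sub(wrap, escaped)


if __name__ == "__main__":
    print(render_emphasis("this is *bold* and /italic/"))
    print(render_emphasis("see /usr/bin/ please"))
```

The second call shows the cost of in-band markers: under these assumed
rules a filesystem path is rendered as emphasis.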

> For publishing (that is, for transporting it over the XML stream), we need
> some kind of structured out-of-band (i.e. non-plaintext) markup. The plaintext
> markups are prone to behave weirdly if the meta-characters occur in the input
> (try to make e.g. "Trainer*Innen" emphasized with markdown. I bet that the
> results will differ by flavour and implementation.).
>
> I think we shouldn’t underestimate the interoperability we gain by having the
> markup out-of-band from the text (via XML elements, like XHTML-IM does). It
> allows us to add extensions in a way that clients which *send* old data are
> still correctly understood. Extensions to text-based markup languages often
> break plaintext which doesn’t know that extension and happens to look similar
> (this is the point somebody else made in this thread: There are no invalid
> markdown documents).
>
>
> Which is why I still think that a reference implementation of a sanitizer
> (especially since we have the rare case that the reference implementation will
> not need porting to different languages -> it can easily be updated if issues
> are found or extensions are made) *and* banning the @style attribute is the
> way to go here. @style has been a bad idea to begin with, for the reasons I
> already stated in this thread.
>
> The alternatives would be to specify our own XML-based markup (reinventing the
> wheel) or use some text-based markup. Both alternatives sound like a really
> bad solution to me.
>

I don't disagree, but implementation experience with XHTML-IM suggests
it, too, is a really bad solution - and often a dangerous one.

>
>> I appreciate Goffi's argument that Markdown-like syntaxes do not
>> handle tables, but guess what? Nor does XHTML-IM.
>
> Agreed, but it is trivial to extend it to do that, in the case we find a
> compelling use-case for tables at some point.
>
>
>> So my argument for keeping it in this thread is really in order to
>> understand what features of XHTML-IM are desirable rather than to
>> fully specify a replacement
>
> By the way, I think XHTML-IM could use extending with <video/> and/or <audio/>
> support.

/me cries

>
>
> TL;DR: I strongly prefer revising XHTML-IM to a more sane subset of XHTML plus
> providing a reference implementation of a sanitizer in JavaScript over
> anything else.
>
>
> kind regards,
> Jonas
>
>    [1]: https://stackoverflow.com/questions/476276/using-javascript-in-css
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: [email protected]
> _______________________________________________
>
