On 12 October 2017 at 16:32, Jonas Wielicki <[email protected]> wrote:
> On Thursday, 12 October 2017 15:58:02 CEST Dave Cridland wrote:
>> On 12 October 2017 at 15:19, Sam Whited <[email protected]> wrote:
>> > On Thu, Oct 12, 2017, at 03:09, Dave Cridland wrote:
>> >> I would note that in principle, a content security policy ought to
>> >> prevent such attacks outright.
>> >>
>> >> But there would, probably, remain several other innovative attacks,
>> >> such as passing client-specific markup intended to duplicate existing
>> >> UI elements.
>> >
>> > Indeed. Using a restricted subset of a complicated system always
>> > introduces the risk that some part of that complexity will not be
>> > understood and will leak out, possibly causing security issues. We see
>> > that on the web fairly regularly.
>> >
>> > It's my belief that it's always better to use a simple, complete system
>> > instead of a restricted, complex system. We see the same thing with
>> > XMPP's use of XML: we may use a sane subset of it, but since the
>> > underlying libraries still handle things like proc insts and whatever
>> > the ampersand escape thing is called you still get attacks based on
>> > those every so often (even though they're forbidden in XMPP).
>> >
>> > I didn't bring this up in the original mail because it tends to get a
>> > bit abstract, but it's worth discussing if we move to make a
>> > replacement.
>>
>> I think the problem isn't simply a subset of a complex system, it's
>> that sanitizing HTML is a difficult and largely error prone problem
>> which has repeatedly been the cause of a number of security problems.
>>
>> I appreciate it's entirely possible, but even a simplified ruleset is
>> something like:
>>
>> 1) For each child element:
>>   a) Discard if this is an unsupported element.
>>   b) Remove any unsupported attributes.
>>   c) For the style attribute, parse the CSS and:
>>     i) remove any unsupported attributes.
>>     ii) For attributes which (might) contain a URL, ensure the URL is
>> of a scheme we think might be OK, although we won't tell you which
>> those are.
>>   d) For each remaining HTML attribute which (might) contain a URL,
>> ensure that any URL is of a scheme we think might be OK, although we
>> won't tell you which those are.
>>   e) Recurse for each child element.
>
> Note that all except (c) and (d) are trivial to implement in XSL. We should
> burn @style with fire.
>
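
Steps (a), (b), (d) and (e) of the ruleset quoted above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the allow-lists ALLOWED_ATTRS and ALLOWED_SCHEMES are hypothetical (the ruleset deliberately leaves them unspecified), the helper names are invented for the example, step (c) is sidestepped by never allowing @style at all, and the http/https restriction is only one of the proposals made in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical allow-list: element name -> attributes it may keep.
# "style" is intentionally absent, so step (c) never has to parse CSS.
ALLOWED_ATTRS = {
    "p": set(),
    "span": set(),
    "em": set(),
    "strong": set(),
    "br": set(),
    "a": {"href"},
    "img": {"src", "alt"},
}
URL_ATTRS = {"href", "src"}
# Assumption: only these schemes are accepted; the thread does not settle this.
ALLOWED_SCHEMES = {"http", "https"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def url_is_allowed(value):
    """Return True only for URLs with an explicitly allowed scheme."""
    try:
        scheme = urlsplit(value).scheme.lower()
    except ValueError:  # malformed URL: treat as not allowed
        return False
    return scheme in ALLOWED_SCHEMES


def drop_child(parent, child):
    """Remove child and its subtree, keeping the text that followed it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def sanitize(element):
    """Sanitize all descendants of element in place and return it."""
    for child in list(element):
        name = local_name(child.tag)
        if name not in ALLOWED_ATTRS:
            drop_child(element, child)          # (a) unsupported element
            continue
        for attr in list(child.attrib):
            if attr not in ALLOWED_ATTRS[name]:
                del child.attrib[attr]          # (b) unsupported attribute
            elif attr in URL_ATTRS and not url_is_allowed(child.attrib[attr]):
                del child.attrib[attr]          # (d) URL with a bad scheme
        sanitize(child)                         # (e) recurse
    return element


if __name__ == "__main__":
    body = ET.fromstring(
        '<body xmlns="http://www.w3.org/1999/xhtml">'
        '<p style="color:red">See <a href="javascript:alert(1)">this</a>'
        ' or <a href="https://xmpp.org/">that</a></p></body>'
    )
    print(ET.tostring(sanitize(body), encoding="unicode"))
```

Even this reduced version needs care in places that are easy to miss, such as preserving the text that follows a discarded element and treating relative or malformed URLs as not allowed.
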
So I need an XSLT engine now.

> For URLs (only in @href and @src), I’d suggest updating the security
> considerations again, to only allow the schemes http and https, but also allow
> extension by additional XEPs. HTTP and HTTPS will be sufficient for 99% of the
> use-cases. We have other XEPs which define e.g. cid:, which is also fine.
>

http and https are privacy leaks. You forgot xmpp:, and data:, as well.
Your move.

> In this light, the fact that the sanitisation is difficult is mainly due to
> the subset we chose. If we choose to exclude @style and pose clear restrictions
> on img@src and a@href, the sanitisation becomes orders of magnitude easier.
>
> (By the way: the javascript:-url-in-@style example from before doesn’t
> actually work. There have been features which could (in-)directly execute
> javascript from within CSS, but they have been disabled since 2016 (Firefox)
> and IE10 (see [1]). I’m also not sure that they would work cross-domain in any
> case, but yeah, let’s get rid of @style.)
>

Pretty sure there are some still around, if only because CSP exists to
prevent them from hurting too badly.

>
>> >> So overall, I think we should move rich IM formatting to Markdown and
>> >> call it done.
>> >
>> > Let's discuss this in a separate thread. I'd really like to try and keep
>> > this about deprecating XHTML-IM, which I think is an orthogonal track of
>> > work (unless you disagree, in which case, please voice that here!).
>>
>> It's clearly not orthogonal, since simply getting rid of XHTML-IM is
>> not deprecating it in favour of anything else.
>>
>> But several clients have supported a basic Markdown-like syntax for
>> emphasis for years - Gajim, for example, supports both *bold* and
>> /italic/ at a quick test, and I think it has for years.
>
> I think you’re mixing the composing and the publishing phase, which Goffi
> helpfully separated.
> For composing, I’m absolutely with you that Markdown or
> something similar is a good "input method" which should optionally be
> supported by clients.
>

I am absolutely not doing that. Gajim interprets bold and italic
(possibly others) in the Markdown sense on plaintext bodies on both
input and output.

> For publishing (that is, for transporting it over the XML stream), we need
> some kind of structured out-of-band (i.e. non-plaintext) markup. The plaintext
> markups are prone to behave weirdly if the meta-characters occur in the input
> (try to emphasise e.g. "Trainer*Innen" with Markdown; I bet that the
> results will differ by flavour and implementation).
>
> I think we shouldn’t underestimate the interoperability we gain by having the
> markup out-of-band from the text (via XML elements, like XHTML-IM does). It
> allows us to add extensions in a way that clients which *send* old data are
> still correctly understood. Extensions to text-based markup languages often
> break plaintext which doesn’t know that extension and happens to look similar
> (this is the point somebody else made in this thread: There are no invalid
> markdown documents).
>
>
> Which is why I still think that a reference implementation of a sanitizer
> (especially since we have the rare case that the reference implementation will
> not need porting to different languages -> it can easily be updated if issues
> are found or extensions are made) *and* banning the @style attribute is the
> way to go here. @style has been a bad idea to begin with, for the reasons I
> already stated in this thread.
>
> The alternatives would be to specify our own XML-based markup (reinventing the
> wheel) or use some text-based markup. Both alternatives sound like a really
> bad solution to me.
>

I don't disagree, but implementation experience with XHTML-IM suggests
it, too, is a really bad solution - and often a dangerous one.
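
The meta-character problem raised above ("Trainer*Innen") can be illustrated with a small Python sketch. The regex rule here is a hypothetical, deliberately naive stand-in for a Markdown-like renderer (real flavours differ, which is part of the point), and render_inband and render_structured are names invented for this example. It contrasts in-band asterisks with markup carried as XML elements, where the asterisk is just text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, naive in-band rule: *text* becomes <strong>text</strong>.
# No HTML escaping is done; this only demonstrates the ambiguity.
EMPHASIS = re.compile(r"\*(.+?)\*")


def render_inband(body):
    """Interpret asterisks found in the plaintext body as emphasis."""
    return EMPHASIS.sub(r"<strong>\1</strong>", body)


def render_structured(text):
    """Carry the emphasis as an XML element; the text itself is opaque."""
    paragraph = ET.Element("p")
    strong = ET.SubElement(paragraph, "strong")
    strong.text = text
    return ET.tostring(paragraph, encoding="unicode")


if __name__ == "__main__":
    # Trying to emphasise the whole word in-band: the inner '*' closes early.
    print(render_inband("*Trainer*Innen*"))
    # A sender who never meant any emphasis gets some anyway.
    print(render_inband("Liebe Trainer*Innen, bitte 2*3 rechnen"))
    # With structure out-of-band, the asterisk survives untouched.
    print(render_structured("Trainer*Innen"))
```

With this rule the in-band renderer emphasises only "Trainer" and leaves a stray asterisk behind, while the structured form keeps the word intact. A different regex or Markdown flavour would give a different in-band result.
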
>
>> I appreciate Goffi's argument that Markdown-like syntaxes do not
>> handle tables, but guess what? Nor does XHTML-IM.
>
> Agreed, but it is trivial to extend it to do that, in case we find a
> compelling use-case for tables at some point.
>
>
>> So my argument for keeping it in this thread is really in order to
>> understand what features of XHTML-IM are desirable rather than to
>> fully specify a replacement
>
> By the way, I think XHTML-IM could use extending with <video/> and/or <audio/>
> support.

/me cries

>
>
> TL;DR: I strongly prefer revising XHTML-IM to a saner subset of XHTML plus
> providing a reference implementation of a sanitizer in JavaScript over
> anything else.
>
>
> kind regards,
> Jonas
>
> [1]: https://stackoverflow.com/questions/476276/using-javascript-in-css
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: [email protected]
> _______________________________________________
