Dear list,

TL;DR: Four parts:
(a) XHTML-IM is harmful and needs replacement, not fixing (so I changed my mind),
(b) I think obsoleting without alternative is harmful,
(c) Don’t put things in <body/>,
(d) Thoughts about an alternative.

PART A
Okay, there has been some discussion in xsf@ yesterday which changed my mind a
little. The key point which convinced me was that Dave brought up the concept
of protocol breaks, and implied that a protocol break [1] is the only way to
prevent this kind of injection attack [*].
Now this makes a lot of sense, and I can see that this trumps the elegance of
leveraging that we can embed XHTML semantics into the XML stream directly. So
I’m now of the position that XHTML-IM is harmful (I’ve been there before,
which is why I proposed fixes) *and* that we indeed might want to move to a
different type of markup as intermediate representation of the protocol break.
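
To make the protocol break idea (as described in [1]) concrete, here is a
minimal sketch in Python. Everything in it is an assumption for illustration:
the intermediate form (a flat list of text spans with style names), the tiny
allowed tag set, and the function names extract() and rebuild() are all
hypothetical and not taken from any XEP or client.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical allow-list: the only semantics the intermediate form can carry.
ALLOWED = {"em", "strong"}

def extract(xhtml):
    """Flatten untrusted XHTML into (text, styles) spans.

    Attributes are never copied and unknown elements contribute only their
    text, so the intermediate form has no place for scripts or handlers.
    """
    spans = []

    def walk(node, styles):
        tag = node.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
        if tag in ALLOWED:
            styles = styles | {tag}
        if node.text:
            spans.append((node.text, styles))
        for child in node:
            walk(child, styles)
            if child.tail:  # text after the child belongs to this node
                spans.append((child.tail, styles))

    walk(ET.fromstring(xhtml), frozenset())
    return spans

def rebuild(spans):
    """Reconstitute markup from scratch, using only the intermediate form."""
    out = []
    for text, styles in spans:
        piece = html.escape(text)
        for tag in sorted(styles):
            piece = f"<{tag}>{piece}</{tag}>"
        out.append(piece)
    return "".join(out)

untrusted = '<body>Hi <em onclick="evil()">there</em><script>alert(1)</script></body>'
print(rebuild(extract(untrusted)))
```

The point of the sketch is that nothing from the input is copied through as
markup: the output is built only from the spans, so the onclick attribute and
the script element cannot survive, while the emphasis does.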
PART B
I am still not keen on obsoleting XHTML-IM before we have an actual
alternative ready. I don’t think that this will achieve anything good.
Instead, I think that one of two things will happen:
(a) Clients continue to implement XHTML-IM because it is the only actual
way to convey markup right now (this is what I’ll do until there’s a
replacement).
(b) The ecosystem will fracture in islands of different, underspecified,
plain-text markups put in <body/>.
I don’t think either is particularly good. I also wonder what it would look
like to have the only markup protocol with actual deployment being obsoleted
:-) (*hint towards the general direction of the Experimental vs. Draft
discussion*).
PART C
I’m still strongly against putting any markup in the <body/> directly as a
replacement for XHTML-IM [2]. The reason for this is that text-based markup is
inherently hard to extend: Whenever you do some extension, you have to give a
(sequence of) characters which formerly had no special meaning some special
meaning (e.g. if we had Markdown without bold/strong before, we’d re-define
**...** to mean boldface, where it had no special meaning before).
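
A small Python sketch of that problem, under stated assumptions: render_v1()
and render_v2() are hypothetical renderers for two revisions of an imaginary
plain-text markup, where the second revision newly defines **...** as
boldface. Neither corresponds to an existing specification.

```python
import html
import re

# Matches *text* but not **text**.
EMPHASIS = re.compile(r"(?<!\*)\*([^*]+)\*(?!\*)")
# Matches **text**; only known to the later revision.
STRONG = re.compile(r"\*\*([^*]+)\*\*")

def render_v1(body):
    """Hypothetical first revision: only *emphasis* has a meaning."""
    return EMPHASIS.sub(r"<em>\1</em>", html.escape(body))

def render_v2(body):
    """Hypothetical later revision: **strong** gains a meaning as well."""
    text = STRONG.sub(r"<strong>\1</strong>", html.escape(body))
    return EMPHASIS.sub(r"<em>\1</em>", text)

body = "compute 2**8 and 3**2 please"
print(render_v1(body))
print(render_v2(body))
```

The sender wrote plain text which the first renderer shows unchanged, but the
second renderer turns "8 and 3" into boldface. The same <body/> now means two
different things depending on which revision the receiving client implements.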
I think we should not be putting markup of any kind in <body/> for this
reason. This is unfortunate, and the only way out is to have a well-defined
syntax which clearly distinguishes meta-symbols and normal symbols of text.
Of course, a proposal like the ProtoXEP by Florian [3] (which is soon to be
announced) helps with that because it allows us to version things. I’m however
concerned how well that would work in practice (but I’ll raise those concerns
when that ProtoXEP is discussed).
PART D
Instead I propose that we find or standardise an extensible markup language
which works and serves as a protocol break for HTML. This means to make a list
of things we want to have as markup and survey existing standards. I’ll start
here with my wishlist and things I’ve seen others say:
Non-semantic things:
- Colorise things with colors from a palette (e.g. 360 colors from the
XEP-0392 palette) (via Georg Lukas)
Semantic things:
- Emphasis (typically italics), Strong (typically boldface)
- Pre-formatted text (code), including a way to specify the language (this may
make the colorisation of things obsolete, the only use-case I’ve heard so far
is syntax-highlighting)
- Blockquotes
- Paragraphs
- Enumerations and unordered lists
- Links (but possibly without the possibility to change the text shown), with
a whitelist of URL schemes
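
For the links item, a rough Python sketch of what a scheme whitelist combined
with "no custom link text" could look like. The set of allowed schemes and the
names is_allowed_link() and render_link() are assumptions made up for this
example; which schemes to allow would be up to the eventual specification.

```python
import html
from urllib.parse import urlsplit

# Assumption: an example whitelist, not a proposal for the actual list.
ALLOWED_SCHEMES = {"https", "http", "xmpp", "mailto"}

def is_allowed_link(url):
    """Accept only URLs whose scheme is explicitly whitelisted."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme in ALLOWED_SCHEMES

def render_link(url):
    """Render a link whose visible text is always the URL itself.

    URLs with a scheme outside the whitelist are shown as inert text.
    """
    shown = html.escape(url, quote=True)
    if not is_allowed_link(url):
        return shown
    return f'<a href="{shown}">{shown}</a>'

print(render_link("https://xmpp.org/"))
print(render_link("javascript:alert(1)"))
```

Because the visible text is derived from the URL, the sender cannot show one
address while pointing to another, and anything outside the whitelist never
becomes clickable.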
Other things which may be useful, but I’m not sure if we don’t have better
ways to do that by now:
- Embedding multimedia content, like images, audio, video (a SIMS-message may
be better suited for that)
Once we settle on a list of things we want to have, I’ll be happy to
investigate which markups would be suitable or could easily be extended for
our purposes.
Sorry for the long email.
kind regards,
Jonas
[1]: As I can’t find a definition quickly online, this is how Dave
described it:
> A protocol break is a security programming technique where you
> extract out the information into a different, fixed, form and then
> reconstitute it entirely from scratch. Means that, in our case, a
> bit of Javascript has no place in the intermediate form so cannot
> pass through.
[2]: I can see that there are advantages to doing this, and I can see both
co-exist.
[3]: https://github.com/xsf/xeps/pull/529
[*]: it will not prevent attacks against other stupidity such as double-
unescaping things (e.g. if somebody sends <script> and somebody
manages to put that into e.g. .innerHTML after unescaping)
