Dear list,

TL;DR: Four parts: (a) XHTML-IM is harmful and needs replacement, not fixing 
(so I changed my mind), (b) I think obsoleting without alternative is harmful, 
(c) Don’t put things in <body/>, (d) Thoughts about an alternative.


PART A

Okay, there was some discussion in xsf@ yesterday which changed my mind a 
little. The key point which convinced me was that Dave brought up the concept 
of protocol breaks, and implied that a protocol break [1] is the only way to 
prevent this kind of injection attack [*].

Now this makes a lot of sense, and I can see that this trumps the elegance of 
embedding XHTML semantics directly into the XML stream. So I’m now of the 
position that XHTML-IM is harmful (I’ve been there before, which is why I 
proposed fixes) *and* that we indeed might want to move to a different type 
of markup as the intermediate representation for the protocol break.
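
To make the protocol break idea a bit more concrete, here is a minimal 
sketch in Python. Everything in it is made up for illustration (the names 
extract/render, the two-style intermediate form); it is not a proposal and 
not how any client does it. It assumes the input is already well-formed XML, 
as it would be coming off the stream. The point is only that the output is 
rebuilt from the fixed intermediate form, so attributes and unknown elements 
have nowhere to live:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical fixed intermediate form: a flat list of (styles, text) spans.
# Only these two style names can be represented; nothing else survives.
ALLOWED = {"em": "emphasis", "strong": "strong"}


def extract(xhtml):
    """Step 1: pull text and known styles out of the input, drop the rest."""
    spans = []

    def walk(node, styles):
        tag = node.tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any
        if tag in ALLOWED:
            styles = styles | {ALLOWED[tag]}
        # Attributes (onclick, style, ...) are never read at all.
        if node.text:
            spans.append((styles, node.text))
        for child in node:
            walk(child, styles)
            if child.tail:
                spans.append((styles, child.tail))

    walk(ET.fromstring(xhtml), frozenset())
    return spans


def render(spans):
    """Step 2: rebuild the output from scratch, escaping all text."""
    out = []
    for styles, text in spans:
        piece = html.escape(text)
        if "emphasis" in styles:
            piece = "<em>" + piece + "</em>"
        if "strong" in styles:
            piece = "<strong>" + piece + "</strong>"
        out.append(piece)
    return "".join(out)


incoming = (
    '<body xmlns="http://www.w3.org/1999/xhtml">Hi '
    '<em onclick="evil()">there</em><script>alert(1)</script></body>'
)
print(render(extract(incoming)))
```

In this sketch the content of an unknown element is kept as inert text; 
dropping it entirely would be an equally valid choice.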



PART B

I am still not keen on obsoleting XHTML-IM before we have an actual 
alternative ready. I don’t think that this will achieve anything good. 
Instead, I think that one of two things will happen:

(a) Clients continue to implement XHTML-IM because it is the only actual
    way to convey markup right now (this is what I’ll do until there’s a
    replacement).

(b) The ecosystem will fracture into islands of different, underspecified, 
    plain-text markups put in <body/>.

I don’t think either is particularly good. I also wonder what it would look 
like to have the only markup protocol with actual deployment being obsoleted 
:-) (*hint towards the general direction of the Experimental vs. Draft 
discussion*).


PART C

I’m still strongly against putting any markup in the <body/> directly as a 
replacement for XHTML-IM [2]. The reason for this is that text-based markup is 
inherently hard to extend: whenever you extend it, you have to assign a 
special meaning to a character (or sequence of characters) which had none 
before (e.g. if we had Markdown without bold/strong before, we’d re-define 
**...** to mean boldface, where it had no special meaning before).

I think we should not be putting markup of any kind in <body/> for this 
reason. This is unfortunate, and the only way out is to have a well-defined 
syntax which clearly distinguishes meta-symbols and normal symbols of text.
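
A small Python sketch of the extension problem, with two made-up renderer 
versions (render_v1 and render_v2 are hypothetical and deliberately naive; 
no real markup is specified like this). The same <body/> text is shown 
differently depending on which version the receiving client implements:

```python
import html
import re


def render_v1(text):
    """Hypothetical first version: only _..._ means emphasis."""
    return re.sub(r"_(.+?)_", r"<em>\1</em>", html.escape(text))


def render_v2(text):
    """Hypothetical extension: **...** now additionally means strong."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", render_v1(text))


msg = "In Python, 2**3**2 is 512"
print(render_v1(msg))  # the asterisks are shown as typed
print(render_v2(msg))  # the middle "3" is now rendered as boldface
```

The sender meant exponentiation; a client which only knows the first version 
shows it correctly, while an upgraded one silently changes the message.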

Of course, a proposal like the ProtoXEP by Florian [3] (which is soon to be 
announced) helps with that because it allows us to version things. I’m 
concerned, however, about how well that would work in practice (but I’ll 
raise those concerns when that ProtoXEP is discussed).


PART D

Instead I propose that we find or standardise an extensible markup language 
which works and serves as a protocol break for HTML. This means making a list 
of things we want to have as markup and surveying existing standards. I’ll 
start here with my wishlist and things I’ve seen others say:

Non-semantic things:

- Colorise things with colors from a palette (e.g. 360 colors from the 
XEP-0392 palette) (via Georg Lukas)

Semantic things: 

- Emphasis (typically italics), Strong (typically boldface)
- Pre-formatted text (code), including a way to specify the language (this may 
make the colorisation of things obsolete, the only use-case I’ve heard so far 
is syntax-highlighting)
- Blockquotes
- Paragraphs
- Enumerations and unordered lists
- Links (but possibly without the possibility to change the text shown), with 
a whitelist of URL schemes
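
For the link whitelist, a check could look roughly like the following Python 
sketch. The set of schemes is only a placeholder I picked for the example 
(which schemes to allow would be up to the spec), and is_allowed_link is a 
made-up name:

```python
from urllib.parse import urlsplit

# Placeholder whitelist, purely for illustration.
ALLOWED_SCHEMES = {"http", "https", "xmpp", "mailto"}


def is_allowed_link(url):
    """Accept a link only if it has an explicitly whitelisted scheme."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:
        return False  # unparseable input is rejected, not guessed at
    # Relative references have an empty scheme and are rejected as well.
    return scheme in ALLOWED_SCHEMES


for candidate in ("https://xmpp.org/", "javascript:alert(1)", "foo/bar"):
    print(candidate, is_allowed_link(candidate))
```

Anything that fails the check would be shown as plain text instead of being 
turned into a clickable link.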

Other things which may be useful, although we may have better ways to do 
them by now:

- Embedding multimedia content, like images, audio, video (a SIMS-message may 
be better suited for that)

Once we settle on a list of things we want to have, I’ll be happy to 
investigate which markups would be suitable or could easily be extended for 
our purposes.


Sorry for the long email.

kind regards,
Jonas


   [1]: As I can’t find a definition quickly online, this is how Dave 
        described it:

        > A protocol break is a security programming technique where you 
        > extract out the information into a different, fixed, form and then 
        > reconstitute it entirely from scratch. Means that, in our case, a 
        > bit of Javascript has no place in the intermediate form so cannot 
        > pass through.

   [2]: I can see that there are advantages to doing this, and I can see both
        co-exist.

   [3]: https://github.com/xsf/xeps/pull/529

   [*]: it will not prevent attacks against other stupidity such as double-
        unescaping things (e.g. if somebody sends &lt;script&gt; and somebody
        manages to put that into e.g. .innerHTML after unescaping)

