I'd say an HTML5 output mode *ought* to work like this:

*Don't try to be clever.*
* Consistency and predictability are key to both security review and data
consumability.

*Quote attributes consistently and predictably.*
* Always use double-quotes on attributes in output.

*Output specced empty tags in HTML style.*
* <img>, <hr>, <br> are fine and not ambiguous at all to an HTML parser.
There's no need to go adding a "/" in at the end!
* These are already whitelisted in the Html class so it's easy to not mess
this up.

*Don't do other silly things for old-school XHTML 1.*
* CDATA wrapping of <script>s and <style>s is not needed.
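
To make the rules above concrete, here is a minimal sketch in Python (not MediaWiki's actual Html class). The names render_element, VOID_ELEMENTS and RAW_TEXT_ELEMENTS are hypothetical, chosen only for illustration, and the hard-coded void-element set is an assumption standing in for whatever list the real class keeps.

```python
from html import escape

# Assumption: a hard-coded set of HTML void elements, standing in for the
# list the real Html class maintains.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# Elements whose contents an HTML parser treats as raw text.
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})


def render_element(name, attrs=None, contents=""):
    """Serialize one element using a single, predictable set of rules."""
    parts = [name]
    for key, value in (attrs or {}).items():
        # Always double-quote, and always escape the value the same way.
        parts.append(f'{key}="{escape(str(value), quote=True)}"')
    open_tag = "<" + " ".join(parts) + ">"

    if name in VOID_ELEMENTS:
        # HTML style: no trailing "/" and no closing tag.
        return open_tag

    if name in RAW_TEXT_ELEMENTS:
        # No CDATA wrapping; refuse contents that would close the element early.
        if f"</{name}" in contents.lower():
            raise ValueError(f"contents must not contain </{name}")
        body = contents
    else:
        body = escape(contents, quote=False)
    return f"{open_tag}{body}</{name}>"


print(render_element("img", {"src": "a.png", "alt": 'say "hi"'}))
print(render_element("script", contents="if (a < b) {}"))
```

The point of the sketch is that there is exactly one code path per case, so a reviewer never has to ask which quoting or closing style a given call will produce.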

The only benefit of $wgWellFormedXml was that you could toss your
"well-formed" tag soup into an XML parser that didn't grok HTML. I have no
idea if that worked reliably or was actually useful to anyone, but it's
probably worth confirming that before actually removing the funky
self-closing tags.
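
For anyone who wants to check that, a rough sketch of the test is below. It uses Python's standard xml.etree.ElementTree as a stand-in for "an XML parser that doesn't grok HTML"; the sample markup strings are made up for illustration and are not real MediaWiki output.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments in the two output styles.
xml_style = '<p>line one<br />line two<img src="a.png" alt="" /></p>'
html_style = '<p>line one<br>line two<img src="a.png" alt=""></p>'
# XML-style tags, but with an HTML named entity that plain XML does not define.
xml_style_with_entity = '<p>a&nbsp;b<br /></p>'

for sample in (xml_style, html_style, xml_style_with_entity):
    print(parses_as_xml(sample), sample)
```

The third sample is the interesting one: self-closing tags alone don't guarantee that an XML parser will accept the output, which is part of why it's worth confirming whether anyone actually relies on this.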

-- brion


On Mon, May 2, 2016 at 11:42 AM, Brian Wolff <[email protected]> wrote:

> So currently, we have two ways of outputting HTML. $wgWellFormedXml =
> true (the default) outputs HTML that happens to conform to the
> rules of XML. $wgWellFormedXml = false, on the other hand, uses the
> laxer HTML5 rules to save a few bytes.
>
> Having two modes of output feels rather silly to me. Originally I
> think this was meant as a feature flag while $wgWellFormedXml=false
> stabilized, but it never got turned on, and here we are 7 years later.
>
> Having $wgWellFormedXml=false increases the complexity of the code,
> and not all that many people use it (a notable exception is
> translatewiki). I think it's important that security-critical code be
> as simple as possible. Furthermore, there seems to be very little
> benefit to having the second mode (After you account for gzip, saving
> a few bytes from writing <img> instead of <img/> really doesn't
> matter, imo)
>
> With that in mind, I would like to propose killing $wgWellFormedXml =
> false; I'm not so much attached to the true mode (Although I do feel
> the true mode is significantly more sane), as I just simply want there
> to be a single mode. Putting the default to false was vetoed in
> T52040, so I think that true would be the best choice to go with going
> forward if we are getting rid of one of the modes.
>
> If there are aspects of the other mode that people really want, then I
> think we should simply merge that in to the default behavior instead
> of having two separate modes.
>
> See gerrit patch https://gerrit.wikimedia.org/r/286495 I would
> appreciate everyone's feedback.
>
> Thanks,
> Brian
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l