https://bugzilla.wikimedia.org/show_bug.cgi?id=17486

--- Comment #21 from Aryeh Gregor <[email protected]>  2010-01-10 17:23:50 UTC ---
(In reply to comment #17)
> I was thinking in a box below the message text. I don't think we have messages
> where unclosed tags are reasonable. At most, it would be an unsupported hack.

We have so many messages that I wouldn't bet on that.  But we can raise the
warning anyway, if it's only a warning.

> Matching only wikitext messages is an important point. It shouldn't warn on
> html messages or a javascript.

Why not on HTML messages too, if they're not tidied?

> The problem is not XML but having a syntax which we are unable to map into 
> the target language.

XML is a necessary although not sufficient condition for this problem to be
worth worrying about.  If everyone used HTML5 parsers, the consumer would fix
the markup in a standard way and it wouldn't be a big problem (although ideally
we'd fix it anyway).

(In reply to comment #18)
> The proposed solution of using html5lib on the client side would help, if it
> accepts slightly malformed input.

html5lib will parse an arbitrary stream of bytes into a DOM.  In practice, it
will fix simple nesting errors invisibly -- <div><span>Foo</div></span> will
become <div><span>Foo</span></div>.  You can test out an HTML5 parser by
getting a copy of Firefox 3.6 or later, setting html5.enable to true in
about:config, loading the URL

data:text/html,<!doctype html><div><span>Foo</div></span>

and inspecting the resulting DOM in Firebug.  This transformation is
standardized in HTML5, so the DOM output is well-defined, and it's meant to be
a close match to what all browsers already do anyway.  Of course the markup
won't validate, but that's not as big an issue as breaking bots that users rely
on.
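
To make the nesting fix concrete, here is a minimal sketch in Python of the
stack-based repair described above.  It is not html5lib and not the HTML5
tree-construction algorithm, which has many more rules (scoping, formatting
elements, implied tags).  It uses only the standard library's html.parser,
and the names NestingFixer and fix_nesting are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take an end tag; emitted but never pushed on the stack.
# (Partial list, enough for the illustration.)
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class NestingFixer(HTMLParser):
    """Hypothetical helper: reserialize markup, repairing simple nesting errors."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.open_tags = []  # stack of currently open element names
        self.out = []        # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        # Emit the start tag exactly as written in the input.
        self.out.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with no matching open element: drop it
        # Close everything opened after the matching start tag, then the tag.
        while True:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def fix_nesting(markup):
    parser = NestingFixer()
    parser.feed(markup)
    parser.close()
    # Close anything still open at end of input.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


print(fix_nesting("<div><span>Foo</div></span>"))
# prints <div><span>Foo</span></div>
```

The </div> closes the still-open <span> first, and the later </span> is
dropped because nothing matches it, which is the same result as the example
above.  Anything beyond simple misnesting needs a real HTML5 parser.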

Another thing we could consider eventually is using html5lib ourselves instead
of Tidy.  If it's fast enough (there's a C++ version that Mozilla uses), we
could pass all our output through it to parse and reserialize.  It's kind of
silly for us to do this when it could more easily be done by the client,
though.

