https://bugzilla.wikimedia.org/show_bug.cgi?id=17486
--- Comment #21 from Aryeh Gregor <[email protected]> 2010-01-10 17:23:50 UTC ---
(In reply to comment #17)
> I was thinking in a box below the message text. I don't think we have messages
> where unclosed tags are reasonable. At most, it would be an unsupported hack.

We have so many messages that I wouldn't bet on that. But we can raise the
warning anyway, if it's only a warning.

> Matching only wikitext messages is an important point. It shouldn't warn on
> html messages or a javascript.

Why not on HTML messages too, if they're not tidied?

> The problem is not XML but having a syntax which we are unable to map into
> the target language.

XML is a necessary although not sufficient condition for this problem to be
worth worrying about. If everyone used HTML5 parsers, the consumer would fix
the markup in a standard way and it wouldn't be a big problem (although ideally
we'd fix it anyway).

(In reply to comment #18)
> The proposed solution of using html5lib on the client side would help, if it
> accepts slightly malformed input.

html5lib will parse an arbitrary stream of bytes into a DOM. In practice, it
will fix simple nesting errors invisibly --
<div><span>Foo</div></span> will become <div><span>Foo</span></div>. You can
test out an HTML5 parser by getting a copy of Firefox 3.6 or later, setting
html5.enable to true in about:config, loading the URL

data:text/html,<!doctype html><div><span>Foo</div></span>

and inspecting the resulting DOM from Firebug. This transformation is
standardized in HTML5, so the DOM output is well-defined, and it's meant to be
a close match to what all browsers already do anyway. Of course the markup
won't validate, but that's not as big an issue as breaking bots that users
rely on.

Another thing we could consider eventually is using html5lib ourselves instead
of Tidy. If it's fast enough (there's a C++ version that Mozilla uses), we
could pass all our output through it to parse and reserialize.
It's kind of silly for us to do this when it could more easily be done by the client, though.
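
For illustration only, the "warn on unclosed tags" check discussed in the reply to comment #17 could be sketched roughly as below. This is not MediaWiki code and it does not use html5lib or Tidy; it relies on Python's standard html.parser module, and the names TagBalanceChecker and check_message are hypothetical. It assumes the message has already been identified as one whose HTML is not tidied, and it only reports problems rather than repairing them.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: collects warnings about unbalanced tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.warnings = []    # human-readable warning strings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            # Closing tag with no matching open element.
            self.warnings.append(f"stray </{tag}>")
            return
        # Pop up to and including the matching element; anything
        # popped on the way was left open by the message author.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.warnings.append(f"<{top}> not closed before </{tag}>")


def check_message(text):
    """Return a list of warnings for one message; empty if balanced."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    # Whatever is still on the stack at the end was never closed.
    for tag in checker.open_tags:
        checker.warnings.append(f"<{tag}> never closed")
    return checker.warnings
```

Calling check_message("<div><span>Foo</div></span>") reports two warnings (the span left open at </div>, and the stray </span>), while the corrected nesting <div><span>Foo</span></div> reports none. Since the result is only a list of warnings, a message that deliberately leaves a tag open would still be saved, which matches the "only a warning" idea above.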
