https://bugzilla.wikimedia.org/show_bug.cgi?id=27478

--- Comment #13 from Aryeh Gregor <[email protected]> 2011-02-24 18:12:08 UTC ---
(In reply to comment #11)
> You explained exactly the same error with scraping in 2009:
> [[Wikipedia:Village pump (technical)/Archive 67#Twinkle stalling]]
> 
> Also bug 27672 was filed yesterday.

This suggests that some named entities may have crept through, or some other
type of well-formedness error.  It would be nice if people said exactly which
pages failed, but it should be possible to figure that out.  I'm guessing it's
the result of messages being passed as raw HTML and sysops adding named
entities to them, but it could be something else too.

The easy way out would be to restore the old hack where we serve HTML5 with an
HTML 4.01 Strict doctype, which is valid HTML5 but rather confusing.  This is
how 1.16 works by default.  That way a DTD is specified, which means that
non-browser UAs will parse named entities successfully.  We can consider
switching back to the HTML5 doctype later.
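
To illustrate why the missing DTD matters to non-browser clients: a strict XML
parser knows only the five predefined XML entities, so a named entity such as
&nbsp; is a fatal error unless a DTD declares it.  A minimal sketch, using
Python's standard-library XML parser as a stand-in for such a UA (the sample
markup and the helper name are made up for illustration):

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical page fragments; no DTD is available to declare &nbsp;.
with_named_entity = "<p>10&nbsp;km</p>"
with_numeric_ref = "<p>10&#160;km</p>"

print(parses_as_xml(with_named_entity))  # named entity is undefined: rejected
print(parses_as_xml(with_numeric_ref))   # numeric reference needs no DTD: accepted
```

The numeric reference carries the same character without relying on a DTD,
which is why output that uses only numeric references stays parseable under
either doctype.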

(In reply to comment #12)
> Several problems on enwiki were caused by the difference in Sanitize::escapeId
> between HTML4 and HTML5 modes.

Hmm.  It should be possible to disable this by setting $wgExperimentalHtmlIds
to false while leaving $wgHtml5 true (which might still leave well-formedness
issues).  A proper fix will require some more thought, though.  The changes to
escapeId() are really meant for headings, but we can't realistically
distinguish wikilinks meant to point at headings from wikilinks meant to
point at other things.
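
The kind of mismatch involved can be sketched roughly as follows.  This is
only an approximation of the two styles of id escaping, not the actual code in
Sanitizer::escapeId(); the function names and the exact character sets are
assumptions made for illustration:

```python
import re
from urllib.parse import quote


def legacy_escape_id(text):
    """HTML4-style (assumed): percent-encode, then use '.' in place of '%'."""
    encoded = quote(text.replace(" ", "_"), safe="")
    return encoded.replace("%3A", ":").replace("%", ".")


def html5_escape_id(text):
    """HTML5-style (assumed): fold a few problem characters to '_', keep the rest."""
    cleaned = re.sub(r"[ \t\n\r\f_'\"&#%]+", "_", text).strip("_")
    return cleaned or "_"


heading = "Café au lait"
print(legacy_escape_id(heading))
print(html5_escape_id(heading))
```

Anything that computes an anchor with one scheme and looks it up in a page
generated with the other will miss whenever the two results differ, as they do
for any heading containing non-ASCII characters in this sketch.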

In practice, it looks like Cite is the major problem here (with the ids), and
it can probably be fixed.  My first inclination is to just generate arbitrary
ids for named refs instead of trying to key off the names.
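
A minimal sketch of that idea, assuming a simple per-page counter: each ref
name is mapped to an opaque sequential id, so the name itself never passes
through id escaping.  The class name and the "cite_note-" prefix are
hypothetical and not taken from the Cite extension:

```python
class RefIdAllocator:
    """Hand out arbitrary, stable ids for named refs within one page."""

    def __init__(self, prefix="cite_note-"):
        self.prefix = prefix  # hypothetical id prefix
        self._ids = {}        # ref name -> generated id

    def id_for(self, name):
        # The first use of a name allocates the next counter value;
        # later uses of the same name reuse the id already assigned.
        if name not in self._ids:
            self._ids[name] = f"{self.prefix}{len(self._ids)}"
        return self._ids[name]


allocator = RefIdAllocator()
print(allocator.id_for("Smith 2009"))  # cite_note-0
print(allocator.id_for("Café & co"))   # cite_note-1
print(allocator.id_for("Smith 2009"))  # cite_note-0
```

Because the generated ids contain only ASCII letters, digits, "_" and "-", they
come out the same in either escaping mode; the trade-off is that the ids no
longer say anything about the ref name.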

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
