https://bugzilla.wikimedia.org/show_bug.cgi?id=26304
--- Comment #15 from Brion Vibber <[email protected]> 2011-07-02 18:37:45 UTC ---

I have no server access, so I couldn't tell you exactly what's being recorded now. But when pulling some of the older links shown above (for enwiki & commons) using the testing password, I receive what at first looks like a legitimate OAI-PMH XML response; then, partway through, after the end of one of the <record>...</record> elements, the Wikimedia error page suddenly shows up, like this:

...
^ OAI-PMH with embedded MediaWiki export XML
      </page>
    </mediawiki>
  </metadata>
</record>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Wikimedia Error</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta name="author" content="Mark Ryan, with translation by many people; see http://meta.wikimedia.org/wiki/Multilingual_error_messages"/>
<meta name="copyright" content="(c) 2005-2007 Mark Ryan and others. Text licensed under the GNU Free Documentation License. http://www.gnu.org/licenses/fdl.txt"/>
...
v HTML error page

The OAI output, like Special:Export, disables the regular output machinery and possibly the regular output buffering as well, so it may be that some error is triggering that error page (have we modified PHP to output that error page directly on a fatal error, perhaps?).

Taking a quick look over the current OAI extension code, I don't see any explicit use of UtfNormal::cleanUp(), though it will already be called implicitly by the xmlsafe() string-escaping wrapper in the WikiExport common code and in a few bits of OAIRepo itself. A very large page with a lot of non-ASCII characters could perhaps be an individual item that eats up a lot of memory in one chunk; it's also possible there's a memory leak I haven't noticed before.
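One quick client-side way to confirm the symptom described above is to run the fetched response through a strict XML parser and report where well-formedness breaks; with this bug, that point should be right where the HTML error page begins. This is a hedged sketch for diagnosis only, and `find_xml_break` is a made-up helper, not part of the OAI extension:

```python
# Sketch: locate where a supposedly-XML OAI-PMH response stops being
# well-formed (e.g. where an appended HTML error page begins).
# `find_xml_break` is a hypothetical diagnostic helper.
import xml.parsers.expat

def find_xml_break(data):
    """Return None if `data` parses as well-formed XML, otherwise the
    (line, column) of the first well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True = this is the final chunk
        return None
    except xml.parsers.expat.ExpatError as err:
        return (err.lineno, err.offset)

# A complete response parses cleanly:
print(find_xml_break("<OAI-PMH><record>ok</record></OAI-PMH>"))  # None
# One interrupted by an HTML error page does not; the parser reports
# the line/column where the stray <!DOCTYPE html ...> appears:
print(find_xml_break("<OAI-PMH><record>ok</record><!DOCTYPE html>"))
```

Feeding the saved response body through this would show whether the break always lands between two records or mid-record, which bears on the too-big-record theory below.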
You could try tweaking the number in OAIRepo::chunkSize() down; this will make each ListRecords call run fewer pages through, and might help if a leak is forcing memory usage up from record to record, but it *won't* help if the process is dying on an individual page record that is too big.

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
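The distinction drawn in the chunk-size suggestion above can be made concrete with a toy model (all numbers invented for illustration, not measurements from the servers): a per-record leak accumulates across a ListRecords chunk, so a smaller chunk caps its contribution, while one oversized record sets a floor on peak memory no matter how small the chunk is.

```python
# Toy model of peak memory for ListRecords processing: the largest single
# record plus whatever a hypothetical per-record leak has accumulated so
# far within the current chunk. All sizes are made-up units.

def peak_memory(chunk_size, record_sizes, leak_per_record):
    peak = 0
    for start in range(0, len(record_sizes), chunk_size):
        leaked = 0  # leak resets between ListRecords calls
        for size in record_sizes[start:start + chunk_size]:
            leaked += leak_per_record  # leak accumulates within a chunk
            peak = max(peak, size + leaked)
    return peak

small = [1] * 100            # 100 small records
with_big = [1] * 99 + [200]  # same, but one oversized record

# Halving the chunk size roughly halves the leak's contribution...
print(peak_memory(50, small, 2), peak_memory(25, small, 2))       # 101 51
# ...but the oversized record dominates regardless of chunk size:
print(peak_memory(50, with_big, 2), peak_memory(1, with_big, 2))  # 300 202
```

So if shrinking chunkSize() moves the failure point, a leak is the likelier culprit; if the same record always kills the request, it's an individual-record problem.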
