https://bugzilla.wikimedia.org/show_bug.cgi?id=26304

--- Comment #15 from Brion Vibber <[email protected]> 2011-07-02 18:37:45 UTC ---
I have no server access, so I can't tell you exactly what's being recorded now,
but when pulling some of the older links shown above (for enwiki & commons)
using the testing password, I end up receiving what at first looks like a
legitimate OAI-PMH XML response. Partway through, though, after the end of one
of the <record>...</record> elements, the Wikimedia error page suddenly shows
up, like this:

... ^ OAI-PMH with embedded MediaWiki export XML
  </page>
</mediawiki>
</metadata>
</record>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>

  <title>Wikimedia Error</title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
  <meta name="author" content="Mark Ryan, with translation by many people; see
http://meta.wikimedia.org/wiki/Multilingual_error_messages"/>
  <meta name="copyright" content="(c) 2005-2007 Mark Ryan and others. Text
licensed under the GNU Free Documentation License.
http://www.gnu.org/licenses/fdl.txt"/>
... v HTML error page
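For whoever digs into this: a quick way to spot these truncated responses on
the harvesting side is to check for the error-page doctype before attempting a
parse. Rough sketch in Python (just an illustration; the helper is made up and
nothing here is part of the extension):

```python
# Sketch: classify a raw OAI-PMH response body that may have been cut
# off by an HTML error page spliced into the stream. Hypothetical
# helper for a harvester, not part of the OAI extension itself.
import xml.etree.ElementTree as ET

def check_oai_response(body: str) -> str:
    """Return "ok", or a short description of what's wrong with body."""
    # The Wikimedia error page begins with an XHTML doctype; if that
    # shows up anywhere in the stream, the XML was interrupted.
    if "<!DOCTYPE html" in body:
        return "truncated: HTML error page embedded in stream"
    try:
        ET.fromstring(body)
    except ET.ParseError as e:
        return f"malformed XML: {e}"
    return "ok"
```

Running the links above through something like this would at least tell us
whether every failure is the same spliced-in error page or whether some are
plain truncations.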

The OAI output, like Special:Export, disables the regular skin output and
possibly the regular output buffering as well, so it might be that some error
is triggering that error page. (Have we modified PHP to output that error page
directly on a fatal error, perhaps?)


Taking a quick look over the current OAI extension code, I don't see any
explicit use of UtfNormal::cleanUp(), though it will already be called
implicitly by the xmlsafe() string-escaping wrapper in the WikiExport common
code and by a few bits in OAIRepo itself.
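For reference, the net effect of that wrapper is roughly: escape the markup
characters and drop code points that XML 1.0 forbids outright. A rough Python
equivalent, purely illustrative (the real work happens in xmlsafe() and
UtfNormal::cleanUp(), which also handles UTF-8 validation/normalization):

```python
import re
from xml.sax.saxutils import escape

# Code points disallowed in XML 1.0 -- roughly what a cleanup pass
# analogous to UtfNormal::cleanUp() has to strip before output.
_XML_ILLEGAL = re.compile(
    '[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]'
)

def xmlsafe_sketch(text: str) -> str:
    """Drop code points XML 1.0 forbids, then escape & < > markup."""
    return escape(_XML_ILLEGAL.sub('', text))
```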

A very large page with a lot of non-ASCII characters could perhaps be an
individual record that eats up a lot of memory in one chunk; it's also
possible there's a memory leak I haven't noticed before.

Could try tweaking the number in OAIRepo::chunkSize() down -- this will make it
run fewer pages through each ListRecords call, and might help if there's a leak
forcing memory usage up from record to record -- but it *won't* help if it's
dying on an individual page record being too big.
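To make that tradeoff concrete, here's a toy model (Python; numbers, names,
and the leak behavior are all invented for illustration) of why a smaller
chunk size caps a per-call leak but does nothing for one oversized record:

```python
# Toy model: memory leaked per record accumulates within one
# ListRecords call and is freed when the call (PHP request) ends.
def peak_memory(record_sizes, chunk_size, leak_per_record=0):
    """Peak memory over a harvest that processes chunk_size records
    per ListRecords call."""
    peak = 0
    for start in range(0, len(record_sizes), chunk_size):
        leaked = 0  # reset at each new call
        for size in record_sizes[start:start + chunk_size]:
            leaked += leak_per_record
            peak = max(peak, size + leaked)
    return peak
```

With a leak, shrinking the chunk from 50 records to 5 cuts the peak roughly
tenfold; with no leak but one huge record, the peak is the same at any chunk
size -- which is exactly the distinction above.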
