https://bugzilla.wikimedia.org/show_bug.cgi?id=72886
Bug ID: 72886
Summary: Recent XML dump files break mwxml2sql
Product: Datasets
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: blocker
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Web browser: ---
Mobile Platform: ---
Dear Sir or Madam,
0) Context
`mwxml2sql' is a utility for rapidly converting published XML dump files into
SQL files for the `page', `revision', and `text' tables. These SQL files may
then be rapidly imported into a database.
1) Breaking change in XML dump file schema
XML dump files using schema `export-0.8.xsd' are processed correctly by `mwxml2sql'.
XML dump files using schema `export-0.9.xsd' break `mwxml2sql'.
2) Example of error
(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20141025/simplewiki-20141025-pages-meta-current.xml.bz2 .
(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20141025/simplewiki-20141025-stub-meta-current.xml.gz .
(shell)$ bzcat simplewiki-20141025-pages-meta-current.xml.bz2 | head -n 1
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.9/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.9/
http://www.mediawiki.org/xml/export-0.9.xsd" version="0.9" xml:lang="en">
(shell)$ /usr/bin/mwxml2sql --stubs simplewiki-20141025-stub-meta-current.xml.gz \
    --text simplewiki-20141025-pages-meta-current.xml.bz2 \
    --mysqlfile simplewiki-20141025.gz --mediawiki 1.24 2>&1
WHINE: (none) no end siteinfo tag
WHINE: (none) no end siteinfo tag
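A dump's schema version can be checked before it is handed to `mwxml2sql' by reading the `version' attribute of the root `<mediawiki>' element, as the `bzcat | head' step above does by hand. The following is only a rough sketch of that check. The function names and the `KNOWN_GOOD' set are hypothetical, and the set reflects only what this report observed (0.8 works, 0.9 fails), not any documented compatibility list.

```python
import bz2
import gzip
import re

# Assumption: only schema 0.8 was observed to work in this report.
KNOWN_GOOD = {"0.8"}

# Matches the version attribute inside the opening <mediawiki ...> tag.
_VERSION_RE = re.compile(r'<mediawiki\b[^>]*\bversion="([^"]+)"')


def schema_version_from_header(header):
    """Return the export schema version found in the header text, or None."""
    match = _VERSION_RE.search(header)
    return match.group(1) if match else None


def read_header(path, max_bytes=4096):
    """Read the first bytes of a plain, .gz, or .bz2 dump file as text."""
    if path.endswith(".bz2"):
        opener = bz2.open
    elif path.endswith(".gz"):
        opener = gzip.open
    else:
        opener = open
    with opener(path, "rb") as fh:
        return fh.read(max_bytes).decode("utf-8", errors="replace")


def is_known_good(version):
    """True if the version is one this report saw mwxml2sql handle."""
    return version in KNOWN_GOOD
```

For example, `schema_version_from_header(read_header("simplewiki-20141025-pages-meta-current.xml.bz2"))' should yield the same version string shown in the header output above, and `is_known_good' can then be used to skip or flag the dump.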
3) Recent dumps
Wiki/Date            Schema  mwxml2sql
simplewiki/20140220  0.8     OK
simplewiki/20140723  0.8     OK
simplewiki/20140814  0.8     OK
simplewiki/20140903  0.9     fail
simplewiki/20140927  0.9     fail
simplewiki/20141025  0.9     fail
Sincerely Yours,
Kent