I can still import the dumps with the old MWDumper (modified to fix the contributor), and xml2sql also works and is quite fast. I think importDump.php keeps going after it hits the error.
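For importers that stop at the new <redirect /> element (reported further down this thread for xml2sql and MediaWiki 1.15.1), one possible workaround is to strip that element from the dump before importing. The following is only a minimal sketch in Python, not something any of these tools ship. It assumes the element is self-closing and sits on a single line; the names strip_redirect and filter_dump are made up for illustration. Page text in the dump is XML-escaped, so a literal "<redirect" should only ever be the tag itself.

```python
import bz2
import re

# Assumption: the element is self-closing, e.g. <redirect /> or
# <redirect title="..." />, and never spans more than one line.
REDIRECT_TAG = re.compile(r"[ \t]*<redirect\b[^>]*/>[ \t]*\n?")


def strip_redirect(lines):
    """Yield dump lines with any self-closing <redirect /> element removed."""
    for line in lines:
        cleaned = REDIRECT_TAG.sub("", line)
        # A line that held nothing but the tag becomes empty and is dropped.
        if cleaned:
            yield cleaned


def filter_dump(src_path, dst_path):
    """Copy a .bz2 dump to a plain XML file without <redirect /> elements."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(strip_redirect(src))
```

Calling filter_dump("enwiki-20100130-pages-articles.xml.bz2", "enwiki-noredirect.xml") would write an uncompressed copy (the output file name is arbitrary) that can then be handed to the importer of your choice. Whether a given importer accepts the result is untested here.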
bilal
--
Verily, with hardship comes ease.

On Thu, Feb 4, 2010 at 9:24 PM, Chad <[email protected]> wrote:

> On Thu, Feb 4, 2010 at 9:12 PM, Eric Sun <[email protected]> wrote:
> > Hi,
> >
> > I saw this thread back in October where someone was having trouble
> > importing the English Wikipedia XML dump:
> > http://lists.wikimedia.org/pipermail/wikitech-l/2009-October/045594.html
> > The thread back in October seemed to end without resolution, and the
> > tools still seem to be broken, so has anyone found a solution in the
> > meantime?
> >
> > I'm using mediawiki-1.15.1 and attempting to import
> > enwiki-20100130-pages-articles.xml.bz2.
> >
> > None of these options seem to work:
> > 1) importDump.php
> > fails by spewing "Warning: xml_parse(): Unable to call handler in_()
> > in ./includes/Import.php on line 437" repeatedly
> >
> > 2) xml2sql (http://meta.wikimedia.org/wiki/Xml2sql):
> > Fails with error:
> > xml2sql: parsing aborted at line 33 pos 16.
> > due to the new <redirect> tag introduced in the new dumps?
> >
> > 3) mwdumper (http://www.mediawiki.org/wiki/MWDumper):
> > Current XML is schema v0.4, but the documentation says that it's for 0.3
> >
> > 4) mwimport (http://meta.wikimedia.org/wiki/Data_dumps/mwimport):
> > Fails immediately:
> > siteinfo: untested generator 'MediaWiki 1.16alpha-wmf', expect trouble ahead
> > page: expected closing tag in line 35
> >
> > Any tips?
> > Thanks!
> > Eric
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
> Most of these errors are caused by the new(ish) <redirect /> tag
> within <page> elements. 0.4 is the correct version of the schema,
> but unfortunately the schema was updated and dumps were
> produced using them before the changes made it into a release.
>
> 1.15.1 cannot import pages with <redirect />, we should probably
> backport that. That, and we should rewrite the importers to not barf
> terribly when they encounter an unknown element.
>
> -Chad
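
Chad's last point, an importer that does not fail on an element it does not recognize, could look roughly like the sketch below. This is an illustration in Python rather than a patch for importDump.php or any of the tools named in the thread. The function names and the choice of fields (title, id, revision text) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (fields, skipped) per <page>, tolerating unknown child elements."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields, skipped = {}, []
        for child in elem:
            name = local_name(child.tag)
            if name in ("title", "id"):
                fields[name] = child.text
            elif name == "revision":
                for sub in child:
                    if local_name(sub.tag) == "text":
                        fields["text"] = sub.text or ""
            else:
                # Unknown element such as <redirect />: record it and move on
                # instead of aborting the whole import.
                skipped.append(name)
        yield fields, skipped
        elem.clear()  # drop the finished page's children to save memory
```

The design choice is simply to treat the set of known elements as an allow-list and to log anything else, so a schema addition degrades to a warning rather than a failed import.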
