https://bugzilla.wikimedia.org/show_bug.cgi?id=57236
Web browser: ---
Bug ID: 57236
Summary: mwdumper fails to import English wikipedia dump
Product: Tools
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: blocker
Priority: Unprioritized
Component: mwdumper
Assignee: [email protected]
Reporter: [email protected]
Classification: Unclassified
Mobile Platform: ---
Hello
I'm trying to use mwdumper to import the latest English Wikipedia dump
(enwiki-20131104-pages-articles.xml). It fails with the following errors:
10 045 000 pages (1 658,325/sec), 10 045 000 revs (1 658,325/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
at org.mediawiki.dumper.Dumper.main(Dumper.java:142)
ERROR 1064 (42000) at line 79047: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''{{Infobox military person\n|name=Alexander Holle\n|birth_date=27 February 1898\' at line 1
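The Java trace ends inside Xerces's own UTF8Reader, and the MySQL error is plausibly a side effect: if the mwdumper output is piped into mysql, the crash would leave the last INSERT cut off mid-value. A minimal sketch for narrowing this down, assuming the problem is in Xerces's byte-to-character decoding: it parses the dump through a java.io.Reader, so UTF-8 decoding is done by the JDK's InputStreamReader instead of the parser. The class name DumpPageCounter is hypothetical and this is not mwdumper's code; it only counts <page> elements and has not been run against the 20131104 dump.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic class, not part of mwdumper.
public class DumpPageCounter {

    // Counts <page> elements. The parser receives a character stream,
    // so (assumption) it never needs its own UTF-8 byte decoder.
    public static long countPages(Reader reader)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final long[] pages = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Factory is not namespace-aware, so qName is the raw tag name.
                if ("page".equals(qName)) {
                    pages[0]++;
                }
            }
        };
        parser.parse(new InputSource(reader), handler);
        return pages[0];
    }

    public static void main(String[] args) throws Exception {
        // Decode the dump with the JDK's UTF-8 decoder, not the parser's.
        try (InputStream in = new FileInputStream(args[0]);
             Reader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(countPages(reader) + " pages");
        }
    }
}
```

Run as `java DumpPageCounter enwiki-20131104-pages-articles.xml` with the same Xerces jar on the classpath. If it gets past roughly 10 045 000 pages without an exception, that would point at the byte-stream decoding path rather than the dump itself.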
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l