https://bugzilla.wikimedia.org/show_bug.cgi?id=32439
Web browser: ---
Bug #: 32439
Summary: java.sql.SQLException: Incorrect string value:
'\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at
row 9
Product: mwdumper
Version: unspecified
Platform: PC
OS/Version: Windows 7
Status: NEW
Severity: blocker
Priority: Unprioritized
Component: general
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
The dump file I'm reading is:
enwiki-latest-pages-articles.xml.bz2 (Aug 08, 2011)
I'm inserting the values into a MySQL database created according to the wiki SQL schema definition,
after removing the table indexes and constraints.
I would be more than glad to know if there is a way to work around this, i.e. to ignore
the problematic rows and continue reading and writing the rest of the file.
Thanks.
2,260,000 pages (36.843/sec), 2,260,000 revs (36.843/sec)
2,261,000 pages (36.842/sec), 2,261,000 revs (36.842/sec)
2,262,000 pages (36.841/sec), 2,262,000 revs (36.841/sec)
2,263,000 pages (36.839/sec), 2,263,000 revs (36.839/sec)
2,264,000 pages (36.837/sec), 2,264,000 revs (36.837/sec)
2,265,000 pages (36.838/sec), 2,265,000 revs (36.838/sec)
java.io.IOException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
at org.mediawiki.dumper.gui.DumperGui$1.run(DumperGui.java:206)
Caused by: org.xml.sax.SAXException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
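
A note on the likely cause, offered as an assumption since the report does not state the column's character set: the bytes \xF0\x9D\x9E\xB1 quoted in the error are a four-byte UTF-8 sequence, and MySQL's `utf8` character set stores at most three bytes per character (four-byte sequences need `utf8mb4`, available from MySQL 5.5.3). If that is what is happening here, titles containing such characters could be detected before the insert and either skipped or routed elsewhere. The sketch below only shows the detection step; the class and method names are hypothetical and are not part of mwdumper.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of mwdumper.
final class SupplementaryTitleCheck {

    // Returns true if the string contains any code point above U+FFFF,
    // i.e. a character that UTF-8 encodes as four bytes.
    static boolean hasSupplementaryCodePoint(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // The first four bytes quoted in the error message, decoded as UTF-8.
        byte[] reported = {(byte) 0xF0, (byte) 0x9D, (byte) 0x9E, (byte) 0xB1};
        String title = new String(reported, StandardCharsets.UTF_8);

        // Show the decoded code point and whether it lies outside the BMP.
        System.out.printf("U+%X supplementary=%b%n",
                title.codePointAt(0), hasSupplementaryCodePoint(title));

        // An ordinary ASCII title for comparison.
        System.out.println(hasSupplementaryCodePoint("Main_Page"));
    }
}
```

Skipping such rows loses pages, so this is a workaround rather than a fix. If the assumption above holds, switching the affected columns to `utf8mb4` or to a binary column type would avoid dropping data.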