https://bugzilla.wikimedia.org/show_bug.cgi?id=32439

       Web browser: ---
             Bug #: 32439
           Summary: java.sql.SQLException: Incorrect string value:
                    '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at
                    row 9
           Product: mwdumper
           Version: unspecified
          Platform: PC
        OS/Version: Windows 7
            Status: NEW
          Severity: blocker
          Priority: Unprioritized
         Component: general
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


The dump file I'm reading is:
enwiki-latest-pages-articles.xml.bz2 (Aug 08, 2011)

I'm inserting the values into a MySQL DB according to the wiki SQL DB definition,
after removing the tables' indexes and constraints.

I would be more than glad to know if there's a way to work around it, ignoring
the problematic rows and continuing to read and write the rest of the file.

Thanks

2,260,000 pages (36.843/sec), 2,260,000 revs (36.843/sec)
2,261,000 pages (36.842/sec), 2,261,000 revs (36.842/sec)
2,262,000 pages (36.841/sec), 2,262,000 revs (36.841/sec)
2,263,000 pages (36.839/sec), 2,263,000 revs (36.839/sec)
2,264,000 pages (36.837/sec), 2,264,000 revs (36.837/sec)
2,265,000 pages (36.838/sec), 2,265,000 revs (36.838/sec)
java.io.IOException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
    at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
    at org.mediawiki.dumper.gui.DumperGui$1.run(DumperGui.java:206)
Caused by: org.xml.sax.SAXException: java.sql.SQLException: Incorrect string value: '\xF0\x9D\x9E\xB1_\xF0...' for column 'page_title' at row 9
    at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:227)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
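
A note on the bytes quoted in the error: \xF0\x9D\x9E\xB1 is the UTF-8 encoding of a single code point above U+FFFF, which takes four bytes. If the page_title column uses MySQL's 3-byte utf8 character set (an assumption; the column's charset is not stated above), such titles cannot be stored in it. Below is a minimal Java sketch for spotting affected titles. The class and method names are hypothetical and not part of mwdumper.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of mwdumper.
class TitleCheck {
    // True if the string contains a code point above U+FFFF,
    // i.e. one that needs four bytes when encoded as UTF-8.
    static boolean needsFourByteUtf8(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // The first four bytes quoted in the error message.
        byte[] raw = {(byte) 0xF0, (byte) 0x9D, (byte) 0x9E, (byte) 0xB1};
        String title = new String(raw, StandardCharsets.UTF_8);
        System.out.printf("U+%X needs 4 bytes: %b%n",
                title.codePointAt(0), needsFourByteUtf8(title));
    }
}
```

A check like this could be used to log or skip such titles before the insert, though whether skipping rows is acceptable depends on what the imported data is needed for.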

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
