https://bugzilla.wikimedia.org/show_bug.cgi?id=29564
--- Comment #7 from Marcin Cieślak <[email protected]> ---

1. I just checked the current dump, and it does not appear to be truncated after the above-mentioned page. However, I currently can't find page ID 803931 in it. I'll double-check, but a simple pywikipedia loop:

Python 2.7.3 (default, Sep 17 2012, 21:25:11)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xmlreader
>>> z = xmlreader.XmlDump("huwiki-20121021-pages-articles.xml.bz2")
>>> for i in z.parse():
...     if i.id == 803931:
...         print repr(i)
...
Reading XML dump...

does not seem to give any results.

2. To fix this entry in the database, I would simply remove the last byte of the "thread_signature" field. Alternatively, the whole Greek text could be removed, so that this:

[[User:Gubbubu|<font color="green" face="Lucida calligraphy">Γουββος ΘιλοÎ

is changed to

[[User:Gubbubu|Gubbubu]]

or something like that.
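
To illustrate the "remove the last byte" idea from point 2, here is a minimal sketch of how a trailing, incomplete UTF-8 sequence could be trimmed from a byte string. It is written for Python 3 (unlike the Python 2.7 session above), the function name strip_truncated_utf8_tail is hypothetical, and the sample value assumes the stray byte at the end of the signature is 0xCE; none of this is taken from the actual database contents or from MediaWiki code.

```python
def strip_truncated_utf8_tail(raw):
    """Return raw without a trailing, incomplete UTF-8 sequence.

    Hypothetical helper for illustration only. Valid input is returned
    unchanged; damage anywhere other than the very end is not repaired
    and the original UnicodeDecodeError is re-raised.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to trim
    except UnicodeDecodeError as err:
        # Repair only when the bad bytes run up to the end of the string
        # and are short enough to be one cut-off multi-byte character.
        if err.end == len(raw) and len(raw) - err.start < 4:
            candidate = raw[:err.start]
            candidate.decode("utf-8")  # raises if the rest is also damaged
            return candidate
        raise


# Assumed sample: a Greek signature cut off after the lead byte 0xCE.
damaged = "Γουββος Θιλο".encode("utf-8") + b"\xce"
repaired = strip_truncated_utf8_tail(damaged)
print(repr(repaired.decode("utf-8")))
```

Trimming only at the tail keeps the change as small as the one proposed above; a field with damage elsewhere would still need to be inspected by hand.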
