https://bugzilla.wikimedia.org/show_bug.cgi?id=29564

--- Comment #7 from Marcin Cieślak <[email protected]> ---
1. I just checked the current dump, and it does not look truncated after the
above-mentioned page; however, I currently can't find page ID 803931 in it. I'll
double-check, but a simple pywikipedia loop:


Python 2.7.3 (default, Sep 17 2012, 21:25:11)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xmlreader
>>> z = xmlreader.XmlDump("huwiki-20121021-pages-articles.xml.bz2")
>>> for i in z.parse():
...     if i.id == 803931:
...             print repr(i)
...
Reading XML dump...

does not seem to give any results.
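To cross-check without pywikipedia, here is a minimal Python 3 sketch that streams the .bz2 dump with the standard library and looks for a page-level <id>. The helper names `find_page` and `local_name` are hypothetical. It compares the id as text on purpose: ElementTree returns element text as strings, and if pywikipedia's reader likewise exposes `id` as a string (an assumption, not checked here), then `i.id == 803931` could never be true in Python 2, which would also produce an empty result.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_page(dump_path, page_id):
    """Hypothetical helper (not pywikipedia): stream a .xml.bz2 dump and
    return the title of the page whose page-level <id> equals page_id,
    or None if there is no such page."""
    target = str(page_id)  # element text is a str, so compare text to text
    with bz2.open(dump_path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            # Only direct children of <page>: the <id> nested in <revision>
            # is a revision id and must not be matched.
            fields = {local_name(child.tag): child.text for child in elem}
            if (fields.get("id") or "").strip() == target:
                return fields.get("title")
            elem.clear()  # drop the finished page to keep memory bounded
    return None
```

For example, `find_page("huwiki-20121021-pages-articles.xml.bz2", 803931)` would return the page title, or None if no page carries that id. Because only direct children of <page> are inspected, a revision id that happens to equal 803931 does not produce a false match.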

2. To fix this entry in the database, I would simply remove the last byte of the
"thread_signature" field. Alternatively, the whole Greek text could be removed,
with this:

[[User:Gubbubu|<font color="green" face="Lucida calligraphy">Γουββος ΘιλοÎ

changed to

[[User:Gubbubu|Gubbubu]]

or something like that.
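Removing the last byte should be enough if, as the trailing `Î` suggests, the field was cut in the middle of a two-byte UTF-8 Greek character (the lead byte 0xCE is shown as `Î` when read as Latin-1). Below is a minimal Python 3 sketch of that trim, assuming the field holds UTF-8 bytes: it drops only an unfinished multi-byte sequence at the very end and raises on any other invalid byte, so it never silently discards real text. The function name `trim_incomplete_utf8` is hypothetical, and reading and writing the database row is not shown.

```python
import codecs


def trim_incomplete_utf8(data: bytes) -> bytes:
    """Hypothetical helper: remove an unfinished multi-byte UTF-8 sequence
    from the end of `data`; valid input is returned unchanged."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # With final=False, an incomplete sequence at the end is buffered
    # instead of raising; any other invalid byte still raises
    # UnicodeDecodeError, so real text is never dropped silently.
    decoder.decode(data, final=False)
    pending, _flag = decoder.getstate()
    return data[:len(data) - len(pending)]
```

Applied to a value that ends in a lone 0xCE byte, this returns the same bytes minus that final byte, which is the "remove the last byte" fix; on an already valid value it changes nothing.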

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
