> cElementTree. What are your versions?

Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:31)
(on f13 x64)

Greetings

On 06.10.2010 08:40, emijrp wrote:
> I have tested your code, with the bz2 and 7z dumps, and I get titles
> with None value. The first one is the same error that appears in my code.
>
> Reading XML dump...
> None 2004-10-10T04:24:14Z
>
> I have the latest version of pywikipediabot and Python 2.6.5 (r265:79063,
> Apr 16 2010, 13:09:56). Probably, it can be an error of Python or
> cElementTree. What are your versions?
>
> 2010/10/5 Russell Blau <[email protected]>
>
> > "emijrp" <[email protected]> wrote in message
> > news:[email protected]...
> >
> > > I think that there is an error in xmlreader.py. When parsing a full
> > > revision XML (in this case[1]), using this code[2] (look at the
> > > try-catch, it writes when fails) I get correctly username,
> > > timestamp and revisionid, but sometimes, the page title and the page
> > > id are None or empty string.
> > >
> > > [1]
> > > http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
> > > [2] http://pastebin.ca/1951930
> > > [3] http://pastebin.ca/1951937
> >
> > I have been completely unable to replicate this supposed error. I
> > downloaded the same kwwiki dump file that you referenced. I loaded it with
> > xmlreader.XmlDump, ran it through the parser, and counted the number of
> > XMLEntry objects it generated: 4711. Then as a test I opened the same dump
> > as a text file and counted the number of lines that contain the string
> > "<page>": 4711. So the parser is correctly returning one object per page
> > item found in the file.
> >
> > Next I ran the parser again with a script that would print out a message if
> > any XMLEntry object had a missing title (None or empty string); no
> > messages.
> >
> > Then I searched for the specific page entry you showed in your pastebin item
> > [3]. The result of this test is shown at [4]. In short, it found exactly the
> > page title you said was missing.
> >
> > I cannot explain why your results are different than mine, unless perhaps
> > you have a corrupted copy of the dump file, or are not using the current
> > version of xmlreader.py.
> >
> > Russ
> >
> > [4] http://pastebin.ca/1955170
> >
> > _______________________________________________
> > Pywikipedia-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
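
For anyone who wants to repeat the cross-check Russ describes (count the <page> items, then flag every entry whose title is None or an empty string) without going through xmlreader.py at all, here is a rough sketch using only the standard library. It is not the pywikipediabot parser; the function name find_untitled_pages and the two-page sample document are made up for illustration, and it assumes Python 3 and an already decompressed dump (or a binary file object). If this reports no missing titles on your copy of the dump while xmlreader.py does, the problem is more likely in the local Python/cElementTree setup than in the file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def find_untitled_pages(source):
    """Return (page_count, ids_of_pages_without_title).

    `source` is a filename or a binary file object holding export XML.
    Hypothetical helper, independent of xmlreader.py.
    """
    page_count = 0
    untitled = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_count += 1
        title = None
        page_id = None
        # Look only at direct children, so a revision <id> is never
        # mistaken for the page <id>.
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = child.text
        if not title:  # catches both None and the empty string
            untitled.append(page_id)
        elem.clear()  # release the revisions of pages already checked
    return page_count, untitled


# Made-up sample: the second page has an empty <title>.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Main</title><id>1</id><revision><id>10</id></revision></page>
  <page><title></title><id>2</id><revision><id>11</id></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    count, untitled = find_untitled_pages(io.BytesIO(SAMPLE))
    print("pages:", count, "missing titles for page ids:", untitled)
```

For a .bz2 dump, passing bz2.open(path, "rb") as the source should work on Python 3.3 or later; a .7z dump has to be extracted first, since the standard library cannot read that format.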
