Re: [Pywikipedia-l] XMLreader.py

Dr. Trigon Wed, 06 Oct 2010 13:55:20 -0700

 > cElementTree. What are your versions?

Python 2.6.4 (r264:75706, Jun  4 2010, 18:20:31)


(on Fedora 13, x86_64)

Greetings


On 06.10.2010 08:40, emijrp wrote:
> I have tested your code with both the bz2 and 7z dumps, and I get titles
> with a None value. The first one is the same error that appears in my code.
>
> Reading XML dump...
> None 2004-10-10T04:24:14Z
>
> I have the latest version of pywikipediabot and Python 2.6.5 (r265:79063,
> Apr 16 2010, 13:09:56). It may be a bug in Python or
> cElementTree. What are your versions?
>
> 2010/10/5 Russell Blau <[email protected]>
>
>     "emijrp" <[email protected]> wrote in message
>     news:[email protected]...
>
>      > I think there is an error in xmlreader.py. When parsing a
>      > full-revision XML dump (in this case [1]) with this code [2] (look
>      > at the try/except; it writes output when it fails), I correctly get
>      > the username, timestamp and revision id, but sometimes the page
>      > title and the page id are None or an empty string.
>
>      > [1] http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
>      > [2] http://pastebin.ca/1951930
>      > [3] http://pastebin.ca/1951937
>
>     I have been completely unable to replicate this supposed error.  I
>     downloaded the same kwwiki dump file that you referenced.  I loaded it
>     with xmlreader.XmlDump, ran it through the parser, and counted the
>     number of XMLEntry objects it generated: 4711.  Then as a test I opened
>     the same dump as a text file and counted the number of lines that
>     contain the string "<page>": 4711.  So the parser is correctly returning
>     one object per page item found in the file.
>
>     Next I ran the parser again with a script that would print out a message
>     if any XMLEntry object had a missing title (None or empty string); no
>     messages were printed.
>
>     Then I searched for the specific page entry you showed in your pastebin
>     item [3]. The result of this test is shown at [4]. In short, it found
>     exactly the page title you said was missing.
>
>     I cannot explain why your results are different from mine, unless
>     perhaps you have a corrupted copy of the dump file, or are not using the
>     current version of xmlreader.py.
>
>     Russ
>
>     [4] http://pastebin.ca/1955170
>
>
>
>
>     _______________________________________________
>     Pywikipedia-l mailing list
>     [email protected]
>     <mailto:[email protected]>
>     https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
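
For anyone who wants to repeat the two sanity checks Russ describes above
(one parsed entry per "<page>" line, and no entry with a missing title),
here is a minimal sketch. It uses the standard library's
xml.etree.ElementTree.iterparse as a stand-in, not xmlreader.XmlDump, so it
does not exercise pywikipedia's parser itself. The names SAMPLE_DUMP,
iter_pages and check_dump, and the tiny two-page sample, are made up for
illustration and do not come from the thread. It is written for Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-page sample standing in for a real dump file.
# The second page has an empty title on purpose.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Alpha</title>
    <id>1</id>
    <revision><id>10</id><timestamp>2004-10-10T04:24:14Z</timestamp></revision>
  </page>
  <page>
    <title></title>
    <id>2</id>
    <revision><id>20</id><timestamp>2005-01-01T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, page_id) for every <page> element in the stream."""
    # Only "end" events are used, so each <page> is complete when inspected.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            # Direct children only: the nested revision <id> is not touched.
            fields = {local_name(child.tag): child.text for child in elem}
            yield fields.get("title"), fields.get("id")
            elem.clear()  # release the finished page to keep memory low


def count_page_lines(text):
    """Count the lines that contain the literal string '<page>'."""
    return sum(1 for line in text.splitlines() if "<page>" in line)


def check_dump(text):
    """Return (parsed pages, '<page>' lines, ids of pages lacking a title)."""
    pages = list(iter_pages(io.BytesIO(text.encode("utf-8"))))
    missing = [page_id for title, page_id in pages if not title]
    return len(pages), count_page_lines(text), missing


if __name__ == "__main__":
    parsed, lines, missing = check_dump(SAMPLE_DUMP)
    print("parsed pages:", parsed)
    print("'<page>' lines:", lines)
    print("page ids with missing title:", missing)
```

Against a real dump the text would come from the decompressed file instead
of SAMPLE_DUMP. Holding the whole file in memory, as this sketch does, is
only reasonable for a small wiki.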


