The dump doesn't include deleted pages or revisions. The dump has the values
but the parser doesn't parse them.
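
For anyone who wants to cross-check this without xmlreader.py, here is a minimal sketch. It is not the xmlreader.py code; it assumes Python 3 and the standard xml.etree.ElementTree module, and the names local(), child_text() and iter_revisions() are made up for illustration. The sample XML is hand-written: the revision values are the ones quoted below, while the page id 123 is invented. The idea is to remember the title and id of the enclosing <page> and attach them to every <revision> inside it.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a pages-meta-history dump.
# The page id (123) is invented for this example.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Rebellyans Kernow 1497</title>
    <id>123</id>
    <revision>
      <id>4267</id>
      <timestamp>2004-10-10T04:24:14Z</timestamp>
      <contributor><username>QuartierLatin1968</username><id>7</id></contributor>
      <text>...</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the XML namespace: '{http://...}title' -> 'title'."""
    return tag.rsplit('}', 1)[-1]


def child_text(elem, *names):
    """Follow a chain of direct-child tag names (namespace ignored)."""
    for wanted in names:
        elem = next((c for c in elem if local(c.tag) == wanted), None)
        if elem is None:
            return None
    return elem.text


def iter_revisions(source):
    """Yield [title, page_id, username, timestamp, rev_id] per revision."""
    path = []                 # stack of open element names
    title = page_id = None    # values of the enclosing <page>
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        name = local(elem.tag)
        if event == 'start':
            path.append(name)
            if name == 'page':
                title = page_id = None   # reset for each new page
            continue
        path.pop()
        parent = path[-1] if path else None
        if parent == 'page' and name == 'title':
            title = elem.text
        elif parent == 'page' and name == 'id':
            page_id = elem.text          # only the <id> directly under <page>
        elif name == 'revision':
            yield [title, page_id,
                   child_text(elem, 'contributor', 'username'),
                   child_text(elem, 'timestamp'),
                   child_text(elem, 'id')]
            elem.clear()                 # free memory on large dumps
        elif name == 'page':
            elem.clear()


rows = list(iter_revisions(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

If the title or page id come out as None here for a revision where the dump clearly has them, the dump is the problem; if they come out fine, the problem is in how the other parser tracks the enclosing page.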

2010/10/1 Dr. Trigon <[email protected]>

> Maybe I am wrong, but xqt told me once that the PreloadingGenerator
> has problems with the API. I myself also had problems with deleted (and
> redirect) pages when loading multiple pages at once through the API.
>
> So my assumption is that this XML parser indeed has a problem parsing
> deleted (and maybe redirect) pages, and thus fails to return them all,
> and so the PreloadingGenerator does not work with the API.
> If I am right about this, the solution to the problem mentioned here
> could also solve the preloading-with-API problem. This would be very
> nice! But to be sure, I would appreciate a comment by xqt on this ;))
>
> Just some thoughts...
>
> Greetings
> DrTrigon
>
>
> On 01.10.2010 00:52, Dmitry Chichkov wrote:
> > I see. Strange... That indeed looks like a parser bug.
> >
> > -- Dmitry
> >
> >
> > On Thu, Sep 30, 2010 at 3:37 PM, emijrp <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Furthermore, if you see the chunk of the dump that I have posted,
> >     the page title and page id are there. But the parser doesn't get
> >     them.
> >
> >     2010/10/1 emijrp <[email protected] <mailto:[email protected]>>
> >
> >         Hi, thanks for your quick response, but I have a question. Why
> >         are deleted pages included in the dump? Also, the page of the
> >         error is not deleted in the wiki.[1]
> >
> >         [1] http://kw.wikipedia.org/wiki/Rebellyans_Kernow_1497
> >
> >         2010/9/30 Dmitry Chichkov <[email protected]
> >         <mailto:[email protected]>>
> >
> >             Hi Emijrp,
> >
> >             That's "normal". Page id/title can be None/empty for deleted
> >             pages.
> >
> >             -- Regards, Dmitry
> >
> >
> >             On Thu, Sep 30, 2010 at 9:50 AM, emijrp <[email protected]
> >             <mailto:[email protected]>> wrote:
> >
> >                 Hi all;
> >
> >                 I think that there is an error in xmlreader.py. When
> >                 parsing a full revision XML (in this case[1]) using
> >                 this code[2] (look at the try/except, which writes a
> >                 line when it fails), I correctly get the username,
> >                 timestamp and revision id, but sometimes the page title
> >                 and the page id are None or an empty string.
> >
> >                 The first error is:
> >                 ['', None, 'QuartierLatin1968', '2004-10-10T04:24:14Z',
> >                 '4267']
> >
> >                 But if we do:
> >                 7za e -bd -so kwwiki-20100926-pages-meta-history.xml.7z
> >                 2>/dev/null | egrep -i '2004-10-10T04:24:14Z' -C20
> >
> >                 We get this[3], which is OK: the page title and the page
> >                 id are available in the XML, but not correctly parsed.
> >                 And this is not the only page title and page id that
> >                 fail.
> >
> >                 Perhaps I have missed something, because I'm still
> >                 learning to parse XML. Sorry in that case.
> >
> >                 Regards,
> >                 emijrp
> >
> >                 [1] http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
> >                 [2] http://pastebin.ca/1951930
> >                 [3] http://pastebin.ca/1951937
> >
> >                 _______________________________________________
> >                 Pywikipedia-l mailing list
> >                 [email protected]
> >                 <mailto:[email protected]>
> >                 https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> >
