Thank you for your advice! I wrote a simple method that returns the set of titles of pages changed since "limit": http://paste.pocoo.org/show/547240/. It does not return the exact set (for that, I think I would have to check the timestamps in the last iteration), so in the worst case it returns 100 unwanted titles, but that is not a problem for my purposes.
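For anyone reading along, here is a minimal sketch of the per-entry timestamp check mentioned above. It is not the code from the paste. It assumes the changes arrive as (title, timestamp) pairs ordered newest first; the function name titles_changed_since and the sample data are made up for illustration, and a real run would feed it whatever the recent-changes listing yields.

```python
from datetime import datetime, timedelta


def titles_changed_since(changes, limit):
    """Collect titles of pages edited at or after `limit`.

    `changes` is assumed to be an iterable of (title, timestamp) pairs
    ordered newest first (a stand-in for a recent-changes listing).
    """
    titles = set()
    for title, timestamp in changes:
        if timestamp < limit:
            # Entries are newest first, so everything after this one
            # is older than the cut-off and can be skipped.
            break
        titles.add(title)
    return titles


# Hypothetical sample data standing in for a real recent-changes feed.
now = datetime(2012, 2, 6, 12, 0)
sample = [
    ("Page A", now - timedelta(days=1)),
    ("Page B", now - timedelta(days=3)),
    ("Page A", now - timedelta(days=5)),
    ("Page C", now - timedelta(days=9)),
]
recent = titles_changed_since(sample, now - timedelta(days=7))
print(sorted(recent))  # ['Page A', 'Page B']
```

Checking every entry rather than only whole batches means the loop stops at the first change older than the limit, so no extra titles slip in.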
Cheers
alkamid

On 6 February 2012 02:27, Morten Wang <[email protected]> wrote:
> To me the implementation depends on what alkamid actually wants to do.
> For keeping some of SuggestBot's data sources up-to-date I use the
> site object's recentchanges() generator to grab data (and although one
> can only get a limited amount at each step, I've never had troubles
> exhausting the generator), where it's easy to check the edit timestamp
> to stop iterating when necessary. I then store page titles in a
> set(), which can be fed to a PagesFromTitlesGenerator, and I chain
> said generator with a PreloadingGenerator to get the latest revisions.
>
> In my experience only a minority of a Wikipedia edition's articles are
> updated on a weekly basis, so using allpages() results in a lot of
> unnecessary data.
>
>
> Cheers,
> Morten
>
> On 5 February 2012 17:28, Dr. Trigon <[email protected]> wrote:
>>> past week? I thought of using the AllPagesPageGenerator and
>>> executing editTime() on each page, but this method gives me only
>>> zeros if the page was not read before (e.g. I have to call
>>> page.get() first in order for editTime() to work properly). Is
>>> there any edit-time-related piece of information I can get from a
>>> generated list of pages? Or maybe there is another page generator
>>> suitable for me?
>>
>> Everything using 'getall' from 'wikipedia.py' (imported as 'pywikibot')
>> does give you the first history entry WITHOUT having to trigger
>> page.get(). E.g. the 'PreloadingGenerator' and as you can chain the
>> generators you can first setup your generator as 'gen1' and then pass
>> 'gen1' to a 'PreloadingGenerator' (may be in a 'ThreadedGenerator'...)
>> in order to get the first history entry of every page... In
>> 'sum_disc.py' of the DrTrigonBot repo is an example for this.
>>
>> Greetings

_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
