matej_suchanek created this task.
matej_suchanek added a project: Pywikibot-core.
Herald added subscribers: pywikibot-bugs-list, Aklapper.

TASK DESCRIPTION

If you have a bot running a large generator (e.g. -recentchanges on #Wikidata) and it needs to load every page into memory, it may eventually fail with MemoryError, because filter_unique maintains a growing set of already visited pages. For instance, my bot on #toolforge visited 220,000 items before crashing.

I think this could be improved by calling BasePage.clear_cache() somewhere inside BaseBot.run or GeneratorFactory. While a page sits dormant in the set, the only thing we need from it is its hash, and computing the hash doesn't require the page content.
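
A minimal sketch of the idea, not actual Pywikibot code: FakePage and unique_and_release are hypothetical stand-ins (for BasePage and for a filter_unique-style generator), and the assumption is that clear_cache() drops the loaded content while leaving the hash untouched.

```python
class FakePage:
    """Hypothetical stand-in for a page object; not Pywikibot's BasePage."""

    def __init__(self, title):
        self.title = title
        self._text = None  # cached content, loaded on demand

    def load(self):
        # Simulate fetching a large page body into memory.
        self._text = 'x' * 10_000
        return self._text

    def clear_cache(self):
        # Drop the cached content; the title (and thus the hash) stays.
        self._text = None

    def __hash__(self):
        return hash(self.title)

    def __eq__(self, other):
        return isinstance(other, FakePage) and self.title == other.title


def unique_and_release(pages):
    """Yield each page once and release its content after it was processed."""
    seen = set()
    for page in pages:
        if page in seen:
            continue
        seen.add(page)
        yield page
        # Execution resumes here when the consumer requests the next page,
        # i.e. once the bot has finished treating this one.
        page.clear_cache()


pages = [FakePage('Q1'), FakePage('Q2'), FakePage('Q1')]
visited = []
for page in unique_and_release(pages):
    page.load()  # the bot loads and treats the page
    visited.append(page)
```

The set still grows by one object per visited page, but each object keeps only its lightweight attributes, so memory use per page no longer depends on the size of the content.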


TASK DETAIL
https://phabricator.wikimedia.org/T199615

To: matej_suchanek
Cc: Aklapper, matej_suchanek, pywikibot-bugs-list, Magul, Tbscho, MayS, Mdupont, JJMC89, Avicennasis, mys_721tx, jayvdb, Dalba, Masti, Alchimista, Rxy