gerritbot added a comment.
Change 454198 merged by jenkins-bot:
[pywikibot/core@master] pagegenerators.py: Avoid applying two uniquifying filters
https://gerrit.wikimedia.org/r/454198
gerritbot added a comment.
Change 454198 had a related patch set uploaded (by Dalba; owner: dalba):
[pywikibot/core@master] pagegenerators.py: Avoid applying two uniquifying filters
https://gerrit.wikimedia.org/r/454198
Dalba added a comment.
In T199615#4511856, @matej_suchanek wrote:
Another problem I can see is that filter_unique (inside GenFact it is self._filter_unique) is used twice: it's provided to RecentChangesPageGenerator (line 821) and also used for dupfiltergen = self._filter_unique(gensList) (line
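To illustrate why stacking two uniquifying filters is wasteful, here is a minimal sketch (not the actual pywikibot.tools.filter_unique implementation, just its general shape): each wrapper keeps its own "seen" container, and the outer one can never reject anything the inner one already let through, so the second container only duplicates memory.

```python
def filter_unique(iterable, container=None, key=None):
    # Minimal sketch of a uniquifying generator in the style of
    # pywikibot.tools.filter_unique; names and defaults are assumptions.
    if container is None:
        container = set()
    for item in iterable:
        k = key(item) if key is not None else item
        if k not in container:
            container.add(k)
            yield item

# Applying the filter twice: the outer call holds a second 'seen' set
# that grows just as large as the inner one, but filters nothing out.
inner = filter_unique(iter([1, 2, 2, 3, 1]))
outer = filter_unique(inner)  # redundant second pass
print(list(outer))  # [1, 2, 3]
```

The fix merged in change 454198 avoids exactly this double application inside GeneratorFactory.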
gerritbot added a comment.
Change 445854 abandoned by Xqt:
[IMPR] Use hash key for unique filter by default
Reason:
https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/451824/
https://gerrit.wikimedia.org/r/445854
gerritbot added a comment.
Change 451824 merged by jenkins-bot:
[pywikibot/core@master] Use a key for filter_unique where appropriate
https://gerrit.wikimedia.org/r/451824
gerritbot added a comment.
Change 451824 had a related patch set uploaded (by Dalba; owner: dalba):
[pywikibot/core@master] Use a key for filter_unique where appropriate
https://gerrit.wikimedia.org/r/451824
gerritbot added a comment.
Change 445854 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Use hash key for filter_unique by default
https://gerrit.wikimedia.org/r/445854
zhuyifei1999 added a comment.
In T199615#4425487, @Xqt wrote:
We could
use hash function for the filter_unique key
use hash function for the filter_unique key by default
use a GeneratorFactory container attribute to hold the seen pages, which could be reused when we have more than one duplicate filter
Xqt added a comment.
We could
use hash function for the filter_unique key
use hash function for the filter_unique key by default
use a GeneratorFactory Container attribute to hold the seen pages which could be reused when we have more than one duplicate filter
use a container which uses disk
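The first three options above can be sketched together. Assuming a filter_unique with the general shape of pywikibot.tools.filter_unique (container and key parameters; the exact signature here is an assumption), a single shared container lets every filter see the same "already seen" state, and key=hash stores only integers instead of whole Page objects:

```python
def filter_unique(iterable, container=None, key=None):
    # Sketch of a filter_unique-style generator; not the real implementation.
    if container is None:
        container = set()
    for item in iterable:
        k = key(item) if key is not None else item
        if k not in container:
            container.add(k)
            yield item

# One shared container: items already seen by the first filter are also
# rejected by the second, and only one 'seen' set stays alive.
seen = set()
first = list(filter_unique(iter('aabc'), container=seen))   # ['a', 'b', 'c']
second = list(filter_unique(iter('bcda'), container=seen))  # ['d']

# key=hash keeps only ints in the container rather than the objects
# themselves, shrinking the footprint at the (tiny) risk of collisions.
hashes = set()
unique = list(filter_unique(iter(('foo', 'bar', 'foo')),
                            container=hashes, key=hash))    # ['foo', 'bar']
```

A disk-backed container (the fourth option) would slot into the same container parameter, e.g. an object wrapping shelve or dbm that supports `in` and `add`.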
zhuyifei1999 added a comment.
In T199615#4425451, @Xqt wrote:
I see the getsizeof() counts the pointers only, but not the Page objects themselves.
The underlying implementation of set.__sizeof__ looks weird to me. It doesn't seem to be iterative or recursive at first glance.
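That is by design: getsizeof() is shallow. For a set it reports the hash table itself (the pointer slots), not the objects those pointers reference, so a set of large strings or Page objects looks deceptively small. A quick demonstration:

```python
from sys import getsizeof

# A set of 1000 longish strings standing in for Page objects.
titles = {'Page:%d' % i + 'x' * 100 for i in range(1000)}

shallow = getsizeof(titles)                         # the set's table only
deep = shallow + sum(getsizeof(t) for t in titles)  # plus the elements

print(shallow < deep)  # True: the elements themselves are not counted
```

A recursive size would have to walk the referenced objects (and guard against cycles), which is why measuring the container alone badly underestimates the memory held alive by a uniquifying filter.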
But yes, the problem
Xqt added a comment.
No clue where the memory leakage might come from.
I see the getsizeof() counts the pointers only, but not the Page objects themselves.
Xqt added a comment.
Long-running tasks may end with a MemoryError because filter_unique leaks memory
Why do you assume that?
Try:
from sys import getsizeof
import pwb, pywikibot as py
from pywikibot.tools import filter_unique as f
s = py.Site()
p = py.Page(s, 'Hydraulik')
container = set()
gen =