zhuyifei1999 added a comment.
In T199615#4425487, @Xqt wrote: We could
- use a hash function for the filter_unique key by default
- use a GeneratorFactory Container attribute to hold the seen pages which could be reused when we have more than one duplicate filter
LGTM.
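A minimal sketch of the first two ideas, assuming a standalone helper rather than the actual pywikibot code. The name `unique_by_key` and the shared `seen` set are hypothetical, used only for illustration:

```python
def unique_by_key(iterable, container=None, key=hash):
    """Yield items whose key has not been seen before.

    Hypothetical helper: `container` may be shared between several
    filters so that they all consult the same record of seen pages.
    """
    if container is None:
        container = set()
    for item in iterable:
        k = key(item)
        if k not in container:
            container.add(k)
            yield item


# One shared container, e.g. held as an attribute of a factory object.
seen = set()
first = list(unique_by_key(['A', 'B', 'A'], container=seen))
second = list(unique_by_key(['B', 'C'], container=seen))
```

Storing only the hash keeps each entry small, but two distinct pages whose hashes collide would be treated as duplicates, so the choice of key function matters.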
- use a container that uses disk space instead of memory (but this could be time-consuming)
This could increase the number of entries that we can store, but creates more things to consider:
- file permissions (600 or 644?)
- location & uniqueness of file (use tempfile?)
- access by memory-map (causes a lot of page faults) or file objects (causes a lot of syscalls and context switches)?
- and most importantly: do we have to make our own implementation of set()?
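To make these questions concrete, here is a rough sketch of one possible approach, not a proposal for the actual implementation. It assumes SQLite as the on-disk store (so no custom set() internals are needed) and `tempfile.mkstemp` for a uniquely named file, which on POSIX is created readable and writable only by the owner. The class name `DiskSet` is hypothetical:

```python
import os
import sqlite3
import tempfile


class DiskSet:
    """Minimal set-like container that keeps its keys in a temporary file."""

    def __init__(self):
        # mkstemp picks a unique name and creates the file with
        # owner-only permissions (mode 0600 on POSIX).
        fd, self.path = tempfile.mkstemp(suffix='.sqlite')
        os.close(fd)
        self._db = sqlite3.connect(self.path)
        self._db.execute('CREATE TABLE seen (key TEXT PRIMARY KEY)')

    def add(self, key):
        # The primary key constraint makes repeated adds a no-op.
        self._db.execute('INSERT OR IGNORE INTO seen VALUES (?)', (key,))

    def __contains__(self, key):
        cur = self._db.execute('SELECT 1 FROM seen WHERE key = ?', (key,))
        return cur.fetchone() is not None

    def __len__(self):
        return self._db.execute('SELECT COUNT(*) FROM seen').fetchone()[0]

    def close(self):
        # Close the connection and delete the temporary file.
        self._db.close()
        os.remove(self.path)
```

Because it only provides `add` and `in`, an object like this could stand in wherever a plain set is used as the container for seen keys. Every lookup goes through SQLite though, so it would be slower than an in-memory set.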
To: zhuyifei1999
Cc: zhuyifei1999, Xqt, Aklapper, matej_suchanek, pywikibot-bugs-list, Magul, Tbscho, MayS, Mdupont, JJMC89, Avicennasis, mys_721tx, jayvdb, Dalba, Masti, Alchimista, Rxy
_______________________________________________ pywikibot-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs
