Hello all
I'd like to suggest the code change in the attached patch.
The idea is to change RegexFilterPageGenerator slightly. First,
the 'regex' parameter becomes a list of regexes instead of a single one.
A page counts as a positive match if its title matches any regex in the
list. The second change adds a new parameter 'invert' which, if set to
True, switches the generator from yielding pages on ANY POSITIVE match
to yielding pages with NO POSITIVE match AT ALL. This way both positive
(additive) and negative (subtractive) filter behaviour can be achieved.
This would also be very helpful for my bot... ;)
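To illustrate the intended semantics, here is a minimal standalone sketch that filters plain title strings instead of page objects. The function name regex_filter_titles and the sample titles are made up for this example and are not part of pagegenerators.py. Note that re.match only matches at the start of the title, as in the patch.

```python
import re


def regex_filter_titles(titles, regexes, invert=False):
    """Yield titles matched by any regex, or by none of them if invert is True.

    Hypothetical helper for illustration only; it works on plain strings
    rather than on pywikibot page objects.
    """
    # Compile once, case-insensitively, as the patch does.
    compiled = [re.compile(r, re.I) for r in regexes]
    for title in titles:
        # True if at least one regex matches at the start of the title.
        matched = any(c.match(title) for c in compiled)
        # invert=False keeps matched titles, invert=True keeps unmatched ones.
        if matched != invert:
            yield title


# Made-up sample data.
titles = ["Main Page", "Sandbox", "Talk archive", "sandbox/test"]
additive = list(regex_filter_titles(titles, ["sand", "main"]))
subtractive = list(regex_filter_titles(titles, ["sand", "main"], invert=True))
```

With this sample data, the additive call keeps the titles starting with "sand" or "main" (ignoring case) and the subtractive call keeps only the rest.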
Thanks a lot and greetings
DrTrigon
Index: pagegenerators.py
===================================================================
--- pagegenerators.py (revision 8572)
+++ pagegenerators.py (working copy)
@@ -573,7 +573,7 @@
regex = pywikibot.input(u'What page names are you looking for?')
else:
regex = arg[12:]
- gen = RegexFilterPageGenerator(site.allpages(), regex)
+ gen = RegexFilterPageGenerator(site.allpages(), [regex])
elif arg.startswith('-yahoo'):
gen = YahooSearchPageGenerator(arg[7:])
elif arg.startswith('-'):
@@ -1160,16 +1160,31 @@
seenPages[_page] = True
yield page
-def RegexFilterPageGenerator(generator, regex):
+def RegexFilterPageGenerator(generator, regex, invert=False):
     """
     Wraps around another generator. Yields only those pages, the titles of
-    which are positively matched to regex.
+    which are positively matched to any regex in the list. If invert is False,
+    yields all pages matched by any regex; if True, yields all pages matched
+    by none of the regexes.
     """
-    reg = re.compile(regex, re.I)
+    reg = [ re.compile(r, re.I) for r in regex ]
     for page in generator:
-        if reg.match(page.titleWithoutNamespace()):
-            yield page
+        if invert:
+            # yield page if NOT matched by any regex
+            skip = False
+            for r in reg:
+                if r.match(page.titleWithoutNamespace()):
+                    skip = True
+                    break
+            if not skip:
+                yield page
+        else:
+            # yield page if matched by any regex
+            for r in reg:
+                if r.match(page.titleWithoutNamespace()):
+                    yield page
+                    break
def CombinedPageGenerator(generators):
"""
_______________________________________________
Pywikipedia-l mailing list
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l