[Pywikipedia-l] Feature request for pagegenerators.RegexFilterPageGenerator

Dr. Trigon Sat, 18 Sep 2010 15:06:30 -0700

Hello all

I'd like to suggest the code change given by the attached patch.


The idea is to change RegexFilterPageGenerator slightly. First, the
'regex' parameter becomes a list of regexes instead of a single one,
and each page title is checked against every regex in the list. Second,
a new parameter 'invert' is added: if set to True, the generator yields
a page only if NO regex matches at all, instead of yielding it as soon
as ANY regex matches. This way both positive (additive) and negative
(subtractive) filter behaviour can be achieved.
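
To illustrate the intended behaviour, here is a minimal, self-contained sketch. It is not the patched function itself: FakePage and regex_filter are hypothetical names used only for this example, and FakePage merely imitates the titleWithoutNamespace() accessor that the patch relies on. The two branches of the patch are collapsed into a single any() check, which should behave the same way under these assumptions.

```python
import re


class FakePage:
    """Hypothetical stand-in for a page object; only the title accessor is modelled."""

    def __init__(self, title):
        self._title = title

    def titleWithoutNamespace(self):
        return self._title


def regex_filter(generator, regexes, invert=False):
    """Sketch of the proposed semantics (assumed, not the real pagegenerators code)."""
    compiled = [re.compile(r, re.I) for r in regexes]
    for page in generator:
        title = page.titleWithoutNamespace()
        # True if at least one regex matches at the start of the title.
        matched = any(r.match(title) for r in compiled)
        # invert=False keeps pages with at least one match;
        # invert=True keeps pages with no match at all.
        if matched != invert:
            yield page


pages = [FakePage(t) for t in ("Berlin", "Bern", "Zurich", "Basel")]
patterns = ["ber", "zur"]

kept = [p.titleWithoutNamespace() for p in regex_filter(pages, patterns)]
dropped = [p.titleWithoutNamespace()
           for p in regex_filter(pages, patterns, invert=True)]
```

With these sample titles, the additive filter keeps the titles starting with "ber" or "zur" (case-insensitively), and the inverted filter keeps only the remaining one.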

This would also be very helpful for my bot... ;)

Thanks a lot and greetings
DrTrigon

Index: pagegenerators.py
===================================================================
--- pagegenerators.py   (Revision 8572)
+++ pagegenerators.py   (working copy)
@@ -573,7 +573,7 @@
                 regex = pywikibot.input(u'What page names are you looking for?')
             else:
                 regex = arg[12:]
-            gen = RegexFilterPageGenerator(site.allpages(), regex)
+            gen = RegexFilterPageGenerator(site.allpages(), [regex])
         elif arg.startswith('-yahoo'):
             gen = YahooSearchPageGenerator(arg[7:])
         elif arg.startswith('-'):
@@ -1160,16 +1160,31 @@
             seenPages[_page] = True
             yield page
 
-def RegexFilterPageGenerator(generator, regex):
+def RegexFilterPageGenerator(generator, regex, invert=False):
     """
     Wraps around another generator. Yields only those pages, the titles of
-    which are positively matched to regex.
+    which are positively matched to any regex in the list. If invert is False,
+    yields all pages matched by any regex; if True, yields all pages matched
+    by none of the regexes.
     """
-    reg = re.compile(regex, re.I)
+    reg = [ re.compile(r, re.I) for r in regex ]
 
     for page in generator:
-        if reg.match(page.titleWithoutNamespace()):
-            yield page
+        if invert:
+            # yield page only if matched by NONE of the regexes
+            skip = False
+            for r in reg:
+                if r.match(page.titleWithoutNamespace()):
+                    skip = True
+                    break
+            if not skip:
+                yield page
+        else:
+            # yield page if matched by any regex
+            for r in reg:
+                if r.match(page.titleWithoutNamespace()):
+                    yield page
+                    break
 
 def CombinedPageGenerator(generators):
     """

_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
