The url_pattern option seems to have stopped working for a while. The bug report is here:
http://www.gnu-designs.com/bugs/view_bug_page.php?f_id=0000285

Attached is a patch that lets url_pattern filter correctly when specified as a
command-line key.

Bill, could you check this over and, if you agree it is the right solution,
apply it to the CVS branches?

The patch is small. It only adds a few lines, which allow url_pattern to be
collected and emitted as an attribute of a SpiderLink tag.

With these lines added, url_pattern in the config file works for me (using
wired as an example test).

Best wishes,
Robert
 


cvs diff -u2 Spider.py 
Index: Spider.py
===================================================================
RCS file: /cvs/plucker/plucker_src/parser/python/PyPlucker/Spider.py,v
retrieving revision 1.70
diff -u -2 -r1.70 Spider.py
--- Spider.py   9 Jul 2002 15:45:03 -0000       1.70
+++ Spider.py   17 Sep 2002 22:58:56 -0000
@@ -99,4 +99,5 @@
     "stay_on_host",
     "staybelow",
+    "url_pattern",
     "current_depth",
     "noimages",
@@ -177,4 +178,5 @@
         self._maxheight = None
         self._bpp = 1
+        self._url_pattern = None
         old = dict.copy()
         for key in old.keys():
@@ -200,4 +202,5 @@
         if self._stay_below:
             res = res + (" STAYBELOW=\"%s\"" % self._stay_below)
+        res = res + " URL_PATTERN='%s'" % self._url_pattern
         res = res + " URL='%s'" % self._url
         res = res + " " + repr (self._dict)
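
For anyone reading the diff without the rest of Spider.py at hand, here is a rough, self-contained sketch of the idea: a link object picks up url_pattern from its attribute dictionary and carries it into its tag representation. LinkSketch and allows() are hypothetical names made up for this illustration, not the real PyPlucker class or API, and matching with re.match is only an assumption about how the pattern is applied. Unlike the patch, the sketch leaves the attribute out when no pattern is set.

```python
import re


class LinkSketch:
    """Hypothetical stand-in for the link class touched by the patch."""

    def __init__(self, url, attributes):
        self._url = url
        self._stay_below = None
        self._url_pattern = None   # mirrors the new default in the patch
        self._dict = {}
        for key, value in attributes.items():
            if key == "staybelow":
                self._stay_below = value
            elif key == "url_pattern":
                # Collect the pattern instead of leaving it in the generic dict.
                self._url_pattern = value
            else:
                self._dict[key] = value

    def __repr__(self):
        res = "<SpiderLink"
        if self._stay_below:
            res = res + (" STAYBELOW=\"%s\"" % self._stay_below)
        # Sketch-only choice: skip the attribute when no pattern was given.
        if self._url_pattern is not None:
            res = res + " URL_PATTERN='%s'" % self._url_pattern
        res = res + " URL='%s'" % self._url
        res = res + " " + repr(self._dict)
        return res + ">"

    def allows(self, candidate_url):
        """Assumed filter: a URL passes if no pattern is set or it matches."""
        if self._url_pattern is None:
            return True
        return re.match(self._url_pattern, candidate_url) is not None


link = LinkSketch("http://example.com/",
                  {"url_pattern": r"http://example\.com/.*", "noimages": 1})
print(repr(link))
print(link.allows("http://example.com/page.html"))
print(link.allows("http://other.org/"))
```

The point of the sketch is only that the pattern has to be a recognised key and stored on the link before anything can filter on it, which is what the three added lines in the diff arrange.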
