The url_pattern option seems to have temporarily stopped working for a while. The bug is here:
http://www.gnu-designs.com/bugs/view_bug_page.php?f_id=0000285
Attached is a patch that lets url_pattern filter correctly when it is specified as a
command-line key.
Bill, could you check this over, see whether it is the right solution, and patch the
CVS branches if you agree with it?
The patch is small: it only adds a few lines that let url_pattern be collected and
stored as an attribute of a SpiderLink tag.
With these lines added, url_pattern in the config file works for me (tested with
wired as an example).
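The failure mode described above can be sketched in plain Python, assuming the string list in the first hunk of the patch acts as a whitelist of options that get carried over onto each SpiderLink. This is a minimal illustration, not Spider.py itself: PASSED_THROUGH_KEYS and collect_attributes are hypothetical names, and the example pattern and the "maxdepth" option are made up for the demo.

```python
# Hypothetical sketch: only option names listed in a whitelist are carried
# over onto a new link. A key missing from the list (as "url_pattern" was
# before the patch) is silently dropped, so it can never filter anything.
PASSED_THROUGH_KEYS = [
    "stay_on_host",
    "staybelow",
    "url_pattern",  # the key the patch adds
    "current_depth",
    "noimages",
]


def collect_attributes(options, keys=PASSED_THROUGH_KEYS):
    """Return only the options whose names appear in `keys`."""
    return {name: options[name] for name in keys if name in options}


if __name__ == "__main__":
    opts = {"url_pattern": r"^http://www\.example\.com/", "noimages": 1, "maxdepth": 2}
    # With the patch: url_pattern survives alongside noimages.
    print(collect_attributes(opts))
    # Without "url_pattern" in the list, the pattern never reaches the link.
    print(collect_attributes(opts, keys=["noimages"]))
```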
Best wishes,
Robert
cvs diff -u2 Spider.py
Index: Spider.py
===================================================================
RCS file: /cvs/plucker/plucker_src/parser/python/PyPlucker/Spider.py,v
retrieving revision 1.70
diff -u -2 -r1.70 Spider.py
--- Spider.py 9 Jul 2002 15:45:03 -0000 1.70
+++ Spider.py 17 Sep 2002 22:58:56 -0000
@@ -99,4 +99,5 @@
"stay_on_host",
"staybelow",
+ "url_pattern",
"current_depth",
"noimages",
@@ -177,4 +178,5 @@
self._maxheight = None
self._bpp = 1
+ self._url_pattern = None
old = dict.copy()
for key in old.keys():
@@ -200,4 +202,5 @@
if self._stay_below:
res = res + (" STAYBELOW=\"%s\"" % self._stay_below)
+ res = res + " URL_PATTERN='%s'" % self._url_pattern
res = res + " URL='%s'" % self._url
res = res + " " + repr (self._dict)
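Once url_pattern is stored on the link (second hunk) and shown in its repr (third hunk), the spider can test candidate URLs against it. The class below is a hypothetical stand-in, not PyPlucker's SpiderLink; in particular it assumes the pattern is a regular expression anchored at the start of the URL (Python's re.match), which may not be the exact rule Plucker applies.

```python
import re


class LinkSketch:
    """Hypothetical stand-in for a spider link that carries url_pattern."""

    def __init__(self, url, attributes):
        self._url = url
        self._stay_below = attributes.get("staybelow")
        self._url_pattern = attributes.get("url_pattern")  # None when unset

    def allows(self, candidate_url):
        """Accept every URL when no pattern is set; otherwise require a match.

        Assumption: the pattern is matched against the start of the URL
        with re.match; the real filtering rule may differ.
        """
        if self._url_pattern is None:
            return True
        return re.match(self._url_pattern, candidate_url) is not None

    def __repr__(self):
        res = "<LinkSketch"
        if self._stay_below:
            res += ' STAYBELOW="%s"' % self._stay_below
        # Only emit URL_PATTERN when it is set, so an unset pattern does
        # not show up as the string 'None'.
        if self._url_pattern is not None:
            res += " URL_PATTERN='%s'" % self._url_pattern
        res += " URL='%s'>" % self._url
        return res


if __name__ == "__main__":
    link = LinkSketch("http://www.example.com/",
                      {"url_pattern": r"http://www\.example\.com/news/"})
    print(link)
    print(link.allows("http://www.example.com/news/today.html"))
    print(link.allows("http://ads.example.net/banner.gif"))
```

The sketch guards URL_PATTERN the same way the existing code guards STAYBELOW; the patch as submitted appends it unconditionally, which prints URL_PATTERN='None' for links without a pattern. Either choice is harmless for filtering, since it only affects the debug representation.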