I like Rob's idea of an exclusion list specifically formatted for interesting sites. Shades of SiteScooper!
I've been of the opinion that fetching a set of pages from various websites is not the most interesting part of the parser, and that such magic more properly belongs in a system like SiteScooper, which would provide a set of pages for Plucker to operate on. But if we continue to frobify the fetching like this, I'd like to suggest that we move that part of the parser logic into a separate file and set of classes, and take it out of the Spider.py code. That would also make something else easier: the recursive parsing of pages, which is necessary for implementing the <OBJECT> tag.

Bill
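
P.S. To make the proposed split a bit more concrete, here is a rough sketch of the shape I have in mind. It is only an illustration: Fetcher, ObjectFinder and parse_recursively are made-up names, not anything in the current Spider.py, and the dictionary of pages stands in for real network access. The point is that the retrieval policy (the exclusion list) lives in one class, and the parser only asks that class for pages, including the ones pulled in by <OBJECT> tags.

```python
# Hypothetical sketch: none of these names come from the actual Plucker code.
import re
from html.parser import HTMLParser


class Fetcher:
    """Owns all retrieval policy (here, just an exclusion list)."""

    def __init__(self, pages, exclusion_patterns=()):
        self._pages = pages  # url -> HTML text; stands in for real network access
        self._excluded = [re.compile(p) for p in exclusion_patterns]

    def fetch(self, url):
        """Return the page text, or None if the URL is excluded or unknown."""
        if any(p.search(url) for p in self._excluded):
            return None
        return self._pages.get(url)


class ObjectFinder(HTMLParser):
    """Collects the data attribute of every <OBJECT> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "object":
            data = dict(attrs).get("data")
            if data:
                self.targets.append(data)


def parse_recursively(url, fetcher, seen=None):
    """Parse url and, through the fetcher, every page its <OBJECT> tags embed."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against pages that embed each other
        return []
    seen.add(url)
    text = fetcher.fetch(url)
    if text is None:
        return []
    finder = ObjectFinder()
    finder.feed(text)
    parsed = [url]
    for target in finder.targets:
        parsed.extend(parse_recursively(target, fetcher, seen))
    return parsed
```

With something like this, the parser never needs to know why a page was skipped; changing the fetching rules means touching only the Fetcher class.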
