I am looking for a spider/gatherer with the following characteristics:
* Enables control of the crawling process by URL substring/regexp
and by the HTML context of the link.
* Enables control of the gathering (i.e. saving) process by URL
substring/regexp, MIME type, other header information, and ideally by some
predicates on the HTML source.
* Some way to save page/document metadata, ideally in a database.
* Freeware, shareware or otherwise inexpensive would be nice.
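To make the filtering requirements concrete, here is a rough Python sketch of
the kind of control I have in mind. All names, patterns, and predicates below
are purely illustrative, not taken from any existing tool:

```python
import re
import sqlite3

# Illustrative crawl-control settings (assumptions, not a real tool's config)
FOLLOW_URL = re.compile(r"^https?://example\.org/docs/")  # URL regexp filter
SAVE_MIME = {"text/html", "application/pdf"}              # MIME-type filter

def should_follow(url, link_context):
    """Follow a link only if its URL matches the regexp and the
    surrounding HTML context (e.g. anchor text) looks relevant."""
    return bool(FOLLOW_URL.match(url)) and "manual" in link_context.lower()

def should_save(url, headers, html_source):
    """Save a fetched document based on URL, the MIME type from the
    response headers, and a simple predicate on the HTML source."""
    mime = headers.get("Content-Type", "").split(";")[0]
    return (bool(FOLLOW_URL.match(url))
            and mime in SAVE_MIME
            and "<title>" in html_source)

def save_metadata(db, url, headers):
    """Record per-document metadata in a small database table."""
    db.execute("CREATE TABLE IF NOT EXISTS pages"
               " (url TEXT, mime TEXT, length INTEGER)")
    db.execute("INSERT INTO pages VALUES (?, ?, ?)",
               (url,
                headers.get("Content-Type", ""),
                int(headers.get("Content-Length", 0))))

# Example of the metadata step, using an in-memory SQLite database
db = sqlite3.connect(":memory:")
save_metadata(db, "http://example.org/docs/intro.html",
              {"Content-Type": "text/html; charset=utf-8",
               "Content-Length": "1024"})
```

The point is simply that follow/save decisions and metadata storage should be
separately configurable hooks, roughly as above.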
Thanks in advance for any help.
-Mark