I am looking for a spider/gatherer with the following characteristics:
    * Enables control of the crawling process by URL substring/regexp
and by the HTML context of the link.
    * Enables control of the gathering (i.e. saving) process by URL
substring/regexp, MIME type, other header information, and ideally by some
predicates on the HTML source.
    * Some way to save page/document metadata, ideally in a database.
    * Freeware, shareware or otherwise inexpensive would be nice.
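To illustrate, the kind of crawl/save control I have in mind might look roughly like the sketch below (Python; the URL pattern and MIME list are just placeholders, not any particular tool's API):

```python
import re

# Illustrative rules only -- in a real tool these would be configurable.
CRAWL_URL_PATTERN = re.compile(r"^https?://example\.org/docs/")
SAVE_MIME_TYPES = {"text/html", "application/pdf"}

def should_crawl(url):
    """Decide whether to follow a link, based on a URL regexp."""
    return bool(CRAWL_URL_PATTERN.match(url))

def should_save(url, mime_type):
    """Decide whether to store a fetched document,
    based on its URL and its MIME type from the response headers."""
    return should_crawl(url) and mime_type in SAVE_MIME_TYPES
```

So, for example, a link to https://example.org/docs/a.pdf served as application/pdf would be followed and saved, while an off-site link would be skipped entirely.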
Thanks in advance for any help.

-Mark
