On 4/3/06, Rob <[EMAIL PROTECTED]> wrote:
On Mon, Apr 03, 2006 at 04:57:09PM -0400, Phil Anderson wrote:
> wget can do pretty much anything relating to web crawling, it's got a ton of
> options. it might be kind of annoying, but there should be some tutorials
> out there to simplify what you want to do... it's pretty much the
> swiss-army-knife of web crawling tools.
Dude... I had no idea... I use wget all the time... even in mirror mode
and recursive modes, but apparently it can respect robots.txt and does
rate-limiting both in terms of bandwidth and queries/second which is
really what I wanted. It even lets me add extra header entries to the
GET request, which I figured I would have to hack in myself...
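
A rough sketch of how those options might be combined on one command line. The flags are standard GNU wget options; the function name polite_mirror, the DRY_RUN switch, the URL, the 50k cap, the 2-second wait, and the header text are all placeholders made up for illustration. wget honors robots.txt by default during recursive retrieval, so no extra flag is needed for that part.

```shell
#!/bin/sh
# Sketch only: polite_mirror and DRY_RUN are hypothetical helpers,
# and every value below is an assumed placeholder, not a recommendation.
polite_mirror() {
    # --mirror       recursive download with timestamping (robots.txt is
    #                respected by default in recursive mode)
    # --limit-rate   cap download bandwidth (here 50 KB/s)
    # --wait         seconds to pause between retrievals
    # --random-wait  vary that pause so requests are less regular
    # --header       add an extra header line to each HTTP request
    set -- wget --mirror \
        --limit-rate=50k \
        --wait=2 \
        --random-wait \
        --header='X-Crawler-Contact: admin@example.com' \
        "$1"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print one argument per line instead of touching the network.
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

# Usage (assumed URL):
#   DRY_RUN=1 polite_mirror http://example.com/   # show the command
#   polite_mirror http://example.com/             # actually crawl
```

Note that --wait spaces out requests rather than setting a strict queries-per-second limit, so a 2-second wait works out to roughly one request every two seconds at most.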
Muchas gracias,
- Rob
--
Christopher Conroy
