On Mon, Apr 03, 2006 at 04:57:09PM -0400, Phil Anderson wrote:
> wget can do pretty much anything relating to web crawling, it's got a ton of
> options. it might be kind of annoying, but there should be some tutorials
> out there to simplify what you want to do... it's pretty much the
> swiss-army-knife of web crawling tools.
Dude... I had no idea. I use wget all the time, even in mirror and recursive modes, but apparently it can also respect robots.txt and do rate-limiting, both in terms of bandwidth and queries/second, which is really what I wanted. It even lets me add extra header entries to the GET request, which I figured I would have to hack in myself.

Muchas gracias,

- Rob
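For the archives, a polite-crawl invocation along these lines should exercise the features mentioned above. This is just a sketch: the URL and the X-Crawler header value are placeholders, not a real target or a required header name.

```shell
# Mirror a site politely (http://example.com/ and the header are placeholders).
# --wait/--random-wait pause between requests (rough queries/second control),
# --limit-rate caps download bandwidth, and wget honors robots.txt by default
# during recursive retrieval.
wget --mirror \
     --wait=2 --random-wait \
     --limit-rate=200k \
     --header="X-Crawler: rob-mirror-bot" \
     --user-agent="rob-mirror-bot/0.1" \
     http://example.com/
```

Note that --mirror is shorthand for recursion with infinite depth plus timestamping, so it plays nicely with re-running the crawl to pick up only changed pages.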
