On Mon, Apr 03, 2006 at 04:57:09PM -0400, Phil Anderson wrote:
> wget can do pretty much anything relating to web crawling, it's got a ton of
> options. it might be kind of annoying, but there should be some tutorials
> out there to simplify what you want to do... it's pretty much the
> swiss-army-knife of web crawling tools.
Dude... I had no idea. I use wget all the time, even in mirror and recursive modes, but apparently it can also respect robots.txt and do rate-limiting, both in terms of bandwidth and queries/second, which is really what I wanted. It even lets me add extra header entries to the GET request, which I figured I would have to hack in myself.

Muchas gracias,

- Rob
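For the archives, a polite-crawl invocation along these lines should exercise the features mentioned above. This is just a sketch: the URL and the X-Crawler header value are placeholders, not a real target or a required header name.

```shell
# Mirror a site politely (http://example.com/ and the header are placeholders).
# --wait/--random-wait pause between requests (rough queries/second control),
# --limit-rate caps download bandwidth, and wget honors robots.txt by default
# during recursive retrieval.
wget --mirror \
     --wait=2 --random-wait \
     --limit-rate=200k \
     --header="X-Crawler: rob-mirror-bot" \
     --user-agent="rob-mirror-bot/0.1" \
     http://example.com/
```

Note that --mirror is shorthand for recursion with infinite depth plus timestamping, so it plays nicely with re-running the crawl to pick up only changed pages.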
