While on the topic, don't forget wget --delete-after for preloading a
proxy cache, and -k for converting links for local viewing. What a tool.
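
A minimal sketch of the cache-preload trick (the proxy host and site URL
here are placeholders):

  # point wget at the proxy whose cache you want to warm
  export http_proxy=http://proxy.example.com:3128/
  # -r crawls recursively; --delete-after discards each file once fetched,
  # so the proxy caches everything but nothing piles up locally
  wget -r --delete-after http://example.com/

And for -k (--convert-links), which rewrites links in the saved pages so
they work when browsed off-line:

  # -p also grabs page requisites (images, CSS) so pages render locally
  wget -r -k -p http://example.com/

(The two don't combine: the manual notes --convert-links is ignored when
--delete-after is on, since there's nothing left to convert.)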

On Mon, 3 Apr 2006, Rob wrote:

On Mon, Apr 03, 2006 at 04:57:09PM -0400, Phil Anderson wrote:
wget can do pretty much anything relating to web crawling; it's got a ton of
options. It might be kind of annoying, but there should be some tutorials
out there to simplify what you want to do... it's pretty much the
Swiss Army knife of web crawling tools.

Dude... I had no idea... I use wget all the time, even in mirror and
recursive modes, but apparently it can respect robots.txt and does
rate-limiting both in terms of bandwidth and request rate, which is
really what I wanted.  It even lets me add extra header entries to the
GET request, which I figured I would have to hack in myself...
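
In case it saves the next person a trip through the man page, a sketch of
that polite-crawl setup (the URL and header value are placeholders):

  # --wait paces requests (a per-request delay, not a strict
  # queries/second cap); --limit-rate caps bandwidth;
  # --header adds an arbitrary entry to each request
  wget -r --wait=2 --random-wait --limit-rate=50k \
       --header='From: crawler-admin@example.com' \
       http://example.com/

robots.txt is honored by default in recursive mode, so no extra flag is
needed for that (-e robots=off is what turns it off).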

Muchas gracias,

- Rob