Re: [UM-LINUX] web crawlers?

Christopher Conroy Mon, 03 Apr 2006 14:37:40 -0700

I haven't used it myself, but my understanding is that curl offers some functionality that wget does not. It all depends on what you want to do, but if you want to, say, archive an entire site, then curl might be up your alley.

On 4/3/06, Rob <[EMAIL PROTECTED]> wrote:

On Mon, Apr 03, 2006 at 04:57:09PM -0400, Phil Anderson wrote:
> wget can do pretty much anything related to web crawling; it's got a ton of
> options. It might be kind of annoying, but there should be some tutorials
> out there to simplify what you want to do... It's pretty much the
> Swiss Army knife of web crawling tools.

Dude... I had no idea... I use wget all the time, even in mirror and
recursive modes, but apparently it can respect robots.txt and does
rate-limiting in terms of both bandwidth and queries/second, which is
really what I wanted. It even lets me add extra header entries to the
GET request, which I figured I would have to hack in myself...
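
For anyone following along, here is a rough sketch of how those wget options might be combined, written as a small Python helper that only builds the command line and does not run it. The helper name build_wget_command, the example URL, the header name, and the default values are all made up for illustration; the flags themselves (--mirror, --limit-rate, --wait, --random-wait, --header) are standard GNU wget options, and wget honors robots.txt by default during recursive retrieval.

```python
import shlex


def build_wget_command(url, rate_limit="200k", wait_seconds=2, extra_headers=None):
    """Build an argv list for a polite recursive wget run (hypothetical helper)."""
    argv = [
        "wget",
        "--mirror",                    # recursive retrieval with timestamping
        f"--limit-rate={rate_limit}",  # cap download bandwidth
        f"--wait={wait_seconds}",      # pause between successive requests
        "--random-wait",               # vary the pause so requests are less bursty
    ]
    # Extra request headers; the names and values are up to the caller.
    for name, value in (extra_headers or {}).items():
        argv.append(f"--header={name}: {value}")
    argv.append(url)
    return argv


if __name__ == "__main__":
    # Illustrative values only; example.com and the header are placeholders.
    cmd = build_wget_command(
        "http://example.com/",
        extra_headers={"X-Crawler-Contact": "admin@example.com"},
    )
    # Print a shell-quoted command that could be pasted into a terminal.
    print(shlex.join(cmd))
```

The list returned by the helper could be handed to subprocess.run to actually start the crawl; check the wget man page for your version before relying on any particular default.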

Muchas gracias,

- Rob

--
Christopher Conroy
