Using these parameters would only slow the process of crawling a site.
The purpose is not only to avoid wasting a web server's resources but
also to quit when it appears the site is taking too long to download.
Since wget doesn't appear to contain any logic for detecting crawler
traps, a timed self-limit would also prevent wget from endlessly running
in circles.
Frank
Post, Mark K wrote:
I think that a combination of the --limit-rate and --wait parameters makes
this type of enhancement unnecessary, given that his stated purpose was
not to "hammer" a particular site.
Mark Post
-----Original Message-----
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 30, 2005 12:02 PM
To: Frank McCown
Cc: [email protected]
Subject: Re: Limit time to run
Frank McCown wrote:
It would be great if wget had a way of limiting the amount of time it
took to run so it won't accidentally hammer on someone's web server for
an indefinite amount of time. I often need to let a crawler run for a
while on an unknown site, and I have to manually kill wget after a few
hours if it hasn't finished yet. It would be nice if I could do:
wget --limit-time=120 ...
to make it stop itself after 120 minutes.
Please cc me on any replies.
I don't think we need to add this feature to wget, as it can be achieved
with a shell script that launches wget in the background, sleeps for the
given amount of time, and then kills the wget process.
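Something along these lines would do it (an untested sketch; the URL and
the 120-minute limit are just examples):

    #!/bin/sh
    # Sketch of the workaround: run wget in the background, sleep for the
    # time limit, then kill the wget process if it is still running.
    LIMIT=$((120 * 60))               # 120 minutes, in seconds
    wget -r http://www.example.com/ &
    WGET_PID=$!
    sleep "$LIMIT"
    kill "$WGET_PID" 2>/dev/null

Note that this waits out the full limit even when wget finishes early; a
slightly smarter script could poll the wget PID instead.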
However, if there is a general consensus about adding this feature to
wget, I might consider changing my mind.