Using these parameters will only slow the process of crawling a site. The purpose is not only to avoid wasting a web server's resources but also to quit when it appears the site is taking too long to download. Since wget doesn't appear to contain any logic for detecting crawler traps, a timed self-limit would also prevent it from endlessly running in circles.

Frank


Post, Mark K wrote:
I think that a combination of --limit-rate and --wait parameters makes
this type of enhancement unnecessary, given that his stated purpose was
to not "hammer" a particular site.
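
For reference, the combination Mark mentions would look something like this; the rate, wait interval, and URL are illustrative values, not recommendations:

```shell
# Throttle downloads to roughly 20 KB/s and pause 2 seconds between
# retrievals while mirroring. Values and URL are examples only.
wget --limit-rate=20k --wait=2 --mirror http://example.com/
```

Note that these options make wget polite but do not bound its total running time, which is the distinct problem Frank raises.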


Mark Post

-----Original Message-----
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 30, 2005 12:02 PM
To: Frank McCown
Cc: [email protected]
Subject: Re: Limit time to run


Frank McCown wrote:

It would be great if wget had a way of limiting the amount of time it
took to run so it won't accidentally hammer on someone's web server
for an indefinite amount of time. I often need to let a crawler run
for a while on an unknown site, and I have to manually kill wget after
a few hours if it hasn't finished yet.  It would be nice if I could do:

wget --limit-time=120 ...

to make it stop itself after 120 minutes.

Please cc me on any replies.


I don't think we need to add this feature to wget, as it can be achieved with a shell script that launches wget in the background, sleeps for the given amount of time, and then kills the wget process.
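
A rough sketch of the wrapper Mauro describes; the function name, time limit, and URL are illustrative, not part of wget:

```shell
#!/bin/sh
# run_with_limit SECONDS CMD...: launch CMD in the background, kill it
# after SECONDS if it is still running, and return its exit status.
run_with_limit() {
    limit=$1; shift
    "$@" &                       # launch the command in the background
    pid=$!
    ( sleep "$limit" && kill "$pid" 2>/dev/null ) &   # watchdog timer
    watchdog=$!
    wait "$pid"                  # returns non-zero if CMD was killed
    status=$?
    kill "$watchdog" 2>/dev/null # cancel the watchdog if CMD finished
    return "$status"
}

# For wget, a 120-minute limit would be, e.g.:
#   run_with_limit 7200 wget -m http://example.com/
# Quick demonstration with a command that outlives a 1-second limit:
run_with_limit 1 sleep 60 && echo "finished" || echo "killed"
```

On systems with GNU coreutils, `timeout 7200 wget -m http://example.com/` accomplishes the same thing without a custom script.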

However, if there is a general consensus about adding this feature to wget, I might consider changing my mind.
