BUG in --timeout (exit status)
Hi,

doing the following:

  # /tmp/wget-1.9-beta3/src/wget -r --timeout=5 --tries=1 http://weather.cod.edu/digatmos/syn/
  --11:33:16-- http://weather.cod.edu/digatmos/syn/
             => `weather.cod.edu/digatmos/syn/index.html'
  Resolving weather.cod.edu... 192.203.136.228
  Connecting to weather.cod.edu[192.203.136.228]:80... failed: Connection timed out.
  Giving up.

  FINISHED --11:33:21--
  Downloaded: 0 bytes in 0 files
  # echo $?
  0

If wget aborts because of a timeout (this applies to all --*-timeout options), it sets an exit status of 0, which is not what users expect, and which makes it very difficult to catch such aborts.

Using --non-verbose in this example, I get no indication at all that something might have failed. An abort has nothing to do with being verbose or not; it should always be reported in some way, IMHO.

Furthermore, the wget man and info pages should document the exit status. I could not find any documentation about wget's exit status.

In contrast, curl does the right thing (non-zero exit status):

  # curl -r --connect-timeout 5 http://weather.cod.edu/digatmos/syn/
  curl: (7)
  # echo $?
  7

regards
Manfred

This message was sent using IMP, the Internet Messaging Program.
Re: downloading files for ftp
Payal Rathod [EMAIL PROTECTED] writes:

> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>> The way to do it with Wget would be something like:
>>
>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>
> But if I run it thru' crontab, where will it store the downloaded
> files? I want it to store as it is in server 1.

It will store them to the current directory. You can either cd to the desired target directory, or use the `-P' flag to specify the directory to Wget.
Re: BUG in --timeout (exit status)
This problem is not specific to timeouts, but to recursive download (-r). When downloading recursively, Wget expects some of the specified downloads to fail and does not propagate that failure to the code that sets the exit status. This unfortunately includes the first download, which should probably be an exception.
Re: BUG in --timeout (exit status)
OK, I see. But I do not agree. And I don't think it is a good idea to treat the first download specially.

In my opinion, exit status 0 means everything during the whole retrieval went OK. My preferred solution would be to set the final exit status to the highest exit status of all individual downloads. Of course, retries which are triggered by --tries should erase the exit status of the previous attempt. A non-zero exit status does not mean that nothing went OK, only that some individual downloads failed somehow. And setting a non-zero exit status does not mean wget has to stop retrieval immediately; it is OK to continue.

Again, wget's behaviour is not what the user expects. And the user always has the possibility to combine --accept, --reject, --domains, etc. so that in normal cases all individual downloads succeed, if he needs an exit status of 0. If he does not care about the exit status, there is no problem at all, of course...

regards
Manfred

Quoting Hrvoje Niksic [EMAIL PROTECTED]:

> This problem is not specific to timeouts, but to recursive download (-r).
> When downloading recursively, Wget expects some of the specified
> downloads to fail and does not propagate that failure to the code that
> sets the exit status. This unfortunately includes the first download,
> which should probably be an exception.
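The rule proposed in this message (final status is the highest status of any individual download, with a retry erasing the status of the attempt before it) can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct download`, `download_status` and `final_exit_status` are hypothetical names, not part of Wget, and the idea that larger status values mean more severe failures is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one URL: the status of each attempt, in order.
   0 means success; larger values are assumed to mean worse failures. */
struct download {
  const int *attempt_status;    /* one entry per attempt (see --tries) */
  size_t attempts;
};

/* Status of one download: only the last attempt counts, so a retry that
   succeeds erases the failure of the attempt before it. */
static int
download_status (const struct download *d)
{
  assert (d->attempts > 0);
  return d->attempt_status[d->attempts - 1];
}

/* Final exit status: the highest status among all downloads.  Every
   download is inspected; a failure never cuts the retrieval short. */
static int
final_exit_status (const struct download *downloads, size_t count)
{
  int worst = 0;
  size_t i;

  for (i = 0; i < count; i++)
    {
      int status = download_status (&downloads[i]);
      if (status > worst)
        worst = status;
    }
  return worst;
}
```

With this rule, a run in which one URL timed out on its only attempt would exit non-zero even if every other URL succeeded, while a URL that failed once and then succeeded on a retry would not affect the result.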
RE: Option to save unfollowed links
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 01, 2003 9:20 PM

> Tony Lewis [EMAIL PROTECTED] writes:
>
>> Would something like the following be what you had in mind?
>>
>> 301 http://www.mysite.com/
>> 200 http://www.mysite.com/index.html
>> 200 http://www.mysite.com/followed.html
>> 401 http://www.mysite.com/needpw.html
>> --- http://www.othersite.com/notfollowed.html
>
> Yes, with the possible extensions of file name where the link was
> saved, sensible status for non-HTTP (currently FTP) links, etc.

The URL which contained the first encountered link to that object, all URLs pointing to that page, number of retries used, total time needed, mean download bandwidth... lots of interesting data could be logged that way. The collection of desired fields should definitely be configurable at runtime.

Heiko

--
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax
Re: Submitting a `.pot' file to the Translation Project
The home page is back, but it says that the TP Robot is dead. I've contacted Martin Loewis, perhaps he'll be able to provide more info.
Re: downloading files for ftp
On Thu, Oct 02, 2003 at 12:03:34PM +0200, Hrvoje Niksic wrote:
> Payal Rathod [EMAIL PROTECTED] writes:
>
>> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>>> The way to do it with Wget would be something like:
>>>
>>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>>
>> But if I run it thru' crontab, where will it store the downloaded
>> files? I want it to store as it is in server 1.
>
> It will store them to the current directory. You can either cd to the
> desired target directory, or use the `-P' flag to specify the
> directory to Wget.

Thanks a lot. It works wonderfully. But one small thing here. I am trying to use it thru' cron like this,

51 * * * * wget --mirror --no-host-directories -P /home/t1 ftp://root:[EMAIL PROTECTED]//home/payal/qmail*

But instead of delivering it to /home/t1, wget makes a directory /home/t1/home/payal and puts the qmail* files there. What is the workaround for this? Even if I can download the whole /home it is OK.

With warm regards,
-Payal

--
Visit GNU/Linux Success Stories
http://payal.staticky.com
Guest-Book Section Updated.
Re: downloading files for ftp
Payal Rathod [EMAIL PROTECTED] writes:

> On Thu, Oct 02, 2003 at 12:03:34PM +0200, Hrvoje Niksic wrote:
>> Payal Rathod [EMAIL PROTECTED] writes:
>>
>>> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>>>> The way to do it with Wget would be something like:
>>>>
>>>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>>>
>>> But if I run it thru' crontab, where will it store the downloaded
>>> files? I want it to store as it is in server 1.
>>
>> It will store them to the current directory. You can either cd to the
>> desired target directory, or use the `-P' flag to specify the
>> directory to Wget.
>
> Thanks a lot. It works wonderfully. But one small thing here. I am
> trying to use it thru' cron like this,
>
> 51 * * * * wget --mirror --no-host-directories -P /home/t1 ftp://root:[EMAIL PROTECTED]//home/payal/qmail*
>
> But instead of delivering it to /home/t1, wget makes a directory
> /home/t1/home/payal and puts the qmail* files there. What is the
> workaround for this?

Use `--cut-dirs=2', which will tell Wget to get rid of two levels of directory hierarchy (home and payal).
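To make the effect of dropping two directory levels concrete, here is a small C sketch of the path trimming that the answer above describes. It is not Wget's own code: `cut_leading_dirs` is a hypothetical helper, and its choice to always keep the final file name is an assumption made for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Skip up to CUT leading directory components of a remote path and
   return a pointer to what is left.  The last component (the file name)
   is never removed.  Hypothetical helper, not taken from Wget. */
static const char *
cut_leading_dirs (const char *path, int cut)
{
  const char *p = path;

  assert (path != NULL && cut >= 0);
  while (*p == '/')             /* ignore leading slashes */
    p++;
  while (cut-- > 0)
    {
      const char *slash = strchr (p, '/');
      if (slash == NULL)        /* only the file name is left */
        break;
      p = slash + 1;
      while (*p == '/')         /* collapse repeated slashes */
        p++;
    }
  return p;
}
```

For a made-up remote file `/home/payal/qmail.log`, cutting two levels leaves `qmail.log`, so with `-P /home/t1` the file would be expected to land directly in /home/t1 rather than in /home/t1/home/payal.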
run_with_timeout() for Windows
I've patched utils.c to make run_with_timeout() work on Windows (better than it does with alarm()!).

In short, it creates and starts a thread, then loops querying the thread exit-code: it breaks if the code is != STILL_ACTIVE, else sleeps for 0.1 sec. It uses a wget_timer too for added accuracy.

Tested with --dns-timeout, --connect-timeout, gethostbyname() and getaddrinfo(). Built and tested with MinGW/gcc 3.3.1, OpenWatcom 1.1 and DMC 8.36, but not MSVC 6. All seems okay.

I have a problem with run_with_timeout() returning 1 and hence lookup_host() reporting ETIMEDOUT. Isn't TRY_AGAIN better suited, indicating that the caller should try a longer timeout?

Patch against beta-2 (I think):

--- src/utils.c.orig Sun Sep 21 01:12:18 2003
+++ src/utils.c Thu Oct 02 22:04:01 2003
@@ -1965,12 +1965,141 @@
 # endif /* not HAVE_SIGSETJMP */
 #endif /* USE_SIGNAL_TIMEOUT */
+
+#if defined(WINDOWS)
+
+/* Wait for thread completion in 0.1s intervals (a tradeoff between
+ * CPU loading and resolution).
+ */
+#define THREAD_WAIT_INTV 100
+#define THREAD_STACK_SIZE 4096
+
+struct thread_data {
+  void (*fun) (void *);
+  void *arg;
+  DWORD ws_error;
+};
+
+static DWORD WINAPI
+thread_helper (void *arg)
+{
+  struct thread_data *td = (struct thread_data *) arg;
+
+  WSASetLastError (0);
+  td->ws_error = 0;
+  (*td->fun) (td->arg);
+
+  /* Since run_with_timeout() is only used for Winsock functions and
+   * Winsock errors are per-thread, we must return this to caller.
+   */
+  td->ws_error = WSAGetLastError();
+  return (0);
+}
+
+#ifdef GV_DEBUG /* I'll remove this eventually */
+#define DEBUGN(lvl,x) do { if (opt.verbose >= (lvl)) DEBUGP (x); } while (0)
+#else
+#define DEBUGN(lvl,x) ((void)0)
+#endif
+
+/*
+ * Create a thread for 'fun' to run in. Since call-convention of 'fun' is
+ * undefined [1], we must call it via thread_helper() which must be __stdcall/WINAPI.
+ *
+ * Return -1 if illegal timeout or failed to create thread.
+ * Return +1 on thread timeout,
+ * else 0 (okay)
+ *
+ * [1] MSVC can use __fastcall globally (cl /Gr) and on Watcom this is the
+ * default (wcc386 -3r).
+ */
+static BOOL
+spawn_thread (double seconds, void (*fun) (void *), void *arg)
+{
+  static HANDLE thread_hnd = NULL;
+  struct thread_data thread_arg;
+  struct wget_timer *timer;
+  DWORD thread_id, exitCode;
+  double elapsed, max_msec;
+
+  DEBUGN (2, ("seconds %.2f, ", seconds));
+
+  if (seconds == 0.0)
+    return (-1); /* run blocking 'fun' */
+
+  if (seconds < 1.0)
+    seconds = 1.0;
+
+  /* Should never happen, but test for recursivety anyway */
+  assert (thread_hnd == NULL);
+  thread_arg.arg = arg;
+  thread_arg.fun = fun;
+  thread_hnd = CreateThread (NULL, THREAD_STACK_SIZE,
+                             thread_helper, (void*)&thread_arg,
+                             0, &thread_id);
+  if (!thread_hnd)
+    {
+      DEBUGP (("CreateThread() failed; %s\n", strerror(GetLastError())));
+      return (-1);
+    }
+
+  exitCode = STILL_ACTIVE;
+  max_msec = 1000.0 * seconds;
+  timer = wtimer_new();
+
+  /* Sleep() isn't very accurate, so do a double check in the for-loop */
+  for (elapsed = 0.0;
+       elapsed < max_msec && wtimer_elapsed(timer) < max_msec;
+       elapsed += (double)THREAD_WAIT_INTV)
+    {
+      GetExitCodeThread (thread_hnd, &exitCode);
+      DEBUGN (2, ("thread exit-code %lu\n", exitCode));
+      if (exitCode != STILL_ACTIVE)
+        break;
+      Sleep (THREAD_WAIT_INTV);
+    }
+
+  DEBUGN (2, ("elapsed %.2f, wtimer_elapsed %.2f, ", elapsed, wtimer_elapsed(timer)));
+
+  wtimer_delete (timer);
+
+  /* If we timed out kill the thread. Normal thread exitCode would be 0.
+   */
+  if (exitCode == STILL_ACTIVE)
+    {
+      DEBUGN (2, ("thread timed out\n"));
+      exitCode = 1;
+      TerminateThread (thread_hnd, exitCode);
+      WSASetLastError (ETIMEDOUT); /* overridden by caller */
+    }
+  else
+    {
+      DEBUGN (2, ("thread exit-code %lu, WS error %lu\n", exitCode, thread_arg.ws_error));
+      exitCode = 0;
+      WSASetLastError (thread_arg.ws_error);
+    }
+  thread_hnd = NULL;
+  return (exitCode);
+}
+#endif /* WINDOWS */
+
 int
 run_with_timeout (double timeout, void (*fun) (void *), void *arg)
 {
-#ifndef USE_SIGNAL_TIMEOUT
+#if defined(WINDOWS)
+  int rc = spawn_thread (timeout, fun, arg);
+
+  if (rc < 0)
+    {
+      fun (arg);
+      rc = 0;
+    }
+  return rc;
+
+#elif !defined(USE_SIGNAL_TIMEOUT)
   fun (arg);
   return 0;
+
 #else
   int saved_errno;

Gisle V.

# rm /bin/laden
/bin/laden: Not found
Re: run_with_timeout() for Windows
Forgot this in src/ChangeLog:

2003-10-02  Gisle Vanem  [EMAIL PROTECTED]

	* utils.c (run_with_timeout): For Windows: Run the 'fun' in a
	thread via a helper function. Continually query the thread's
	exit-code until finished or timed out.

PS.:

> +static DWORD WINAPI
> +thread_helper (void *arg)
> +{
> +  struct thread_data *td = (struct thread_data *) arg;
> +
> +  WSASetLastError (0);
> +  td->ws_error = 0;

AFAIK, error-codes are inherited from parent-thread, but not conveyed back. That's why I clear it in the new thread.

Gisle V.

# rm /bin/laden
/bin/laden: Not found
Re: run_with_timeout() for Windows
Gisle Vanem [EMAIL PROTECTED] writes:

> I've patched utils.c to make run_with_timeout() work on Windows
> (better than it does with alarm()!).

Cool, thanks! Note that, to save the honor of Unix, I've added support for setitimer on systems that support it (virtually everything these days), so run_with_timeout now always works with sub-second precision.

Also, I think the Windows-specific implementation of run_with_timeout should be entirely in mswindows.c. The Unix one in utils.c is already enough of a soup without adding the Windows version as well. Besides, mswindows.c can freely include all the needed headers, use MSVC++ specific constructs, etc.

> In short, it creates and starts a thread, then loops querying the
> thread exit-code: it breaks if the code is != STILL_ACTIVE, else
> sleeps for 0.1 sec. It uses a wget_timer too for added accuracy.

The 0.1s sleeps strike me as inefficient. Couldn't you wait for a condition instead? For example:

run_with_timeout(...)
{
  initialize condvar (pthread_cond_init)
  spawn the thread
  wait on condvar's condition with specified timeout (pthread_cond_timedwait)
  kill the thread or not, depending on whether the above wait timed out or not.
}

thread_helper()
{
  call fun(arg)
  signal the condvar (pthread_cond_signal)
}

> I have a problem with run_with_timeout() returning 1 and hence
> lookup_host() reporting ETIMEDOUT. Isn't TRY_AGAIN better suited,
> indicating that the caller should try a longer timeout?

I'm not sure what you mean here. Isn't the whole point of having a DNS timeout for the program to *not* retry with a longer value, but to give up? Or do you mean that Wget's *_loop functions should treat host lookup failure due to timeout as a non-fatal error?

> +  if (seconds < 1.0)
> +    seconds = 1.0;

Why is this necessary? The alarm() code was doing something similar, but that was to make sure a 0.5s timeout doesn't end up calling alarm(0), which would mean wait forever.

BTW why are you setting the stack size to 4096 (bytes?)? It probably doesn't matter in the current implementation, but it might hurt other uses of run_with_timeout.

> +  /* If we timed out kill the thread. Normal thread exitCode would be 0.
> +   */
> +  if (exitCode == STILL_ACTIVE)
> +    {
> +      DEBUGN (2, ("thread timed out\n"));
> +      exitCode = 1;
> +      TerminateThread (thread_hnd, exitCode);
> +      WSASetLastError (ETIMEDOUT); /* overridden by caller */

Why are you setting the error here? The semantics of run_with_timeout are supposed to be that error conditions are determined by whatever FUN was doing. If some X_with_timeout routine wants to set errno to ETIMEDOUT, it can, but it's not run_with_timeout's job to do that.
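The condition-variable outline in this message can be fleshed out as a small POSIX threads sketch. It is a minimal illustration under stated assumptions, not the committed Wget code and not a Windows implementation: `struct timed_call`, `run_with_timeout_sketch` and the `sleep_for` demonstration callback are hypothetical names, and "kill the thread" is approximated here with `pthread_cancel`, which only takes effect when FUN reaches a cancellation point.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* State shared between the waiting caller and the helper thread. */
struct timed_call {
  void (*fun) (void *);
  void *arg;
  pthread_mutex_t lock;
  pthread_cond_t finished;
  int done;                     /* set to 1 once fun(arg) has returned */
};

static void *
thread_helper (void *p)
{
  struct timed_call *tc = p;

  tc->fun (tc->arg);                    /* call fun(arg) */
  pthread_mutex_lock (&tc->lock);
  tc->done = 1;
  pthread_cond_signal (&tc->finished);  /* signal the condvar */
  pthread_mutex_unlock (&tc->lock);
  return NULL;
}

/* Return 0 if FUN finished in time, 1 if it timed out, -1 if the thread
   could not be started. */
static int
run_with_timeout_sketch (double timeout, void (*fun) (void *), void *arg)
{
  struct timed_call tc;
  struct timespec deadline;
  pthread_t thread;
  time_t whole = (time_t) timeout;
  int timed_out;

  assert (timeout > 0.0);
  tc.fun = fun;
  tc.arg = arg;
  tc.done = 0;
  pthread_mutex_init (&tc.lock, NULL);
  pthread_cond_init (&tc.finished, NULL);

  /* pthread_cond_timedwait takes an absolute deadline, not an interval. */
  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_sec += whole;
  deadline.tv_nsec += (long) ((timeout - (double) whole) * 1e9);
  if (deadline.tv_nsec >= 1000000000L)
    {
      deadline.tv_sec += 1;
      deadline.tv_nsec -= 1000000000L;
    }

  if (pthread_create (&thread, NULL, thread_helper, &tc) != 0)
    {
      pthread_cond_destroy (&tc.finished);
      pthread_mutex_destroy (&tc.lock);
      return -1;
    }

  pthread_mutex_lock (&tc.lock);
  while (!tc.done)              /* the loop guards against spurious wakeups */
    if (pthread_cond_timedwait (&tc.finished, &tc.lock, &deadline) == ETIMEDOUT)
      break;
  timed_out = !tc.done;
  pthread_mutex_unlock (&tc.lock);

  /* Caveat: cancellation is deferred, so the join below returns only once
     FUN reaches a cancellation point (or finishes on its own). */
  if (timed_out)
    pthread_cancel (thread);
  pthread_join (thread, NULL);
  pthread_cond_destroy (&tc.finished);
  pthread_mutex_destroy (&tc.lock);
  return timed_out;
}

/* Demonstration callback: sleep for *(double *) arg seconds. */
static void
sleep_for (void *arg)
{
  double seconds = *(double *) arg;
  struct timespec ts;

  ts.tv_sec = (time_t) seconds;
  ts.tv_nsec = (long) ((seconds - (double) ts.tv_sec) * 1e9);
  nanosleep (&ts, NULL);        /* nanosleep is a cancellation point */
}
```

Calling `run_with_timeout_sketch (0.2, sleep_for, &two_seconds)` would be expected to report a timeout without any 0.1s polling, because the caller sleeps inside `pthread_cond_timedwait` until either the helper signals or the deadline passes. Note that, in line with the last point of the message, the sketch leaves errno alone and only reports whether the wait timed out.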
Re: run_with_timeout() for Windows
I've committed this patch, with minor changes, such as moving the code to mswindows.c. Since I don't have MSVC, someone else will need to check that the code compiles. Please let me know how it goes.
Re: run_with_timeout() for Windows
Hrvoje Niksic [EMAIL PROTECTED] said:

> I've committed this patch, with minor changes, such as moving the code
> to mswindows.c. Since I don't have MSVC, someone else will need to
> check that the code compiles. Please let me know how it goes.

It compiled with MSVC okay, but crashed somewhere unrelated, both before and after my patch.

--gv