BUG in --timeout (exit status)

2003-10-02 Thread Manfred Schwarb
Hi,

doing the following:
# /tmp/wget-1.9-beta3/src/wget -r --timeout=5 --tries=1 http://weather.cod.edu/digatmos/syn/
--11:33:16--  http://weather.cod.edu/digatmos/syn/
           => `weather.cod.edu/digatmos/syn/index.html'
Resolving weather.cod.edu... 192.203.136.228
Connecting to weather.cod.edu[192.203.136.228]:80... failed: Connection timed out.
Giving up.


FINISHED --11:33:21--
Downloaded: 0 bytes in 0 files

# echo $?
0

If wget aborts because of a timeout (any of the --*-timeout options),
it sets an exit status of 0, which is not what users expect and which
makes it very difficult to catch such aborts.

With --non-verbose in this example, I get no indication at all that
something might have failed. An abort has nothing to do with being
verbose or not; it should always be reported in some way, IMHO.

Furthermore, the wget man and info pages should document the exit status.
I could not find any documentation about wget's exit status.


In contrast, curl does the right thing (non-zero exit status):
# curl -r --connect-timeout 5 http://weather.cod.edu/digatmos/syn/
curl: (7)
# echo $?
7


regards
Manfred



This message was sent using IMP, the Internet Messaging Program.


Re: downloading files for ftp

2003-10-02 Thread Hrvoje Niksic
Payal Rathod [EMAIL PROTECTED] writes:

> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>> The way to do it with Wget would be something like:
>>
>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>
> But if I run in thru' crontab, where will it store the downloaded files?
> I want it to store as it is in server 1.

It will store them to the current directory.  You can either cd to the
desired target directory, or use the `-P' flag to specify the
directory to Wget.


Re: BUG in --timeout (exit status)

2003-10-02 Thread Hrvoje Niksic
This problem is not specific to timeouts, but to recursive download (-r).

When downloading recursively, Wget expects some of the specified
downloads to fail and does not propagate that failure to the code that
sets the exit status.  This unfortunately includes the first download,
which should probably be an exception.
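
A rough sketch of the exception described above, not Wget's actual
code: failures of links found during recursion are tolerated, but a
failure of the first download decides the exit status. The type
download_result and the function recursive_exit_status are hypothetical
names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-URL outcome; not a type from the Wget sources. */
struct download_result {
  int failed;   /* non-zero if this retrieval failed */
};

/* Exit status for a recursive run.  Index 0 is assumed to be the URL
   given on the command line.  Later failures are ignored; a failure of
   the first download is reported as a non-zero status. */
int
recursive_exit_status (const struct download_result *results, size_t count)
{
  assert (results != NULL || count == 0);

  if (count > 0 && results[0].failed)
    return 1;
  return 0;
}
```

With this policy the timed-out run from the original report would exit
non-zero, because the only download attempted was the first one.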


Re: BUG in --timeout (exit status)

2003-10-02 Thread Manfred Schwarb
OK, I see.
But I do not agree.
And I don't think it is a good idea to treat the first download specially.

In my opinion, exit status 0 means everything during the whole
retrieval went OK.
My preferred solution would be to set the final exit status to the highest
exit status of all individual downloads. Of course, retries which are
triggered by --tries should erase the exit status of the previous attempt.
A non-zero exit status would not mean that nothing went OK, only that some
individual downloads failed somehow.
And setting a non-zero exit status does not mean wget has to stop
retrieval immediately; it is OK to continue.
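
As a sketch of the policy proposed above (an illustration only, not
something Wget implements): each URL keeps the status of its last
attempt, and the run exits with the highest of those. The struct
url_attempts, both function names, and the idea that a larger number
means a more severe failure are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one URL: the status of each attempt, in order.
   0 means success; larger values are assumed to be more severe. */
struct url_attempts {
  const int *statuses;  /* one entry per attempt made under --tries */
  size_t count;         /* number of attempts, at least 1 */
};

/* Only the last attempt counts: a retry erases the status of the
   attempts before it. */
static int
final_status (const struct url_attempts *u)
{
  assert (u->count > 0);
  return u->statuses[u->count - 1];
}

/* Exit status of the whole run: the highest final status of any URL.
   Retrieval is assumed to have continued past individual failures. */
int
overall_exit_status (const struct url_attempts *urls, size_t n)
{
  int highest = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      int s = final_status (&urls[i]);
      if (s > highest)
        highest = s;
    }
  return highest;
}
```

A URL that fails once and then succeeds on retry contributes 0, so a
run in which every URL eventually succeeds still exits with 0.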

Again, wget's behaviour is not what the user expects.

And the user always has the possibility to combine
--accept, --reject, --domains, etc. so that in normal cases all
individual downloads succeed, if he needs an exit status of 0.
If he does not care about exit status, there is no problem at all,
of course...


regards
Manfred








RE: Option to save unfollowed links

2003-10-02 Thread Herold Heiko
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 01, 2003 9:20 PM
>
> Tony Lewis [EMAIL PROTECTED] writes:
>
>> Would something like the following be what you had in mind?
>>
>> 301 http://www.mysite.com/
>> 200 http://www.mysite.com/index.html
>> 200 http://www.mysite.com/followed.html
>> 401 http://www.mysite.com/needpw.html
>> --- http://www.othersite.com/notfollowed.html
>
> Yes, with the possible extensions of file name where the link was
> saved, sensible status for non-HTTP (currently FTP) links, etc.
>

Also the URL which contained the first encountered link to that object,
all URLs pointing to that page, number of retries used, total time
needed, mean download bandwidth...
Lots of interesting data could be logged that way. The collection of
desired fields should definitely be configurable at runtime.
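
To make the idea of runtime-selectable fields concrete, here is a small
sketch. It is purely illustrative: the enum log_field, the struct
link_record and the function format_record are hypothetical and do not
correspond to any existing Wget option or source file. The output
layout follows the "status URL" lines proposed earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field selectors; these are not Wget options. */
enum log_field {
  LOG_STATUS  = 1 << 0,
  LOG_URL     = 1 << 1,
  LOG_RETRIES = 1 << 2
};

/* Hypothetical per-link record. */
struct link_record {
  int status;        /* response status, or 0 if the link was not followed */
  const char *url;
  int retries;
};

/* Append TEXT to BUF, preceded by a space unless BUF is empty.
   Output that does not fit is silently truncated. */
static void
append_field (char *buf, size_t size, const char *text)
{
  size_t used = strlen (buf);

  if (used > 0 && used + 1 < size)
    {
      buf[used++] = ' ';
      buf[used] = '\0';
    }
  snprintf (buf + used, size - used, "%s", text);
}

/* Write the fields of REC selected by the FIELDS bitmask into BUF. */
void
format_record (char *buf, size_t size, const struct link_record *rec,
               unsigned int fields)
{
  char tmp[32];

  assert (buf != NULL && size > 0 && rec != NULL);
  buf[0] = '\0';

  if (fields & LOG_STATUS)
    {
      if (rec->status > 0)
        snprintf (tmp, sizeof tmp, "%d", rec->status);
      else
        snprintf (tmp, sizeof tmp, "---");   /* link was not followed */
      append_field (buf, size, tmp);
    }
  if (fields & LOG_URL)
    append_field (buf, size, rec->url);
  if (fields & LOG_RETRIES)
    {
      snprintf (tmp, sizeof tmp, "%d", rec->retries);
      append_field (buf, size, tmp);
    }
}
```

A bitmask keeps the selection cheap to test per record; further fields
(referring URL, total time, bandwidth) would each get one more bit.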

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


Re: Submitting a `.pot' file to the Translation Project

2003-10-02 Thread Hrvoje Niksic
The home page is back, but it says that the TP Robot is dead.  I've
contacted Martin Loewis, perhaps he'll be able to provide more info.


Re: downloading files for ftp

2003-10-02 Thread Payal Rathod
On Thu, Oct 02, 2003 at 12:03:34PM +0200, Hrvoje Niksic wrote:
> Payal Rathod [EMAIL PROTECTED] writes:
>
>> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>>> The way to do it with Wget would be something like:
>>>
>>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>>
>> But if I run in thru' crontab, where will it store the downloaded files?
>> I want it to store as it is in server 1.
>
> It will store them to the current directory.  You can either cd to the
> desired target directory, or use the `-P' flag to specify the
> directory to Wget.

Thanks a lot. It works wonderfully. But one small thing here. I am
trying to use it thru' cron like this,

51 * * * * wget --mirror --no-host-directories -P /home/t1 ftp://root:[EMAIL PROTECTED]//home/payal/qmail*

But instead of delivering it to /home/t1, wget makes a directory
/home/t1/home/payal and puts the qmail* files there.

What is the workaround for this?
Even if I can download the whole /home it is OK.

With warm regards,
-Payal

 

-- 
Visit GNU/Linux Success Stories
http://payal.staticky.com
Guest-Book Section Updated.


Re: downloading files for ftp

2003-10-02 Thread Hrvoje Niksic
Payal Rathod [EMAIL PROTECTED] writes:

> On Thu, Oct 02, 2003 at 12:03:34PM +0200, Hrvoje Niksic wrote:
>> Payal Rathod [EMAIL PROTECTED] writes:
>>
>>> On Wed, Oct 01, 2003 at 09:26:47PM +0200, Hrvoje Niksic wrote:
>>>> The way to do it with Wget would be something like:
>>>>
>>>> wget --mirror --no-host-directories ftp://username:[EMAIL PROTECTED]
>>>
>>> But if I run in thru' crontab, where will it store the downloaded files?
>>> I want it to store as it is in server 1.
>>
>> It will store them to the current directory.  You can either cd to the
>> desired target directory, or use the `-P' flag to specify the
>> directory to Wget.
>
> Thanks a lot. It works wonderfully. But one small thing here. I am
> trying to use it thru' cron like this,
>
> 51 * * * * wget --mirror --no-host-directories -P /home/t1 ftp://root:[EMAIL PROTECTED]//home/payal/qmail*
>
> But instead of delivering it to /home/t1, wget makes a directory
> /home/t1/home/payal and puts the qmail* files there.
>
> What is the workaround for this?

Use `--cut-dirs=2', which will tell Wget to get rid of two levels of
directory hierarchy (home and payal).
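
The effect on the saved path can be sketched like this. The function
cut_dirs below is a hypothetical illustration of the behaviour just
described, not Wget's implementation, and the file name qmail.tar used
in the note after it is made up (the original only says qmail*).

```c
#include <assert.h>
#include <string.h>

/* Return a pointer into PATH just past its first N directory
   components.  The file name itself is never removed, so asking for
   more components than the path has leaves only the file name. */
const char *
cut_dirs (const char *path, int n)
{
  const char *p = path;

  assert (path != NULL && n >= 0);

  while (*p == '/')            /* ignore leading slashes */
    p++;
  while (n-- > 0)
    {
      const char *slash = strchr (p, '/');
      if (slash == NULL)       /* only the file name is left */
        break;
      p = slash + 1;
    }
  return p;
}
```

For a remote path home/payal/qmail.tar and n = 2 this yields qmail.tar,
so with -P /home/t1 the file would land directly in /home/t1 rather
than in /home/t1/home/payal.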


run_with_timeout() for Windows

2003-10-02 Thread Gisle Vanem
I've patched utils.c to make run_with_timeout() work on
Windows (better than it does with alarm()!).

In short, it creates and starts a thread, then loops querying
the thread exit-code. It breaks if != STILL_ACTIVE, else sleeps
for 0.1 sec. It uses a wget_timer too for added accuracy.

Tested with --dns-timeout, --connect-timeout, gethostbyname()
and getaddrinfo(). Built and tested with MingW/gcc 3.3.1, OpenWatcom
1.1 and DMC 8.36, but not MSVC 6. All seems okay.

I have a problem with run_with_timeout() returning 1 and hence
lookup_host() reporting ETIMEDOUT. Wouldn't TRY_AGAIN be more suitable,
indicating that the caller should try a longer timeout?

Patch against beta-2 (I think):

--- src/utils.c.orig Sun Sep 21 01:12:18 2003
+++ src/utils.c Thu Oct 02 22:04:01 2003
@@ -1965,12 +1965,141 @@
 # endif /* not HAVE_SIGSETJMP */
 #endif /* USE_SIGNAL_TIMEOUT */
 
+
+#if defined(WINDOWS)
+
+/* Wait for thread completion in 0.1s intervals (a tradeoff between 
+ * CPU loading and resolution).
+ */
+#define THREAD_WAIT_INTV   100  
+#define THREAD_STACK_SIZE  4096 
+
+struct thread_data {
+   void (*fun) (void *);
+   void  *arg;
+   DWORD ws_error; 
+};
+
+static DWORD WINAPI 
+thread_helper (void *arg)
+{
+  struct thread_data *td = (struct thread_data *) arg;
+  
+  WSASetLastError (0);
+  td->ws_error = 0;
+  (*td->fun) (td->arg);
+  
+  /* Since run_with_timeout() is only used for Winsock functions and
+   * Winsock errors are per-thread, we must return this to caller.
+   */
+  td->ws_error = WSAGetLastError();
+  return (0); 
+}
+
+#ifdef GV_DEBUG  /* I'll remove this eventually */
+#define DEBUGN(lvl,x)  do { if (opt.verbose >= (lvl)) DEBUGP (x); } while (0)
+#else
+#define DEBUGN(lvl,x)  ((void)0)
+#endif  
+
+/*
+ * Create a thread for 'fun' to run in. Since call-convention of 'fun' is
+ * undefined [1], we must call it via thread_helper() which must be __stdcall/WINAPI.
+ *
+ * Return -1 if illegal timeout or failed to create thread.
+ * Return +1 on thread timeout,
+ * else 0 (okay)
+ *
+ * [1] MSVC can use __fastcall globally (cl /Gr) and on Watcom this is the
+ * default (wcc386 -3r). 
+ */
+static BOOL
+spawn_thread (double seconds, void (*fun) (void *), void *arg)
+{
+  static HANDLE thread_hnd = NULL;
+  struct thread_data thread_arg;
+  struct wget_timer *timer;
+  DWORD  thread_id, exitCode;
+  double elapsed, max_msec;
+  
+  DEBUGN (2, ("seconds %.2f, ", seconds));
+  
+  if (seconds == 0.0)
+    return (-1); /* run blocking 'fun' */
+
+  if (seconds < 1.0)
+    seconds = 1.0;
+   
+  /* Should never happen, but test for recursivety anyway */
+  assert (thread_hnd == NULL);  
+  thread_arg.arg = arg;
+  thread_arg.fun = fun;
+  thread_hnd = CreateThread (NULL, THREAD_STACK_SIZE,
+                             thread_helper, (void*)&thread_arg, 
+                             0, &thread_id); 
+  if (!thread_hnd)
+  {
+    DEBUGP (("CreateThread() failed; %s\n", strerror(GetLastError())));
+    return (-1);  
+  }
+ 
+  exitCode = STILL_ACTIVE;
+  max_msec = 1000.0 * seconds;
+  timer = wtimer_new();  
+  
+  /* Sleep() isn't very accurate, so do a double check in the for-loop */
+  for (elapsed = 0.0; 
+       elapsed < max_msec && wtimer_elapsed(timer) < max_msec;
+       elapsed += (double)THREAD_WAIT_INTV)
+  {
+    GetExitCodeThread (thread_hnd, &exitCode);
+    DEBUGN (2, ("thread exit-code %lu\n", exitCode));
+    if (exitCode != STILL_ACTIVE)
+       break;
+    Sleep (THREAD_WAIT_INTV);
+  }
+  
+  DEBUGN (2, ("elapsed %.2f, wtimer_elapsed %.2f, ", elapsed, wtimer_elapsed(timer)));
+  
+  wtimer_delete (timer);
+
+  /* If we timed out kill the thread. Normal thread exitCode would be 0.
+   */
+  if (exitCode == STILL_ACTIVE)
+  {
+    DEBUGN (2, ("thread timed out\n"));
+    exitCode = 1;
+    TerminateThread (thread_hnd, exitCode);
+    WSASetLastError (ETIMEDOUT); /* overridden by caller */
+  }  
+  else
+  {
+    DEBUGN (2, ("thread exit-code %lu, WS error %lu\n", exitCode, thread_arg.ws_error));
+    exitCode = 0; 
+    WSASetLastError (thread_arg.ws_error);
+  }  
+  thread_hnd = NULL;
+  return (exitCode);
+}
+#endif  /* WINDOWS */
+
 int
 run_with_timeout (double timeout, void (*fun) (void *), void *arg)
 {
-#ifndef USE_SIGNAL_TIMEOUT
+#if defined(WINDOWS)
+  int rc = spawn_thread (timeout, fun, arg);
+  
+  if (rc < 0)
+  {
+    fun (arg);
+    rc = 0;
+  }  
+  return rc;
+  
+#elif !defined(USE_SIGNAL_TIMEOUT)
   fun (arg);
   return 0;
+
 #else
   int saved_errno;



Gisle V.

# rm /bin/laden 
/bin/laden: Not found



Re: run_with_timeout() for Windows

2003-10-02 Thread Gisle Vanem
Forgot this in src/ChangeLog:

2003-10-02  Gisle Vanem  [EMAIL PROTECTED]

* utils.c (run_with_timeout): For Windows: Run the 'fun' in
  a thread via a helper function. Continually query the 
  thread's exit-code until finished or timed out. 

PS.:

+static DWORD WINAPI 
+thread_helper (void *arg)
+{
+  struct thread_data *td = (struct thread_data *) arg;
+  
+  WSASetLastError (0);
+  td->ws_error = 0;

AFAIK, error-codes are inherited from parent-thread, but
not conveyed back. That's why I clear it in the new thread.

Gisle V.

# rm /bin/laden 
/bin/laden: Not found 



Re: run_with_timeout() for Windows

2003-10-02 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:

> I've patched utils.c to make run_with_timeout() work on Windows
> (better than it does with alarm()!).

Cool, thanks!  Note that, to save the honor of Unix, I've added
support for setitimer on systems that support it (virtually everything
these days), so run_with_timeout now always works with sub-second
precision.

Also, I think the Windows-specific implementation of run_with_timeout
should be entirely in mswindows.c.  The Unix one in utils.c is enough
of a soup to add the Windows version as well.  Besides, mswindows.c
can freely include all the needed headers, use MSVC++ specific
constructs, etc.

> In short it creates and starts a thread, then loops querying the
> thread exit-code. breaks if != STILL_ACTIVE, else sleep for 0.1
> sec. Uses a wget_timer too for added accuracy.

The 0.1s sleeps strike me as inefficient.  Couldn't you wait for a
condition instead?  For example:

run_with_timeout(...)
{
  initialize condvar  (pthread_cond_init)
  spawn the thread
  wait on condvar's condition with specified timeout  (pthread_cond_timedwait)
  kill the thread or not, depending on whether the above wait timed
out or not.
}

thread_helper()
{
  call fun(arg)
  signal the condvar  (pthread_cond_signal)
}
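
Spelled out with POSIX threads, the outline above could look roughly
like this. It is a sketch only: the name run_with_timeout_sketch, the
struct timed_call and the two demo callbacks are made up for the
example, and a Windows version would need the corresponding Win32
primitives instead of pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* State shared between the caller and the helper thread. */
struct timed_call {
  void (*fun) (void *);
  void *arg;
  int done;                    /* set once fun has returned */
  pthread_mutex_t lock;
  pthread_cond_t finished;
};

static void *
thread_helper (void *p)
{
  struct timed_call *tc = p;

  tc->fun (tc->arg);
  pthread_mutex_lock (&tc->lock);
  tc->done = 1;
  pthread_cond_signal (&tc->finished);
  pthread_mutex_unlock (&tc->lock);
  return NULL;
}

/* Return 0 if fun(arg) finished in time, 1 on timeout, -1 if no thread
   could be created. */
int
run_with_timeout_sketch (double timeout, void (*fun) (void *), void *arg)
{
  struct timed_call tc;
  struct timespec deadline;
  pthread_t tid;
  int rc = 0, timed_out = 0;

  assert (timeout > 0 && fun != NULL);
  tc.fun = fun;
  tc.arg = arg;
  tc.done = 0;
  pthread_mutex_init (&tc.lock, NULL);
  pthread_cond_init (&tc.finished, NULL);

  /* The timed wait takes an absolute CLOCK_REALTIME deadline. */
  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_sec += (time_t) timeout;
  deadline.tv_nsec += (long) ((timeout - (double) (time_t) timeout) * 1e9);
  if (deadline.tv_nsec >= 1000000000L)
    {
      deadline.tv_sec += 1;
      deadline.tv_nsec -= 1000000000L;
    }

  if (pthread_create (&tid, NULL, thread_helper, &tc) != 0)
    rc = -1;
  else
    {
      pthread_mutex_lock (&tc.lock);
      while (!tc.done && rc == 0)   /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait (&tc.finished, &tc.lock, &deadline);
      timed_out = !tc.done;
      pthread_mutex_unlock (&tc.lock);

      /* Cancellation is deferred: it only takes effect when fun reaches
         a cancellation point, so the join can block until then. */
      if (timed_out)
        pthread_cancel (tid);
      pthread_join (tid, NULL);
      rc = timed_out ? 1 : 0;
    }

  pthread_cond_destroy (&tc.finished);
  pthread_mutex_destroy (&tc.lock);
  return rc;
}

/* Two callbacks for trying it out. */
void
quick_job (void *arg)
{
  *(int *) arg = 1;
}

void
slow_job (void *arg)
{
  struct timespec two_seconds = { 2, 0 };
  (void) arg;
  nanosleep (&two_seconds, NULL);   /* a cancellation point */
}
```

Calling run_with_timeout_sketch (0.1, slow_job, NULL) should report a
timeout after about 0.1s without any polling. The caller sleeps until
either the helper signals completion or the deadline passes.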

> I have a problem with run_with_timeout() returning 1 and hence
> lookup_host() reporting ETIMEDOUT. Isn't TRY_AGAIN more suited
> indicating the caller should try a longer timeout?

I'm not sure what you mean here.  Isn't the whole point of having a
DNS timeout for the program to *not* retry with a longer value, but to
give up?

Or, do you mean that Wget's *_loop functions should treat host lookup
failure due to timeout as non-fatal error?

> +  if (seconds < 1.0)
> +    seconds = 1.0;

Why is this necessary?  The alarm() code was doing something similar,
but that was to make sure a 0.5s timeout doesn't end up calling
alarm(0), which would mean wait forever.

BTW why are you setting the stack size to 4096 (bytes?)?  It probably
doesn't matter in the current implementation, but it might hurt other
uses of run_with_timeout.

> +  /* If we timed out kill the thread. Normal thread exitCode would be 0.
> +   */
> +  if (exitCode == STILL_ACTIVE)
> +  {
> +    DEBUGN (2, ("thread timed out\n"));
> +    exitCode = 1;
> +    TerminateThread (thread_hnd, exitCode);
> +    WSASetLastError (ETIMEDOUT); /* overridden by caller */

Why are you setting the error here?  The semantics of run_with_timeout
are supposed to be that error conditions are determined by whatever
FUN was doing.  If some X_with_timeout routine wants to set errno to
ETIMEDOUT, it can, but it's not run_with_timeout's job to do that.



Re: run_with_timeout() for Windows

2003-10-02 Thread Hrvoje Niksic
I've committed this patch, with minor changes, such as moving the code
to mswindows.c.  Since I don't have MSVC, someone else will need to
check that the code compiles.  Please let me know how it goes.



Re: run_with_timeout() for Windows

2003-10-02 Thread Gisle Vanem
Hrvoje Niksic [EMAIL PROTECTED] said:

> I've committed this patch, with minor changes, such as moving the code
> to mswindows.c.  Since I don't have MSVC, someone else will need to
> check that the code compiles.  Please let me know how it goes.

I compiled it with MSVC okay, but it crashed somewhere
unrelated, both before and after my patch.

--gv