wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Jochen Roderburg

Hi,

I've found a situation where the new version 1.9beta behaves differently from
earlier versions. I'm not sure whether this is a corrected error or a new bug; I
personally would prefer the old behaviour.

When I do a recursive download with an accept list like

  wget -r -l1 -nd -A zip http://some.host.com/index.htm

it downloads the index.htm file and all the zip files mentioned therein.
With older versions, the start file index.htm itself remains on disk at the end.

Version 1.9 downloads index.htm and deletes it immediately with the message

  Removing index.htm since it should be rejected.

The recursion is then done correctly.

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany




some wget patches against beta3

2003-10-03 Thread Arkadiusz Miskiewicz
Hi,

Here are a few patches against test3:

http://cvs.pld-linux.org/cgi-bin/cvsweb/SOURCES/wget-ac.patch?rev=1.4
(some autoconf 2.5x things)

http://cvs.pld-linux.org/cgi-bin/cvsweb/SOURCES/wget-pl.patch?rev=1.3
(Polish translation update)

-- 
Arkadiusz Miskiewicz    CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org AM2-6BONE, 1024/3DB19BBD, arekm(at)ircnet, PLD/Linux



mswindows.h patch

2003-10-03 Thread Gisle Vanem
Regarding my run_with_timeout() patch, I forgot the following
patch to mswindows.h (which isn't included in util.c).

In my forthcoming patches for IPv6, we need to use the correct 
Winsock headers. To avoid ifdef clutter throughout the .c-files, I've
put them in mswindows.h. So the .c-files should never include it, 
but only need network headers like this:
  #ifndef WINDOWS
  # include <sys/socket.h>
  # include <netdb.h>
   ...
  #endif
  #include "wget.h"

The last line pulls in wget.h, which includes sysdep.h, which in turn
includes mswindows.h.

--- CVS-latest/src/mswindows.h   Tue Sep 30 23:24:36 2003
+++ src/mswindows.h Fri Oct 03 16:57:57 2003
@@ -30,6 +30,37 @@
 #ifndef MSWINDOWS_H
 #define MSWINDOWS_H

+#ifndef WGET_H
+#error Include mswindows.h inside or after wget.h
+#endif
+
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN  /* Prevent inclusion of winsock*.h in windows.h */
+#endif
+
+#include <windows.h>
+
+/* Use the correct winsock header; ws2tcpip.h includes winsock2.h only on
+ * Watcom/MingW. We cannot use winsock.h for IPv6. Using getaddrinfo() requires
+ * ws2tcpip.h
+ */
+#if defined(ENABLE_IPV6) || defined(HAVE_GETADDRINFO)
+# include <winsock2.h>
+# include <ws2tcpip.h>
+#else
+# include <winsock.h>
+#endif
+
+#ifndef EAI_SYSTEM
+#define EAI_SYSTEM -1   /* value doesn't matter */
+#endif
+
+/* Must include sys/stat.h because of 'stat' define below. */
+#include <sys/stat.h>
+
+/* Missing in several .c files. Include here. */
+#include <io.h>
+
 /* Apparently needed for alloca(). */
 #include <malloc.h>

@@ -81,8 +112,6 @@
 # define mkdir(a, b) mkdir(a)
 #endif /* __BORLANDC__ */

-#include <windows.h>
-
 /* Declarations of various socket errors: */
@@ -136,5 +164,21 @@
 char *ws_mypath (void);
 void ws_help (const char *);
 void windows_main_junk (int *, char **, char **);
+
+/* Things needed for IPv6; missing in ws2tcpip.h. */
+#ifdef ENABLE_IPV6
+ #ifndef HAVE_NTOP
+  extern const char *inet_ntop (int af, const void *src, char *dst, size_t size);
+ #endif
+ #ifndef HAVE_PTON
+  extern int inet_pton (int af, const char *src, void *dst);
+ #endif
+#endif /* ENABLE_IPV6 */

-

Defining WIN32_LEAN_AND_MEAN also makes it compile much faster.

I think it would be handy to have 'opt.debug' in levels of verbosity. 
I.e. '-dd' gives a more chatty wget. Or should it be '-vv'? I'm a bit 
confused about the distinction between those options. I propose we 
add this macro to wget.h:

# define DEBUGN(level,x)   do { if (opt.debug >= (level)) \
                                  DEBUGP (x); } while (0)
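
A minimal, self-contained sketch of how such a leveled macro could behave. The `opt` struct, `debug_logprintf` and the message counter here are simplified stand-ins written for illustration, not wget's actual definitions; only the DEBUGN shape is taken from the proposal above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for wget's global options; only the debug level. */
struct options { int debug; };
static struct options opt = { 0 };

/* Counts emitted messages so the filtering can be observed. */
static int debug_messages_emitted = 0;

/* Simplified stand-in for a printf-style debug logger. */
static void
debug_logprintf (const char *fmt, ...)
{
  va_list args;
  assert (fmt != NULL);
  va_start (args, fmt);
  vfprintf (stderr, fmt, args);
  va_end (args);
  debug_messages_emitted++;
}

/* DEBUGP takes a parenthesized argument list, e.g. DEBUGP (("x=%d\n", x)).
   This simplified version always logs; the level check lives in DEBUGN. */
#define DEBUGP(x) do { debug_logprintf x; } while (0)

/* Emit the message only if the current debug level is at least LEVEL. */
#define DEBUGN(level, x) do { if (opt.debug >= (level)) \
                                DEBUGP (x); } while (0)

/* Set the debug level, try one message at level 1 and one at level 2,
   and return how many were actually emitted. */
static int
demo_debug_levels (int level)
{
  opt.debug = level;
  debug_messages_emitted = 0;
  DEBUGN (1, ("connecting to host\n"));
  DEBUGN (2, ("raw header dump follows\n"));
  return debug_messages_emitted;
}
```

With this sketch, `demo_debug_levels (1)` lets only the level-1 message through, while `demo_debug_levels (2)` emits both.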

And patch init.c:

@@ -85,6 +85,7 @@
 CMD_DECLARE (cmd_boolean);
 CMD_DECLARE (cmd_bytes);
 CMD_DECLARE (cmd_directory_vector);
+CMD_DECLARE (cmd_increment);
 CMD_DECLARE (cmd_lockable_boolean);
 CMD_DECLARE (cmd_number);
 CMD_DECLARE (cmd_number_inf);
@@ -129,7 +128,7 @@
   { "cookies", &opt.cookies,   cmd_boolean },
   { "cutdirs", &opt.cut_dirs,  cmd_number },
 #ifdef DEBUG
-  { "debug",   &opt.debug, cmd_boolean },
+  { "debug",   &opt.debug, cmd_increment },
 #endif
   { "deleteafter", &opt.delete_after,  cmd_boolean },
   { "dirprefix",   &opt.dir_prefix,    cmd_directory },
@@ -632,6 +631,17 @@
 }

   *(int *)closure = bool_value;
+  return 1;
+}
+
+/* Increment a value from VAL to CLOSURE.  COM is ignored,
+   except for error messages.  */
+static int
+cmd_increment (const char *com, const char *val, void *closure)
+{
+  int tmp;
+  if (cmd_boolean (com, val, &tmp))
+    (*(int *) closure)++;
   return 1;
 }


Wadda you think? AFAIK only wget.texi should be updated.
Add this to @item -d:
  To get increased verbosity turn up the debug-level
  by repeating this option. E.g. @samp{-dd} or
  @samp{--debug --debug}.
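
To illustrate the "repeat the option to raise the level" idea outside of init.c, here is a rough standalone sketch. The names `parse_boolean`, `increment_option` and `debug_level_from_values` are hypothetical, and the choice to reset the counter on an explicit "off" is an assumption of this sketch, not something the patch above does.

```c
#include <assert.h>
#include <string.h>

/* Simplified, hypothetical boolean parser: accepts "on"/"1" and "off"/"0".
   Returns 1 and stores the value in *OUT on success, 0 on bad input. */
static int
parse_boolean (const char *val, int *out)
{
  assert (val != NULL && out != NULL);
  if (strcmp (val, "on") == 0 || strcmp (val, "1") == 0)
    {
      *out = 1;
      return 1;
    }
  if (strcmp (val, "off") == 0 || strcmp (val, "0") == 0)
    {
      *out = 0;
      return 1;
    }
  return 0;
}

/* Bump *COUNTER for each "on" value; an explicit "off" resets it to 0
   (an assumption made for this sketch).  Returns 0 on invalid input. */
static int
increment_option (const char *val, int *counter)
{
  int enabled;
  if (!parse_boolean (val, &enabled))
    return 0;
  if (enabled)
    (*counter)++;
  else
    *counter = 0;
  return 1;
}

/* Feed each value through increment_option, as if the same option had
   been given COUNT times.  Returns the final level, or -1 on bad input. */
static int
debug_level_from_values (const char *const *values, int count)
{
  int level = 0;
  int i;
  for (i = 0; i < count; i++)
    if (!increment_option (values[i], &level))
      return -1;
  return level;
}
```

Passing the value "on" twice, which is what `-dd` would amount to under this scheme, yields level 2.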

And one last patch (close -> CLOSE):

--- CVS-latest/src/connect.c Mon Sep 22 15:55:22 2003
+++ src/connect.c   Thu Oct 02 16:52:33 2003
@@ -37,9 +37,7 @@
 #endif
 #include <assert.h>

-#ifdef WINDOWS
-# include <winsock.h>
-#else
+#ifndef WINDOWS
 # include <sys/socket.h>
 # include <netdb.h>
 # include <netinet/in.h>
@@ -201,7 +199,7 @@
   wget_sockaddr_set_address (bsa, ip_default_family, 0, bind_address);
   if (bind (sock, bsa.sa, sockaddr_len ()))
{
- close (sock);
+ CLOSE (sock);
  sock = -1;
  goto out;
}
@@ -211,7 +209,7 @@
   if (connect_with_timeout (sock, sa.sa, sockaddr_len (),
                             opt.connect_timeout) < 0)
 {
-  close (sock);
+  CLOSE (sock);
   sock = -1;
   goto out;
 }
--

--gv




Re: wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Hrvoje Niksic
It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
downloads the HTML files only because it absolutely has to, in order
to recurse through them.  After it finds the links in them, it deletes
them.
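
As a rough illustration of why index.htm is rejected under `-A zip` but would be kept under `-A zip,htm`, here is a small sketch of suffix matching against a comma-separated accept list. This is not wget's implementation: the function names are hypothetical, and requiring a '.' before the suffix is an assumption of this sketch (wildcard patterns are not handled at all).

```c
#include <assert.h>
#include <string.h>

/* Return 1 if NAME ends in '.' followed by the first SUFFIX_LEN
   characters of SUFFIX, else 0. */
static int
has_suffix (const char *name, const char *suffix, size_t suffix_len)
{
  size_t name_len = strlen (name);
  if (name_len <= suffix_len)
    return 0;
  if (name[name_len - suffix_len - 1] != '.')
    return 0;
  return strncmp (name + name_len - suffix_len, suffix, suffix_len) == 0;
}

/* Return 1 if NAME matches any suffix in the comma-separated
   ACCEPT_LIST (e.g. "zip,htm"), else 0.  Empty entries are skipped. */
static int
accepted_by_list (const char *name, const char *accept_list)
{
  const char *p = accept_list;
  assert (name != NULL && accept_list != NULL);
  while (*p)
    {
      size_t len = strcspn (p, ",");
      if (len > 0 && has_suffix (name, p, len))
        return 1;
      p += len;
      if (*p == ',')
        p++;
    }
  return 0;
}
```

Under these assumptions, `accepted_by_list ("index.htm", "zip")` is 0, so the start page counts as rejected once it has been scanned for links, while `accepted_by_list ("index.htm", "zip,htm")` is 1.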


Re: some wget patches against beta3

2003-10-03 Thread Hrvoje Niksic
Thanks for the contribution.  Note that a slightly more correct place
to send the patch is the [EMAIL PROTECTED] list, followed by
people with a keener interest in development.

Also, you should send at least a short explanation of what each patch
is supposed to do and why one should apply it.  (Except in the case of
really short, self-explanatory patches, of course.)

As for the Polish translation, translations are normally handled
through the Translation Project.  The TP robot is currently down, but
I assume it will be back up soon, and then we'll submit the POT file
and update the translations /en masse/.


Re: mswindows.h patch

2003-10-03 Thread Hrvoje Niksic
Thanks for the patch, I've now applied it with the following ChangeLog
entry:

2003-10-03  Gisle Vanem  [EMAIL PROTECTED]

* connect.c: And don't include them here.

* mswindows.h: Include winsock headers here.

However, I've postponed applying the part that changes `-d'.  I agree
that `-d' could stand improvement, but let's wait with that until 1.9
is released.


Re: wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Jochen Roderburg
Quoting Hrvoje Niksic [EMAIL PROTECTED]:

> It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
> downloads the HTML files only because it absolutely has to, in order
> to recurse through them.  After it finds the links in them, it deletes
> them.

Hmm, so it has really been an undetected error over all the years ;-) ?

OK, I see. If adding html explicitly in my scripts helps, I'll do that. I like
to keep those files because they show me the date when the last change occurred
in a directory.

Regards, J.Roderburg






Re: wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Fred Holmes
At 12:05 PM 10/3/2003, Hrvoje Niksic wrote:
> It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
> downloads the HTML files only because it absolutely has to, in order
> to recurse through them.  After it finds the links in them, it deletes
> them.

How about a switch to keep the .html file, similar to the -nr switch that
keeps the .listing file for ftp downloads?



Re: wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Hrvoje Niksic
Jochen Roderburg [EMAIL PROTECTED] writes:

> Quoting Hrvoje Niksic [EMAIL PROTECTED]:
>
>> It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
>> downloads the HTML files only because it absolutely has to, in order
>> to recurse through them.  After it finds the links in them, it deletes
>> them.
>
> Hmm, so it has really been an undetected error over all the years
> ;-) ?

s/undetected/unfixed/

At least I've always considered it an error.  I didn't know people
depended on it.