Re: [Bug-wget] Fix: Large files in WARC

2012-02-04 Thread Giuseppe Scrivano
Ángel González keis...@gmail.com writes:

 You would also need
  #define _FILE_OFFSET_BITS 64
 but that seems already handled by configure.
 I'm not sure if that would work for 32bit Windows, though.

_FILE_OFFSET_BITS is defined by the AC_SYS_LARGEFILE macro in the
configure.ac file, so we don't have to worry about it.

Cheers,
Giuseppe



Re: [Bug-wget] Two fixes: Memory leak with chunked responses / Chunked responses and WARC files

2012-01-28 Thread Giuseppe Scrivano
hey,

thanks for your patches.  I have pushed them.

Cheers,
Giuseppe



Gijs van Tulder gvtul...@gmail.com writes:

 Hi,

 Here are two small patches. I hope they will be useful.

 First, a patch that fixes a memory leak in fd_read_body (src/retr.c)
 and skip_short_body (src/http.c) when retrieving a response with
 Transfer-Encoding: chunked. Both functions make calls to
 fd_read_line but never free the result.

 Second, a patch to the fd_read_body function that changes the way
 chunked responses are saved in the WARC file. Until now, wget would
 write a de-chunked response to the WARC file, which is wrong: the WARC
 file is supposed to have an exact copy of the HTTP response, so it
 should also include the chunk headers.

 The first patch fixes the memory leaks. The second patch changes
 fd_read_body to save the full, chunked response in the WARC file.

 Regards,

 Gijs
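For reference, a chunked body is a sequence of hex-size/CRLF/data/CRLF units terminated by a zero-size chunk, and it is this raw framing, not the reassembled payload, that an exact-copy WARC record must preserve. A minimal sketch (emit_chunk is an illustrative name, not a wget function):

```c
#include <stdio.h>
#include <string.h>

/* Encode one HTTP/1.1 chunk: hex length, CRLF, data, CRLF.  A WARC
   record that stores the exact response keeps these chunk headers
   instead of the de-chunked payload.  Returns bytes written to buf,
   or 0 if the chunk would not fit. */
static size_t
emit_chunk (char *buf, size_t bufsz, const char *data, size_t len)
{
  int n = snprintf (buf, bufsz, "%zx\r\n", len);
  if (n < 0 || (size_t) n + len + 2 > bufsz)
    return 0;                       /* would not fit */
  memcpy (buf + n, data, len);
  memcpy (buf + n + len, "\r\n", 2);
  return (size_t) n + len + 2;
}
```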



Re: [Bug-wget] Fwd: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits

2012-01-28 Thread Giuseppe Scrivano
hi,

I will take a deeper look after your copyright assignment process is
completed.  As a suggestion for the future: next time, it would be better
to ask on the mailing list before starting work on a task (unless it
is a bug fix).  Not all new feature requests can be accepted into wget.

Cheers,
Giuseppe



Sasikanth sasikanth@gmail.com writes:

   Modified the calc_rate function to calculate bandwidth in powers of ten 
 (SI-prefix format)
   for --bits option.

  Please review the changes

   Thanks
   Sasi

 -- Forwarded message --
 From: Sasikanth sasikanth@gmail.com
 Date: Wed, Jan 18, 2012 at 5:43 PM
 Subject: Re: [Bug-wget] [PATCH] [wget-bug #33210], Add an option to output 
 bandwidth in bits
 To: Hrvoje Niksic hnik...@xemacs.org
 Cc: bug-wget@gnu.org

 On Sun, Jan 15, 2012 at 8:51 PM, Hrvoje Niksic hnik...@xemacs.org wrote:

 Sasikanth sasikanth@gmail.com writes:

  No one asked. I had just thought it would be good to display all the
 output in either bits or bytes to avoid confusion for the user (I was
  confused).

 I understand that, but I have never seen a downloading agent output data
 length in bits, so displaying the data in bits would likely cause much
 more confusion and/or be less useful.  (Data throughput in bits, on the
 other hand, is quite common.)  With the original implementation of
 --bits I expect that someone would soon ask for
 --bits-for-bandwidth-only.

  Anyhow thanks I will modify the patch.

 Thanks.

 Note that the patch has another problem: while Wget's K, M, and G
 refer to (what is now known as) kibibytes, mebibytes, and gibibytes,
 bandwidth is measured in kilobits, megabits, and gigabits per second.
 Bandwidth units all refer to powers of ten, not to powers of two, so it
 is incorrect for calc_rate to simply increase the byte multipliers by 8.

 Hrvoje
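Hrvoje's distinction can be illustrated with a small sketch (format_rate is an illustrative name, not wget's calc_rate): byte counts use binary prefixes with 1024 steps, while bit rates use SI prefixes with 1000 steps, so simply multiplying the byte thresholds by 8 is not enough.

```c
#include <stdio.h>
#include <string.h>

/* Format a transfer rate given in bytes/s either with binary byte
   prefixes (KiB/s, steps of 1024) or, --bits style, with SI bit
   prefixes (kb/s, steps of 1000). */
static void
format_rate (double bytes_per_sec, int in_bits, char *buf, size_t n)
{
  if (in_bits)
    {
      double bps = bytes_per_sec * 8;       /* bytes -> bits */
      static const char *units[] = { "b/s", "kb/s", "Mb/s", "Gb/s" };
      int i = 0;
      while (bps >= 1000 && i < 3)          /* SI: divide by 1000 */
        bps /= 1000, i++;
      snprintf (buf, n, "%.2f %s", bps, units[i]);
    }
  else
    {
      static const char *units[] = { "B/s", "KiB/s", "MiB/s", "GiB/s" };
      int i = 0;
      while (bytes_per_sec >= 1024 && i < 3)  /* binary: divide by 1024 */
        bytes_per_sec /= 1024, i++;
      snprintf (buf, n, "%.2f %s", bytes_per_sec, units[i]);
    }
}
```

For example, 125000 bytes/s is 1,000,000 bits/s, i.e. "1.00 Mb/s" in SI bit units, while the same figure in byte units is about "122.07 KiB/s".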

   


 diff -ur orig/wget-1.13.4/src/init.c wget-1.13.4/src/init.c
 --- orig/wget-1.13.4/src/init.c   2011-08-19 15:36:20.0 +0530
 +++ wget-1.13.4/src/init.c2012-01-18 14:42:56.240973950 +0530
 @@ -126,6 +126,7 @@
    { "backups",  &opt.backups,   cmd_number },
    { "base", &opt.base_href, cmd_string },
    { "bindaddress",  &opt.bind_address,  cmd_string },
 +  { "bits", &opt.bits_fmt,  cmd_boolean },
  #ifdef HAVE_SSL
    { "cacertificate",    &opt.ca_cert,   cmd_file },
  #endif
 diff -ur orig/wget-1.13.4/src/main.c wget-1.13.4/src/main.c
 --- orig/wget-1.13.4/src/main.c   2011-09-06 19:20:11.0 +0530
 +++ wget-1.13.4/src/main.c2012-01-18 14:42:56.241973599 +0530
 @@ -166,6 +166,7 @@
  { "backups", 0, OPT_BOOLEAN, "backups", -1 },
  { "base", 'B', OPT_VALUE, "base", -1 },
  { "bind-address", 0, OPT_VALUE, "bindaddress", -1 },
 +{ "bits", 0, OPT_BOOLEAN, "bits", -1 },
  { IF_SSL ("ca-certificate"), 0, OPT_VALUE, "cacertificate", -1 },
  { IF_SSL ("ca-directory"), 0, OPT_VALUE, "cadirectory", -1 },
  { "cache", 0, OPT_BOOLEAN, "cache", -1 },
 @@ -704,6 +705,11 @@
    -np, --no-parent don't ascend to the parent directory.\n"),
    "\n",
  
 +N_("\
 +Output format:\n"),
 +N_("\
 +   --bits  Output bandwidth in bits.\n"),
 +"\n",
    N_("Mail bug reports and suggestions to <bug-wget@gnu.org>.\n")
 };
  
 diff -ur orig/wget-1.13.4/src/options.h wget-1.13.4/src/options.h
 --- orig/wget-1.13.4/src/options.h2011-08-06 15:54:32.0 +0530
 +++ wget-1.13.4/src/options.h 2012-01-18 14:42:56.247982676 +0530
 @@ -255,6 +255,7 @@
  
    bool show_all_dns_entries; /* Show all the DNS entries when resolving a
                                  name. */
 +  bool bits_fmt;             /* Output bandwidth in bits format.  */
  };
  
  extern struct options opt;
 diff -ur orig/wget-1.13.4/src/progress.c wget-1.13.4/src/progress.c
 --- orig/wget-1.13.4/src/progress.c   2011-01-01 17:42:35.0 +0530
 +++ wget-1.13.4/src/progress.c2012-01-18 14:42:56.249098685 +0530
 @@ -861,7 +861,7 @@
    struct bar_progress_hist *hist = &bp->hist;
  
    /* The progress bar should look like this:
 -     xx% [=======>            ] nn,nnn 12.34K/s  eta 36m 51s
 +     xx% [=======>            ] nn,nnn 12.34KB/s  eta 36m 51s
  
      Calculate the geometry.  The idea is to assign as much room as
      possible to the progress bar.  The other idea is to never let
 @@ -873,7 +873,7 @@
      "xx% " or "100%"  - percentage               - 4 chars
      "[]"              - progress bar decorations - 2 chars
      " nnn,nnn,nnn"    - downloaded bytes         - 12 chars or very rarely more
 -     " 12.5K/s"        - download rate            - 8 chars
 +     " 12.5KB/s"       - download rate            - 9 chars
     " eta 36m 51s"   - 

Re: [Bug-wget] Cannot compile current bzr trunk: undefined reference to `gzwrite' / `gzclose' / `gzdopen'

2012-01-11 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 Hi all,

 The attached patch should hopefully fix Evgenii's problem.

 The patch changes the configure script to always use libz, unless it
 is explicitly disabled. In that case, the patch makes sure that the
 WARC functions do not use gzip but write to uncompressed files
 instead.

Thanks for the contribution, the patch looks correct, I am going to
apply it.

Cheers,
Giuseppe



Re: [Bug-wget] Fwd: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits

2012-01-11 Thread Giuseppe Scrivano
Thanks for the patch.  Except for some minor esthetic changes, like an
empty space between the function name and '(', which I can fix before
applying it, the patch seems ok.

Before I can apply it though, you need to get copyright assignments with
the FSF.  I am going to send more information in private to you.

Cheers,
Giuseppe



Sasikanth sasikanth@gmail.com writes:

 Sorry guys, in my previous mail I attached a .patch extension file instead
 of a .txt extension.
 Now it is correctly attached.

 Thanks
 Sasi

 -- Forwarded message --
 From: Sasikanth sasikanth@gmail.com
 Date: Wed, Jan 11, 2012 at 3:18 PM
 Subject: [PATCH] [wget-bug #33210], Add an option to output bandwidth in
 bits
 To: bug-wget@gnu.org


 Hi all,

 I added a new option --bits as requested in
 https://savannah.gnu.org/bugs/?33210.
This patch will display all data lengths in bits format when the --bits option is given.
 I have verified it with http and ftp. Please let me know if I missed
 anything.

Attachments: patch and change log entry file

 Thanks
 Sasi

 diff -ru orig/wget-1.13.4/src/ftp.c wget-1.13.4/src/ftp.c
 --- orig/wget-1.13.4/src/ftp.c2012-01-09 14:06:31.273731044 +0530
 +++ wget-1.13.4/src/ftp.c 2012-01-11 14:05:33.793990983 +0530
 @@ -217,18 +217,18 @@
  static void
  print_length (wgint size, wgint start, bool authoritative)
  {
 -  logprintf (LOG_VERBOSE, _("Length: %s"), number_to_static_string (size));
 +  logprintf (LOG_VERBOSE, _("Length: %s"), number_to_static_string (convert_to_bits (size)));
    if (size >= 1024)
 -    logprintf (LOG_VERBOSE, " (%s)", human_readable (size));
 +    logprintf (LOG_VERBOSE, " (%s)", human_readable (convert_to_bits (size)));
    if (start > 0)
      {
        if (size - start >= 1024)
          logprintf (LOG_VERBOSE, _(", %s (%s) remaining"),
 -                   number_to_static_string (size - start),
 -                   human_readable (size - start));
 +                   number_to_static_string (convert_to_bits (size - start)),
 +                   human_readable (convert_to_bits (size - start)));
        else
          logprintf (LOG_VERBOSE, _(", %s remaining"),
 -                   number_to_static_string (size - start));
 +                   number_to_static_string (convert_to_bits (size - start)));
      }
    logputs (LOG_VERBOSE, !authoritative ? _(" (unauthoritative)\n") : "\n");
  }
 @@ -1564,7 +1564,7 @@
                  : _("%s (%s) - %s saved [%s]\n\n"),
                  tms, tmrate,
                  write_to_stdout ? "" : quote (locf),
 -                number_to_static_string (qtyread));
 +                number_to_static_string (convert_to_bits (qtyread)));
    }
    if (!opt.verbose && !opt.quiet)
      {
 @@ -1573,7 +1573,7 @@
          time. */
        char *hurl = url_string (u, URL_AUTH_HIDE_PASSWD);
        logprintf (LOG_NONVERBOSE, "%s URL: %s [%s] -> \"%s\" [%d]\n",
 -                 tms, hurl, number_to_static_string (qtyread), locf, count);
 +                 tms, hurl, number_to_static_string (convert_to_bits (qtyread)), locf, count);
        xfree (hurl);
      }
  
 @@ -1792,7 +1792,7 @@
                    /* Sizes do not match */
                    logprintf (LOG_VERBOSE, _("\
 The sizes do not match (local %s) -- retrieving.\n\n"),
 -                             number_to_static_string (local_size));
 +                             number_to_static_string (convert_to_bits (local_size)));
                  }
              }
          }      /* opt.timestamping && f->type == FT_PLAINFILE */
 @@ -2206,7 +2206,7 @@
                  sz = -1;
                logprintf (LOG_NOTQUIET,
                           _("Wrote HTML-ized index to %s [%s].\n"),
 -                         quote (filename), number_to_static_string (sz));
 +                         quote (filename), number_to_static_string (convert_to_bits (sz)));
              }
            else
              logprintf (LOG_NOTQUIET,
 diff -ru orig/wget-1.13.4/src/http.c wget-1.13.4/src/http.c
 --- orig/wget-1.13.4/src/http.c   2012-01-09 14:06:31.274730346 +0530
 +++ wget-1.13.4/src/http.c2012-01-11 14:24:02.721099726 +0530
 @@ -2423,19 +2423,19 @@
      logputs (LOG_VERBOSE, _("Length: "));
      if (contlen != -1)
        {
 -        logputs (LOG_VERBOSE, number_to_static_string (contlen + contrange));
 +        logputs (LOG_VERBOSE, number_to_static_string (convert_to_bits (contlen) + contrange));
          if (contlen + contrange >= 1024)
            logprintf (LOG_VERBOSE, " (%s)",
 -                     human_readable (contlen + contrange));
 +                     human_readable (convert_to_bits (contlen) + contrange));
          if (contrange)
            {
              if (contlen >= 1024)
                logprintf (LOG_VERBOSE, _(", %s (%s) remaining"),
 -
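The convert_to_bits helper the patch calls throughout is not shown in this excerpt; presumably its shape is roughly the following sketch (wgint and opt below are stand-ins for wget's own typedef and struct options):

```c
typedef long long wgint;              /* stand-in for wget's wgint */
static struct { int bits_fmt; } opt;  /* stand-in for struct options */

/* Presumable shape of the helper: multiply by 8 only when the
   --bits option is in effect, otherwise pass the count through
   unchanged. */
static wgint
convert_to_bits (wgint num)
{
  return opt.bits_fmt ? num * 8 : num;
}
```

Note that, as Hrvoje points out later in the thread, a plain "multiply by 8" still leaves human_readable using 1024-based prefixes, which is why the follow-up patch reworks calc_rate for SI units.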

Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not formatted..

2012-01-07 Thread Giuseppe Scrivano

Micah Cowan mi...@micah.cowan.name writes:

 I believe hh's suggestion is to have the format reflect the way it would look 
 in a URL; so [ and ] around ipv6, and nothing around ipv4 (since ipv4 format 
 isn't ambiguous in the way ipv6 is).

I agree.  Please rework your patch to use [address]:port just for IPv6.

This message should be fixed as well: "Reusing existing connection to
ADDRESS:IP".

Please also provide a ChangeLog file entry.

Thanks for your contribution!
Giuseppe



Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not formatted..

2012-01-07 Thread Giuseppe Scrivano
thanks.  The patch is not complete yet; it doesn't fix the other message
I reported before.  Can you please check that as well?  And can you
provide a ChangeLog file entry?

Cheers,
Giuseppe



Sasikanth sasikanth@gmail.com writes:

 I had modified the patch as you guys suggested.
 For ipv6 the display will be [ipv6address]:port
 for ipv4   ipv4address:port

 The test results

 IPv4
 ---

 [root@Shash wget-1.13.4]# ./src/wget http://10.0.0.1
 --2012-01-07 11:01:23--  http://10.0.0.1/
 Connecting to 10.0.0.1:80...

 IPv6
 ---
 [root@Shash wget-1.13.4]# ./src/wget http://[3ffe:b80:17e2::1]
 --2012-01-07 11:01:06--  http://[3ffe:b80:17e2::1]/
 Connecting to [3ffe:b80:17e2::1]:80

 Thanks
 Sasi

 On Sat, Jan 7, 2012 at 3:14 AM, Henrik Holst
 henrik.ho...@millistream.comwrote:

 Exactly! That is at least how I have always seen address and port
 combinations presented (or entered).

 /hh
 On 6 Jan 2012 21:27, Micah Cowan mi...@micah.cowan.name wrote:

 I believe hh's suggestion is to have the format reflect the way it would
 look in a URL; so [ and ] around ipv6, and nothing around ipv4 (since ipv4
 format isn't ambiguous in the way ipv6 is).

 (Sent by my Kindle Fire)
 -mjc

 --
 *From:* Sasikanth sasikanth@gmail.com
 *Sent:* Fri Jan 06 01:56:34 PST 2012
 *To:* henrik.ho...@millistream.com
 *Cc:* bug-wget@gnu.org
 *Subject:* Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not
 formatted..

 Currently we are not checking family type of the address before printing
 the message.

 Do we have to print the message as [3ffe:b80:17e2::1]:80 for ipv6 and

 |10.0.0.1|:80 for ipv4?

 Please confirm and I will resubmit the patch.

 Thanks
 Sasi


 Note: I didn't get the reply to my mail, to keep track the discussion I

 had copied the mail content from the mailing list.

 Shouldn't IPv6 addresses be displayed like this instead:
 [3ffe:b80:17e2::1]:80

 /hh
 On 5 Jan 2012 14:15, Sasikanth address@hidden wrote:

  Hi,

  This is a very small change related to a display issue.
  The bug id is 32357 (https://savannah.gnu.org/bugs/index.php?32357).

  When we run wget with an ip address alone (wget 10.0.0.1 or wget
  http://10.0.0.1/ or wget http://[3ffe:b80:17e2::1]),
  the display shows as

  IPV4
  Connecting to 10.0.0.1:80...
  IPV6
  Connecting to 3ffe:b80:17e2::1:80   (Because of the IPV6 format (ff::01)
  it is a little hard to differentiate the ipv6 address and the port number)

  This patch will show the display

  IPV4
  Connecting to |10.0.0.1|:80...
  IPV6
  Connecting to |3ffe:b80:17e2::1|:80

  Thanks
  Sasi



 --- src/connect.c.orig    2012-01-07 09:39:55.965324001 +0530
 +++ src/connect.c 2012-01-07 10:54:08.295324000 +0530
 @@ -293,7 +293,12 @@
            xfree (str);
          }
        else
 -        logprintf (LOG_VERBOSE, _("Connecting to %s:%d... "), txt_addr, port);
 +        {
 +          if (ip->family == AF_INET)
 +            logprintf (LOG_VERBOSE, _("Connecting to %s:%d... "), txt_addr, port);
 +          else if (ip->family == AF_INET6)
 +            logprintf (LOG_VERBOSE, _("Connecting to [%s]:%d... "), txt_addr, port);
 +        }
      }
  
    /* Store the sockaddr info to SA.  */
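The formatting rule the thread settled on can be sketched on its own: brackets around the address only for IPv6, matching URL syntax, and nothing around IPv4. format_endpoint below is an illustrative name, not a wget function:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Format a host/port pair the way it would appear in a URL:
   "[addr]:port" for IPv6, "addr:port" for IPv4, since only the
   IPv6 colon notation is ambiguous next to a port number. */
static void
format_endpoint (int family, const char *addr, int port,
                 char *buf, size_t n)
{
  if (family == AF_INET6)
    snprintf (buf, n, "[%s]:%d", addr, port);
  else
    snprintf (buf, n, "%s:%d", addr, port);
}
```

With this, the IPv6 example from the thread renders as "[3ffe:b80:17e2::1]:80" and the IPv4 one as "10.0.0.1:80".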



Re: [Bug-wget] feature suggestion: host spanning depth limit (absolute)

2012-01-04 Thread Giuseppe Scrivano
Naxa anaxagra...@gmail.com writes:

 I suggest a feature for limiting the recursion depth level specifically
 on different Hosts, when spanning hosts.
 This way I wouldn't need to know and list the different hosts when,
 for example, a page links to multiple image hosting sites.
 An option like `-H 1` would then limit host spanning the same way that
 `--limit` works for all recursion.  It would count the needed spanning
 steps from the original domain as the distance.

it is something that can be implemented without changing the current
semantics of -H.  Feel free to submit a patch :-)

Thanks,
Giuseppe



Re: [Bug-wget] Wget 1.13.4 test suite on Windows/MinGW

2011-12-29 Thread Giuseppe Scrivano
Eli Zaretskii e...@gnu.org writes:

 Sorry, I don't understand this comment.  fd is indeed a file
 descriptor, but ioctlsocket's first argument is a SOCKET object, which
 is an unsigned int, and we get it from a call to `socket' or some
 such.  So where do you see a potential problem?

 And anyway, I think wget calls ioctlsocket for every connection; if
 so, then most of those calls succeed, because the binary I built works
 and is quite capable of fetching via HTTP.  So these problems seem to
 be triggered by something specific in those 3 tests.

sorry, I wasn't clear.  We use gnulib replacements for socket functions,
so internally wget knows only about file descriptors.  On Windows this
abstraction is obtained through _open_osfhandle on a SOCKET object.
When we use a native function, like ioctlsocket, we have to be sure
the file descriptor is converted back to a SOCKET object (by using
_get_osfhandle).  I am afraid this conversion is not done correctly; the
value you have observed (fd = 3) makes me think so.

The w32sock.h file from gnulib defines these two macros for such
conversions:

#define FD_TO_SOCKET(fd)   ((SOCKET) _get_osfhandle ((fd)))
#define SOCKET_TO_FD(fh)   (_open_osfhandle ((long) (fh), O_RDWR | O_BINARY))

Cheers,
Giuseppe



Re: [Bug-wget] empty VERSION in 1.13.4

2011-12-12 Thread Giuseppe Scrivano
Elan Ruusamäe g...@pld-linux.org writes:

 hi

 i reported on irc, but apparently nobody listens there:

 Day changed to 09 Dec  2011
 22:07:16  glen 1.13.4 tarball is buggy. builds from it lack version
 id in user agent header
 22:07:36  glen $ wget -q -O -  ifconfig.me/ua
 22:07:44  glen this prints: Wget/ (linux-gnu)
 22:08:03  glen seems the problem is that tarball does not contain
 this file: sh: build-aux/bzr-version-gen: not found
 22:08:12  glen so regenerating autoconf creates empty @VERSION@
 22:08:28  glen $ grep version.string src/version.c
 22:08:28  glen const char *version_string = ;

 in PLD Linux I worked around it:
 http://cvs.pld-linux.org/cgi-bin/viewvc.cgi/cvs/packages/wget/wget.spec?r1=1.158&r2=1.159

Thanks for the report, this one-line patch should fix the problem:

Cheers,
Giuseppe



=== modified file 'ChangeLog'
--- ChangeLog   2011-12-11 14:18:11 +
+++ ChangeLog   2011-12-12 20:24:25 +
@@ -1,3 +1,8 @@
+2011-12-12  Giuseppe Scrivano  gscriv...@gnu.org
+
+   * Makefile.am (EXTRA_DIST): Add build-aux/bzr-version-gen.
+   Reported by: Elan Ruusamäe g...@pld-linux.org.
+
 2011-12-11  Giuseppe Scrivano  gscriv...@gnu.org
 
* util/trunc.c (main): Call `close' on the fd and check for errors.

=== modified file 'Makefile.am'
--- Makefile.am 2011-01-01 12:19:37 +
+++ Makefile.am 2011-12-12 20:14:16 +
@@ -46,7 +46,7 @@
 EXTRA_DIST = ChangeLog.README MAILING-LIST \
  msdos/ChangeLog msdos/config.h msdos/Makefile.DJ \
  msdos/Makefile.WC ABOUT-NLS \
- build-aux/build_info.pl .version
+ build-aux/build_info.pl build-aux/bzr-version-gen .version
 
 CLEANFILES = *~ *.bak $(DISTNAME).tar.gz
 



Re: [Bug-wget] Disable progress display when log output to file?

2011-12-06 Thread Giuseppe Scrivano
Paul Wratt paul.wr...@gmail.com writes:

 this works but no size in output:
 wget -nv --output-file=wget.txt _url_

 I found a reference to a 2007 post asking for:
  3) add support for turning off the progress bar with
 --progress=none

I think I am going to add this support myself.
I have written a small patch to make wget parallel, but until I have
a clear idea how the progress bar should look (and how to implement it),
a --progress=none will be fine.

Giuseppe



Re: [Bug-wget] error message

2011-12-06 Thread Giuseppe Scrivano
david painter ddpain...@bigpond.com writes:

 Help.  After installing and trying to get my DVD and CD drives to work, I
 now have an error message stating E:Type '2011-12-04' is not known on
 line 1 in Source list/etc/apt/source.list.d/medibuntu.list

you have reached the GNU wget mailing list.  Your problem doesn't seem
related to wget, at least from the information you have provided.
I think you will have a better chance of finding help by writing to an
Ubuntu-related mailing list.  Be sure to provide more information (what
you were trying to do, what system you are using...); messages like
yours are often ignored.

Giuseppe



Re: [Bug-wget] --page-requisites and robot exclusion issue

2011-12-05 Thread Giuseppe Scrivano
Paul Wratt paul.wr...@gmail.com writes:

 if it does not obey - server admins will ban it

 the work around:
 1) get single html file first - edit out meta tag - re-get with
 --no-clobber (usually only in landing pages)
 2) empty robots.txt (or allow all - search net)

 possible solutions:
 A) command line option
 B) ./configure --disable-robots-check

you can specify -e robots=off to wget at runtime.

Giuseppe



Re: [Bug-wget] Bug or feature: --continue and --content-disposition don't work together

2011-12-03 Thread Giuseppe Scrivano
hello Alex,

sorry for the late reply.  Correct, when you specify
--content-disposition, the destination file name is not known.  You can
see it by specifying the destination file using -O, as:

wget -c --content-disposition --debug
http://www.dubovskoy.net/CANTER/01.mp3 -O 01.mp3

that command is pretty useless though; just skip --content-disposition.

Cheers,
Giuseppe



Alex gnfa...@rambler.ru writes:

 Greetings
 Sorry for bad English
 If --content-disposition enabled, then --continue will not work.
 Example:
 wget -c --content-disposition --debug
 http://www.dubovskoy.net/CANTER/01.mp3
 Every time it makes the request without a Range field (Range: bytes=57791-)
 and receives HTTP/1.1 200 OK.
 wget -c --debug http://www.dubovskoy.net/CANTER/01.mp3
 makes the request with a Range field and receives HTTP/1.1 206 Partial Content.


 wget -c --content-disposition --debug
 http://www.dubovskoy.net/CANTER/01.mp3
 DEBUG output created by Wget 1.13.4 on mingw32.

 URI encoding = `ASCII'
 --2011-11-07 09:13:25--  http://www.dubovskoy.net/CANTER/01.mp3
 Resolving www.dubovskoy.net (www.dubovskoy.net)... seconds 0,00,
 93.180.40.15
 Caching www.dubovskoy.net = 93.180.40.15
 Connecting to www.dubovskoy.net
 (www.dubovskoy.net)|93.180.40.15|:80... seconds 0,00, connected.
 Created socket 4.
 Releasing 0x00caa008 (new refcount 1).

 ---request begin---
 GET /CANTER/01.mp3 HTTP/1.1
 User-Agent: Wget/1.13.4 (mingw32)
 Accept: */*
 Host: www.dubovskoy.net
 Connection: Keep-Alive

 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 200 OK
 Content-Length: 2221830
 Content-Type: audio/mpeg
 Last-Modified: Mon, 25 Jan 2010 16:24:03 GMT
 Accept-Ranges: bytes
 ETag: b2a3caceda9dca1:329
 Server: Microsoft-IIS/6.0
 MicrosoftOfficeWebServer: 5.0_Pub
 X-Powered-By: ASP.NET
 Date: Mon, 07 Nov 2011 07:14:14 GMT

 ---response end---
 200 OK
 Registered socket 4 for persistent reuse.
 Length: 2221830 (2,1M) [audio/mpeg]
 Saving to: `01.mp3'

  0K .. .. .. .. ..  2%
 25,2K 84s
 50K ..



 wget -c --debug http://www.dubovskoy.net/CANTER/01.mp3
 DEBUG output created by Wget 1.13.4 on mingw32.

 URI encoding = `ASCII'
 --2011-11-07 09:18:06--  http://www.dubovskoy.net/CANTER/01.mp3
 Resolving www.dubovskoy.net (www.dubovskoy.net)... seconds 0,00,
 93.180.40.15
 Caching www.dubovskoy.net = 93.180.40.15
 Connecting to www.dubovskoy.net
 (www.dubovskoy.net)|93.180.40.15|:80... seconds 0,00, connected.
 Created socket 4.
 Releasing 0x00cfa008 (new refcount 1).

 ---request begin---
 GET /CANTER/01.mp3 HTTP/1.1
 Range: bytes=57791-
 User-Agent: Wget/1.13.4 (mingw32)
 Accept: */*
 Host: www.dubovskoy.net
 Connection: Keep-Alive

 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 206 Partial Content
 Content-Length: 2164039
 Content-Type: audio/mpeg
 Content-Range: bytes 57791-2221829/2221830
 Last-Modified: Mon, 25 Jan 2010 16:24:03 GMT
 Accept-Ranges: bytes
 ETag: b2a3caceda9dca1:329
 Server: Microsoft-IIS/6.0
 MicrosoftOfficeWebServer: 5.0_Pub
 X-Powered-By: ASP.NET
 Date: Mon, 07 Nov 2011 07:18:56 GMT

 ---response end---
 206 Partial Content
 Registered socket 4 for persistent reuse.
 Length: 2221830 (2,1M), 2164039 (2,1M) remaining [audio/mpeg]
 Saving to: `01.mp3'

 [ skipping 50K ]
 50K ,, .. 



Re: [Bug-wget] Missing gnulib files in development version

2011-11-20 Thread Giuseppe Scrivano
Jochen Roderburg roderb...@uni-koeln.de writes:

 I have some problems compiling recent development versions (with the
 WARC additions) on my Linux.

 First it was missing a tmpdir.h. Looking around I saw some tmpdir
 files in the gnulib directories, but obviously they were not where the
 build process was looking for them. I tried a rerun of the bootstrap
 script which updated a lot of gnulib stuff and now the tmpdir.h was
 found.

 Next it was missing a base32.h. A base32 is also listed in the
 bootstrap.conf, but base32 files did not show up despite repeated
 reruns of the bootstrap script.

 What to try next ??

something is going wrong with the bootstrap script.  Can you please
include what the bootstrap script prints?  Do you get any error?  Does
it happen from a clean checkout too?

Usually I keep the gnulib development tree in a different directory and
then specify --gnulib-srcdir=/path/to/gnulib to the bootstrap script; it
saves some time and bandwidth.

Cheers,
Giuseppe



Re: [Bug-wget] Trouble saving the graphs on a page

2011-11-13 Thread Giuseppe Scrivano
what happens if you specify -H?

Cheers,
Giuseppe



Randy Kramer rhkra...@gmail.com writes:

 I just joined the list and I'm jumping the gun a little bit (because I 
 usually 
 lurk on a list for a little while before posting), but...

 I'm trying to save a local copy of this page with all the graphs:

 http://www.businessinsider.com/what-wall-street-protesters-are-so-angry-about-2011-10?op=1
  

 After finally finding the wget manual and the examples there, I thought I 
 found the right command--I tried:

 wget -p --convert-links -nH -nd -Pdownload 
 http://www.businessinsider.com/what-wall-street-protesters-are-so-angry-about-2011-10?op=1
   

 That saves the page, but not the graphs.  Can anybody give me a clue as to 
 what I need to do to also save the graphs?

 Thanks!
 Randy Kramer 



Re: [Bug-wget] wget doesnt work but curl works !

2011-11-13 Thread Giuseppe Scrivano
hi Vishwanath,

is it possible to use the latest released version of wget?  You can find
it here: ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz

I have no clue what changes are contained in the so-called Red Hat
modified version of wget; I highly suggest using the latest upstream
version when reporting bugs, instead of a modified version.

What system are you using?

Thanks,
Giuseppe


Vishwanath Reddy Beemidi bvishwana...@gmail.com writes:

 Hi,

 I have trouble getting wget to work when downloading a file using http; curl
 works fine for the same URL.

 Both the commands are being run from the same server at the command line, OS
 : RH Linux mdc1pr012 2.6.18-238.9.1.el5

 Following are the commands and the debug messages, any insights into what
 the problem could be are appreciated.

 [dsop@mdc1pr012]$ wget -d -S http://www.preprod.abc.com/tools/90067660.csv

 Setting --server-response (serverresponse) to 1

 DEBUG output created by Wget 1.11.4 Red Hat modified on linux-gnu.


 --2011-10-14 18:00:58--  http://www.preprod.abc.com/tools/90067660.csv

 Resolving www.preprod.abc.com... 184.31.131.61

 Caching www.preprod.abc.com = 184.31.131.61

 Connecting to www.preprod.abc.com|184.31.131.61|:80... connected.

 Created socket 3.

 Releasing 0x07683bc0 (new refcount 1).


 ---request begin---

 GET /tools/90067660.csv HTTP/1.0

 User-Agent: Wget/1.11.4 Red Hat modified

 Accept: */*

 Host: www.preprod.abc.com

 Connection: Keep-Alive


 ---request end---

 HTTP request sent, awaiting response...

 ---response begin---

 HTTP/1.1 302 Found

 Location:
 http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=

 Cache-Control: no-cache

 Pragma: no-cache

 Content-Type: text/html; charset=utf-8

 Connection: close

 Content-Length: 925


 ---response end---


   HTTP/1.1 302 Found

   Location:
 http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=

   Cache-Control: no-cache

   Pragma: no-cache

   Content-Type: text/html; charset=utf-8

   Connection: close

   Content-Length: 925

 Location:
 http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=[following]

 Closed fd 3

 --2011-10-14 18:00:58--
 http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=

 Resolving fd000xnchegrn02... 11.48.43.72

 Caching fd000xnchegrn02 = 11.48.43.72

 Connecting to fd000xnchegrn02|11.48.43.72|:80... connected.

 Created socket 3.

 Releasing 0x076806e0 (new refcount 1).


 ---request begin---

 GET
 /?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=
 HTTP/1.0

 User-Agent: Wget/1.11.4 Red Hat modified

 Accept: */*

 Host: fd000xnchegrn02

 Connection: Keep-Alive


 ---request end---

 HTTP request sent, awaiting response...

 ---response begin---

 HTTP/1.1 401 Unauthorized

 Cache-Control: no-cache

 Pragma: no-cache

 WWW-Authenticate: NTLM

 WWW-Authenticate: BASIC realm=Federated_Department_Stores

 Content-Type: text/html; charset=utf-8

 Proxy-Connection: close

 Set-Cookie: BCSI-CS-3d1fe99b15515258=2; Path=/

 Connection: close

 Content-Length: 1114


 ---response end---


   HTTP/1.1 401 Unauthorized

   Cache-Control: no-cache

   Pragma: no-cache

   WWW-Authenticate: NTLM

   WWW-Authenticate: BASIC realm=Federated_Department_Stores

   Content-Type: text/html; charset=utf-8

   Proxy-Connection: close

   Set-Cookie: BCSI-CS-3d1fe99b15515258=2; Path=/

   Connection: close

   Content-Length: 1114

 Closed fd 3

 Authorization failed.


 Following is the curl trace info for the same URL

 [dsop@mdc1pr012 ~]$ curl --trace-ascii tr.out -o out.dat
 http://www.preprodmacys.fds.com/tools/90067660.csv
   % Total% Received % Xferd  Average Speed   TimeTime Time
  Current
  Dload  Upload   Total   SpentLeft
  Speed
 100  155k  100  155k0 0   449k  0 --:--:-- --:--:-- --:--:--
  550k

 [dsop@mdc1pr012 ~]$ more tr.out

 == Info: About to connect() to www.preprod.abc.com port 80

 == Info:   Trying 184.31.131.61... == Info: connected

 == Info: Connected to www.preprod.abc.com (184.31.131.61) port 80

 = Send header, 186 bytes (0xba)

 : GET /tools/90067660.csv HTTP/1.1

 0022: User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5

 0062:  OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5

 008b: Host: www.preprod.abc.com

 00ab: Accept: */*

 00b8:

 = Recv header, 17 bytes (0x11)

 : HTTP/1.1 200 OK

 = Recv header, 25 bytes (0x19)

 : Server: IBM_HTTP_Server

 = Recv header, 46 bytes (0x2e)

 : Last-Modified: Fri, 18 Mar 2011 22:42:46 GMT

 = Recv header, 30 bytes (0x1e)

 : ETag: b19b9-26f41-7f2b4580

 = Recv header, 22 bytes (0x16)

 : Accept-Ranges: bytes

 = Recv header, 26 bytes (0x1a)

 : Content-Type: text/plain

 = Recv header, 37 bytes (0x25)

 : Date: Fri, 14 Oct 2011 22:06:15 GMT

 = Recv header, 24 bytes 

Re: [Bug-wget] WARC, new version

2011-11-05 Thread Giuseppe Scrivano
Hey Gijs,

I have added a ChangeLog entry and pushed the change.

Thanks!
Giuseppe



Gijs van Tulder gvtul...@gmail.com writes:

 lovely.  I am going to push it soon with some small adjustments.

 That's good to hear.

 There's one other small adjustment that you may want to make, see the
 attached patch. One of the WARC functions uses the basename function,
 which causes problems on OS X. Including libgen.h and strdup-ing the
 output of basename seems to solve this problem.

 Thanks,

 Gijs


 On 04-11-11 22:27, Giuseppe Scrivano wrote:
 Gijs van Tuldergvtul...@gmail.com  writes:

 Hi Giuseppe,

 * I've changed the configure.ac and src/Makefile.am.
 * I've added a ChangeLog entry.

 lovely.  I am going to push it soon with some small adjustments.

 Thanks for the great work.  Whenever it happens to be in the same place,
 I'll buy you a beer :-)

 Cheers,
 Giuseppe



Re: [Bug-wget] WARC, new version

2011-11-04 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 Hi Giuseppe,

 * I've changed the configure.ac and src/Makefile.am.
 * I've added a ChangeLog entry.

lovely.  I am going to push it soon with some small adjustments.

Thanks for the great work.  Whenever it happens to be in the same place,
I'll buy you a beer :-)

Cheers,
Giuseppe



Re: [Bug-wget] Memory leak when using GnuTLS

2011-11-04 Thread Giuseppe Scrivano
Committed with a ChangeLog entry and a small change.  Another beer? :-)

Thanks!
Giuseppe



Gijs van Tulder gvtul...@gmail.com writes:

 Hi,

 I think there is a memory leak in the GnuTLS part of wget. When
 downloading multiple files from a HTTPS server, wget with GnuTLS uses
 a lot of memory.

 Perhaps an explanation for this can be found in src/http.c. The
 gethttp calls ssl_init for each download:

 /* Initialize the SSL context.  After this has once been done,
it becomes a no-op.  */
 if (!ssl_init ())

 The OpenSSL version of ssl_init, in src/openssl.c, checks if SSL has
 already been initialized and doesn't repeat the work.

 But the GnuTLS version doesn't:

 bool
 ssl_init ()
 {
   const char *ca_directory;
   DIR *dir;

   gnutls_global_init ();
   gnutls_certificate_allocate_credentials (credentials);

 GnuTLS is initialized again and again, but there is never a call to
 gnutls_global_deinit.

 I've attached a small patch to add a check to ssl_init in
 src/gnutls.c, similar to the check already in src/openssl.c. With it,
 wget can still download over HTTPS and the memory usage stays within
 reasonable limits.

 Thanks,

 Gijs



Re: [Bug-wget] WARC, new version

2011-10-30 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 === modified file 'bootstrap.conf'
 --- bootstrap.conf2011-08-11 12:23:39 +
 +++ bootstrap.conf2011-10-21 19:24:18 +
 @@ -28,6 +28,7 @@
  accept
  alloca
  announce-gen
 +base32
  bind
  c-ctype
  clock-time
 @@ -49,6 +50,7 @@
  mbtowc
  mkdir
  crypto/md5
 +crypto/sha1
  pipe
  quote
  quotearg
 @@ -63,6 +65,7 @@
  stdbool
  strcasestr
  strerror_r-posix
 +tmpdir
  unlocked-io
  update-copyright
  vasprintf

 === modified file 'configure.ac'
 --- configure.ac  2011-09-04 12:19:12 +
 +++ configure.ac  2011-10-23 21:21:49 +
 @@ -511,7 +511,22 @@
fi
  fi
  
 -
 +# Warc
 +AC_CHECK_HEADER(uuid/uuid.h, UUID_FOUND=yes, UUID_FOUND=no)
 +if test x$UUID_FOUND = xno; then
 +  AC_MSG_ERROR([libuuid is required])
 +fi
 +
 +AC_CHECK_LIB(uuid, uuid_generate, UUID_FOUND=yes, UUID_FOUND=no)
 +if test x$UUID_FOUND = xno; then
 +  AC_MSG_ERROR([libuuid is required])
 +fi
 +LIBUUID=-luuid
 +AC_SUBST(LIBUUID)
  +LDFLAGS="${LDFLAGS} -L$libuuid/lib"
  +CPPFLAGS="${CPPFLAGS} -I$libuuid/include"

I think we shouldn't change the value of LDFLAGS and CPPFLAGS as they
are user variables.  Also, where is $libuuid defined?  We can just drop
these lines.



if (hs->res >= 0)
  CLOSE_FINISH (sock);
else
 -  {
 -    if (hs->res < 0)
 -      hs->rderrmsg = xstrdup (fd_errstr (sock));
 -    CLOSE_INVALIDATE (sock);
 -  }
 +  CLOSE_INVALIDATE (sock);

Why?


The rest seems ok, if you also provide a ChangeLog I can proceed to
merge it.

Thanks,
Giuseppe



Re: [Bug-wget] parallel wget...

2011-10-24 Thread Giuseppe Scrivano
Hrvoje Niksic hnik...@xemacs.org writes:

 I expect the biggest changes to be required in progress.c. :)

Does anyone have some ideas? :-)  How should it look?

Cheers,
Giuseppe



Re: [Bug-wget] WARC, new version

2011-10-23 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 Hi all,

 Based on the comments by Giuseppe and Ángel I've revised the
 implementation of the wget WARC extension. I've attached a patch.

 1. It's no longer based on the warctools library. Instead, I've
 written a couple of new WARC-writing functions, using zlib for the
 gzip compression. The new implementation is much smaller.

 2. I extracted a small part of the gethttp method in http.c and moved
 it to a new function, read_response_body, which is responsible for
 downloading the response body and writing it to a file.

 The WARC extension needs to save the response in multiple cases: when
 the response is successful, but also when the response is a redirect,
 401 unauthorized or an error. Moving the response-saving to a separate
 method makes it possible to reuse this part for all four situations.

 Any thoughts?

WOW great work!  It is much better now.

I wonder if it is possible to remove the dependency on libuuid, maybe
by providing replacements for uuid_generate and uuid_unparse when libuuid is
not found?  Even a simple implementation based on rand?

Besides that, there are only very small adjustments which need to be made
to the code in order to include it into wget, like keeping lines no longer
than 80 characters or using foo *bar instead of foo * bar; in any case
these are not important and I can go through them before committing your
changes.

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH] paramcheck: Use + quantifier and return copy

2011-10-21 Thread Giuseppe Scrivano
Thanks.  Pushed.

Cheers,
Giuseppe



Steven Schubiger s...@member.fsf.org writes:

 === modified file 'ChangeLog'
 --- ChangeLog 2011-09-04 12:19:12 +
 +++ ChangeLog 2011-10-16 18:18:34 +
 @@ -1,3 +1,8 @@
 +2011-10-16  Steven Schubiger  s...@member.fsf.org
 +
 + * util/paramcheck.pl: Match 1 or more times where applicable.
 + (extract_entries): Return a copy instead of reference.
 +
  2011-09-04  Alan Hourihane al...@fairlite.co.uk (tiny change)
  
   * configure.ac: Check for libz when gnutls is used.

 === modified file 'util/paramcheck.pl'
 --- util/paramcheck.pl2011-01-01 12:19:37 +
 +++ util/paramcheck.pl2011-10-16 02:36:40 +
 @@ -33,11 +33,11 @@
  
  my @args = ([
  $main_content,
 -qr/static \s+? struct \s+? cmdline_option \s+? option_data\[\] \s+? = 
 \s+? \{ (.*?) \}\;/sx,
 +qr/static \s+? struct \s+? cmdline_option \s+? option_data\[\] \s+? = 
 \s+? \{ (.+?) \}\;/sx,
  [ qw(long_name short_name type data argtype) ],
  ], [
  $init_content,
 -qr/commands\[\] \s+? = \s+? \{ (.*?) \}\;/sx,
 +qr/commands\[\] \s+? = \s+? \{ (.+?) \}\;/sx,
  [ qw(name place action) ],
  ]);
  
 @@ -78,18 +78,18 @@
  my (@entries, %index, $i);
  
  foreach my $chunk (@$chunks) {
 -my ($args) = $chunk =~ /\{ \s+? (.*?) \s+? \}/sx;
 +my ($args) = $chunk =~ /\{ \s+? (.+?) \s+? \}/sx;
  next unless defined $args;
  
  my @args = map {
tr/'//d; $_
  } map {
 -  /\((.*?)\)/ ? $1 : $_
 +  /\((.+?)\)/ ? $1 : $_
  } split /\,\s+/, $args;
  
   my $entry = { map { $_ => shift @args } @$names };
  
 -($entry->{line}) = $chunk =~ /^ \s+? (\{.*)/mx;
 +($entry->{line}) = $chunk =~ /^ \s+? (\{.+)/mx;
  if ($chunk =~ /deprecated/i) {
  $entries[-1]->{deprecated} = true;
  }
 @@ -103,9 +103,9 @@
  push @entries, $entry;
  }
  
 -push @entries, \%index;
 +push @entries, { %index };
  
 -return \@entries;
 +return [ @entries ];
  }
  
  sub output_results
 @@ -281,7 +281,7 @@
  while ($tex =~ /^\@item\w*? \s+? --([-a-z0-9]+)/gmx) {
  $tex_items{$1} = true;
  }
 -my ($help) = $main =~ /\n print_help .*? \{\n (.*) \n\} \n/sx;
 +my ($help) = $main =~ /\n print_help .*? \{\n (.+) \n\} \n/sx;
  while ($help =~ /--([-a-z0-9]+)/g) {
  $main_items{$1} = true;
  }



[Bug-wget] parallel wget...

2011-10-15 Thread Giuseppe Scrivano
hello,

The winter is coming, there is not much to do outside, and I have spent
the day working on something I have had in mind for too long.

Unfortunately I couldn't structure the implementation the way I had thought
possible: there are too many nested `select' points in the code, and
implementing an event-driven single-thread parallel wget seems like too
much work.

I have used different threads, spawned by retrieve_tree in the recur.c,
I haven't published the code yet[1] since it is just an ugly hack for now
and still sometimes it segfaults; it will take a while before I can go
through the code and ensure it is reentrant and can be used by different
threads without problems.

But I would like to share some results with you

$ LANG=C wget --version | head -n 1
GNU Wget 1.13 built on linux-gnu.

$ LANG=C ./wget --version | head -n 1
GNU Wget 1.13.4-2567-dirty built on linux-gnu.

$ rm -rf it.gnu.org/ && time wget -q --no-http-keep-alive -r -np \
http://it.gnu.org/~gscrivano/files/parallel/

real0m2.808s
user0m0.008s
sys 0m0.020s
$ rm -rf it.gnu.org/ && time ./wget --jobs=2 -q --no-http-keep-alive -r -np \
http://it.gnu.org/~gscrivano/files/parallel/

real0m1.291s
user0m0.004s
sys 0m0.016s
$ rm -rf it.gnu.org/ && time ./wget --jobs=4 -q --no-http-keep-alive -r -np \
http://it.gnu.org/~gscrivano/files/parallel/

real0m0.521s
user0m0.008s
sys 0m0.012s
$ rm -rf it.gnu.org/ && time ./wget --jobs=8 -q --no-http-keep-alive -r -np \
http://it.gnu.org/~gscrivano/files/parallel/

real0m0.395s
user0m0.008s
sys 0m0.004s


Nice eh? :-)  Any comment?  Suggestion?  Insult?

Cheers,
Giuseppe

1) but the brave can find the current ugly hack here:
  http://it.gnu.org/~gscrivano/files/parallel_wget.patch



Re: [Bug-wget] WARC output

2011-10-08 Thread Giuseppe Scrivano
Hi Gijs,


Gijs van Tulder gvtul...@gmail.com writes:

 can you please send a complete diff against the current development
 tree version?

 Here's the diff of the WARC additions (1.9MB zipped) to revision 2565:

  http://dl.dropbox.com/u/365100/wget_warc-20110926-complete.patch.bz2

the patch is huge and I think we don't want to add so many files into
the wget tree.  Can't we assume the user will install the warc tools by
herself and let configure check if they are installed or not?  This will
require some more work but the result will be much less intrusive.  What
do you think?

Thanks,
Giuseppe



Re: [Bug-wget] Wget 1.13.4 v. VMS -- Various problems

2011-10-07 Thread Giuseppe Scrivano
Steven M. Schweda s...@antinode.info writes:

  [Various other changes/fixes affecting VMS]
 
Still wondering.

For the curious, a set of patches should be available at:

   http://antinode.info/ftp/wget/wget-1_13_4/1_13_4_1.dru

can you please include a ChangeLog entry for each of them?

Thanks,
Giuseppe



Re: [Bug-wget] Patch: new option --content-on-error: do not skip content on http server error

2011-10-06 Thread Giuseppe Scrivano
Henrik Holst henrik.ho...@millistream.com writes:

 No problem, I'll give it a try, yell at me if I do something wrong:

Good job!  I have applied the patch and pushed it.

Cheers,
Giuseppe



Re: [Bug-wget] Patch: new option --content-on-error: do not skip content on http server error

2011-10-04 Thread Giuseppe Scrivano
Hi Henrik,

Henrik Holst henrik.ho...@millistream.com writes:

 This patch adds an option to not skip the content sent by the HTTP server
 when the server responds with a status code in the 4xx and 5xx range.

thanks for the patch, I am quite inclined to include it.  Can you please
provide the ChangeLog file entry?

Thanks!
Giuseppe



Re: [Bug-wget] Recursive wget: change in handling of file permissions?

2011-10-04 Thread Giuseppe Scrivano
Hi Micah,
Micah Cowan mi...@cowan.name writes:

 So, from where I'm sitting, it looks like --preserve-permissions was an
 implemented feature for two major releases (1.10 and 1.11 series), and
 has now been missing from the last two major releases (1.12 and 1.13).
 Probably, it should be reinstated, and documentation added, to restore
 previous behavior.

 Giuseppe?

thanks for the detailed analysis and sorry for my late reply.  If this
is the case, then I think --preserve-permissions has to be restored as
it used to work in the 1.10.* series.

Cheers,
Giuseppe



Re: [Bug-wget] texinfo @dir information

2011-09-27 Thread Giuseppe Scrivano
k...@freefriends.org (Karl Berry) writes:

 Tiny change for the manual to make its dir entry consistent with others,
 ok?

Ok.  Pushed.

Thanks,
Giuseppe



Re: [Bug-wget] WARC output

2011-09-26 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 Hi.

 It's been a while since we've discussed the WARC addition to Wget. Is
 there anything I can help with?

can you please send a complete diff against the current development tree
version?

I'll take a look at it ASAP.

Thanks,
Giuseppe



Re: [Bug-wget] Introduction

2011-09-26 Thread Giuseppe Scrivano
Manuel José Muñoz Calero manuelj.mu...@gmail.com writes:

 These days I've been reading as much as I could: manual, wiki, code
 and bazaar usage.
 If you agree, I'm beginning with...

  #21439: Support for FTP proxy authentication

It sounds great!


 ... planned release 1.15, status confirmed, assigned to none.

 One question. Should I work with the main branch?

Yes, please.

Cheers,
Giuseppe



Re: [Bug-wget] Introduction

2011-09-26 Thread Giuseppe Scrivano
Daniel Stenberg dan...@haxx.se writes:

 On Mon, 26 Sep 2011, Giuseppe Scrivano wrote:

  #21439: Support for FTP proxy authentication

 It sounds great!

 Since there's no FTP proxy standard or spec, how exactly is this going
 to work?

Oops, thanks for pointing it out.  I wasn't aware of it and took it
for granted.  The bug report redirects to this discussion:

http://article.gmane.org/gmane.comp.web.wget.general/7300

Giuseppe



Re: [Bug-wget] --version copyright year stale

2011-09-19 Thread Giuseppe Scrivano
k...@freefriends.org (Karl Berry) writes:

 Hi Giuseppe,

 The copyright year in the wget --version output should be 2011, not 2009.
 As seen in 1.13.4.

thanks for reporting it; this patch fixes it:

=== modified file 'src/main.c'
--- src/main.c  2011-09-06 13:53:39 +
+++ src/main.c  2011-09-19 15:26:41 +
@@ -884,7 +884,7 @@
   /* TRANSLATORS: When available, an actual copyright character
      (circle-c) should be used in preference to "(C)". */
   if (fputs (_("\
-Copyright (C) 2009 Free Software Foundation, Inc.\n"), stdout) < 0)
+Copyright (C) 2011 Free Software Foundation, Inc.\n"), stdout) < 0)
     exit (3);
   if (fputs (_("\
 License GPLv3+: GNU GPL version 3 or later\n\


Cheers,
Giuseppe



[Bug-wget] GNU wget 1.13.4 released

2011-09-17 Thread Giuseppe Scrivano
Hello,

I am pleased to announce the new version of GNU wget.

It fixes some bugs reported in the recent wget 1.13.3 release.

It is available for download here:

ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz
ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz

and the GPG detached signatures using the key C03363F4:

ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz.sig
ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz.sig

To reduce load on the main server, you can use this redirector service
which automatically redirects you to a mirror:

http://ftpmirror.gnu.org/wget/wget-1.13.4.tar.gz
http://ftpmirror.gnu.org/wget/wget-1.13.4.tar.xz

* Noteworthy changes in Wget 1.13.4

** Now --version and --help work again.

** Fix a build error on solaris 10 sparc.

** Now --timestamping and --continue work well together.

** Return a network failure when FTP downloads fail and --timestamping
   is specified.

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.

Have fun!
Giuseppe



Re: [Bug-wget] Wget is not downloading background images.

2011-09-05 Thread Giuseppe Scrivano
ma...@inbox.com writes:

 In these specific tests, I am using GNU Wget 1.11.4 on a Windows platform.

CSS support was added in wget 1.12.

Cheers,
Giuseppe



Re: [Bug-wget] Suggestion: An option for Wget to reset all command-line defaults.

2011-09-05 Thread Giuseppe Scrivano
ma...@inbox.com writes:

 I wonder if Wget needs an option like --resetdefaults=yes to reset any
 changes that may have been made in the .wgetrc file.

I think you can get the same behaviour by using --config=/dev/null.  The
parameter --config is supported since wget 1.13.

Cheers,
Giuseppe



Re: [Bug-wget] Wget should not ignore quota specifications for single files.

2011-09-05 Thread Giuseppe Scrivano
matt...@creativegraphicsolutions.biz writes:

 I've tried several work-arounds for this, all with no success. Wget
 simply refuses to follow quota specifications for single files no
 matter how Wget is invoked.

 Respecting quotas for single files would be useful in other situations
 where Wget is called automatically from within a script.

hard quotas are not supported (yet), so it doesn't really matter how you
invoke wget; it will never obey :-)  At the moment, as a very ugly
workaround, you can use ulimit -f.

Cheers,
Giuseppe



Re: [Bug-wget] wget 1.13: FIONBIO does not exist on solaris

2011-09-04 Thread Giuseppe Scrivano
Christian Jullien eli...@orange.fr writes:

 When compiling gnutls.c on solaris 10 sparc with gcc 4.6.1
 I get an error on:
   ret = ioctl (fd, FIONBIO, &one);
 because FIONBIO is undefined.
  
 Adding: 
  
 #include sys/fcntl.h

 Let:
 #ifdef F_GETFL
   ret = fcntl (fd, F_SETFL, flags | O_NONBLOCK);

 to be used instead. It then compiles and correctly works.
  
 Please see how to include sys/fcntl.h conditionally. I checked, but it
 is not clear to me when and why you decide to include this system file.
  
 I'll be glad to test new versions for you.

Thanks for reporting it.  We can assume sys/fcntl.h is always present
as gnulib will provide a replacement on systems where this file is
missing.

The change I am going to commit is simply:

=== modified file 'src/gnutls.c'
--- src/gnutls.c2011-08-30 14:43:25 +
+++ src/gnutls.c2011-09-04 10:43:35 +
@@ -48,6 +48,8 @@
 #include "ptimer.h"
 #include "ssl.h"
 
+#include <sys/fcntl.h>
 
 #ifdef WIN32
 # include "w32sock.h"
 #endif




Re: [Bug-wget] A bug with wget 1.13.3

2011-09-02 Thread Giuseppe Scrivano
Hi Vladimir,

thanks, it has been fixed in the source repository.

Cheers,
Giuseppe



Vladimir Lomov lomov...@gmail.com writes:

 Hello,
 I'm on Archlinux x86_64. After updating the system with the help of
 package manager wget aborts on simple `wget --version' with exit code
 3.

 Seems I found the reason of that behavior, I attached with the message
 a patch vs. bzr trunk (revno 2555).

 I checked it on top of wget 1.13.3 (patching release source).

 ---
 WBR, Vladimir Lomov



[Bug-wget] GNU wget 1.13.3 released

2011-08-31 Thread Giuseppe Scrivano
I am pleased to announce the new version of GNU wget.

It is available for download here:

ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.gz
ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.xz

and the GPG detached signatures using the key C03363F4:

ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.gz.sig
ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.xz.sig

To reduce load on the main server, you can use this redirector service
which automatically redirects you to a mirror:

http://ftpmirror.gnu.org/wget/wget-1.13.3.tar.gz
http://ftpmirror.gnu.org/wget/wget-1.13.3.tar.xz

* Noteworthy changes in Wget 1.13.3

** Support HTTP/1.1

** Now the GNU TLS library is used by default for secure connections,
   instead of OpenSSL.

** Fix some portability issues.

** Properly handle a malformed status line in an HTTP response.

** Ignore zero length domains in $no_proxy.

** Set new cookies after an authorization failure.

** Exit with failure if -k is specified and -O is not a regular file.

** Cope better with unclosed html tags.

** Print diagnostic messages to stderr, not stdout.

** Do not use an additional HEAD request when --content-disposition is used,
   but use GET directly.

** Report the average transfer speed correctly when multiple URLs are specified
   and -c influences the transferred data amount.

** GNU TLS backend works again.

** Now --timestamping and --continue work well together.

** By default, on server redirects, use the original URL to get the
   local file name. Close CVE-2010-2252.  This introduces a
   backward-incompatibility; any script that relies on the old
   behaviour must use --trust-server-names.

** Fix a problem when -k is used and some URLs are specified through
   CSS.

** Convert correctly URLs that need to be encoded to local files when following
   links.

** Use persistent connections with proxies supporting them.

** Print the total download time as part of the summary for recursive downloads.

** Now it is possible to specify a different startup configuration file through
   the --config option.

** Fix an infinite loop with the error 'filename has sprung into existence'
   on a network error and -nc is used.

** Now --adjust-extension does not modify the file extension if the file ends
   in .htm.

** Support HTTP/1.1 307 redirects keep request method.

** Now --no-parent doesn't fetch undesired files if HTTP and HTTPS are used
   by the same host on different pages.

** Do not attempt to remove the file if it is not in the accept rules but
   it is the output destination file.

** Introduce `show_all_dns_entries' to print all IP addresses corresponding to
   a DNS name when it is resolved.

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.

Have fun!
Giuseppe



Re: [Bug-wget] Wget 1.12 (macports) has NULLs and stuff appended after --convert-links

2011-08-26 Thread Giuseppe Scrivano
Hello Denis,

this bug will be fixed in the next release of wget.  It hasn't been
officially released yet, but you can find newer tarballs here:
ftp://ftp.gnu.org/gnu/wget

If it still doesn't work for you with 1.13, please report it.

Cheers,
Giuseppe



Denis Laplante denis.lapla...@ubc.ca writes:

 Summary: Wget 1.12 (macports) has NULLs and stuff appended after html
 mirror file.

   command: wget -r --convert-links --adjust-extension ...

 ### RESULT #
   - Result: content has many links translated, but junk appended
   - Sample: systems.1.html
   - mostly good content with left-sidebar links 
 untranslated, but
 main links translated
   - followed by 1307 * NUL
   - followed by 60 lines = 4291 characters from same file 
 (links
 translated) starting at point=31517 of 41150 in middle of left-sidebar
 (links translated).
   - All files affected !

 I have looked at 
 http://savannah.gnu.org/search/?words=convert-linkstype_of_search=bugsSearch=Searchexact=1#options

 ## COMMAND ##
 WG_BASIC=-r --convert-links --adjust-extension --page-requisites
 --no-
 verbose
 WG_HOBBLE=--level=2 --limit-rate=100k --quota=10m --wait-seconds=1
 WG_EXCLUDE=--no-parent --
 reject=*:*,index.php*,Special:*,User:*,Talk:* --exclude-
 directories=/.../Special:*
 PD_SESS_COOKIE=qwertyuiop
 WG_STARTURL=https://wiki...;

 /opt/local/bin/wget ${WG_BASIC} ${WG_RESTRICT} ${WG_EXCLUDE} \
 --header Cookie: wikidb_UserName=...; wikidb__session=$
 {PD_SESS_COOKIE} \
 ${WG_STARTURL}


  VERSION #
 $ wget -V
 GNU Wget 1.12 built on darwin9.8.0.

 +digest +ipv6 +nls +ntlm +opie +md5/openssl +https -gnutls +openssl
 +iri

 Wgetrc:
 /Users/laplante/.wgetrc (user)
 /opt/local/etc/wgetrc (system)
 Locale: /opt/local/share/locale
 Compile: /usr/bin/gcc-4.0 -DHAVE_CONFIG_H
 -DSYSTEM_WGETRC=/opt/local/etc/wgetrc
 -DLOCALEDIR=/opt/local/share/locale -I. -I../lib
 -I/opt/local/include -O2 -arch i386
 Link: /usr/bin/gcc-4.0 -O2 -arch i386 -L/opt/local/lib -liconv -lintl
 -
 arch
 i386 -lssl -lcrypto -lintl -liconv -lc -Wl,-framework
 -Wl,CoreFoundation -ldl -lidn ftp-opie.o openssl.o http-ntlm.o
 gen-md5.o ../lib/libgnu.a

 Copyright (C) 2009 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later
 http://www.gnu.org/licenses/gpl.html.
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.

 Originally written by Hrvoje Niksic hnik...@xemacs.org.
 Currently maintained by Micah Cowan mi...@cowan.name.
 Please send bug reports and questions to bug-wget@gnu.org.  



Re: [Bug-wget] Download files and preserve their data and time

2011-08-25 Thread Giuseppe Scrivano
Ray Satiro raysat...@yahoo.com writes:

 Calling utime() works. You could also use SetFileTime(). 2489 changed utime 
 to utimes but the CRT doesn't have utimes. 

thanks for checking it.

I am going to apply the patch below.

Cheers,
Giuseppe



=== modified file 'configure.ac'
--- configure.ac2011-08-11 10:20:25 +
+++ configure.ac2011-08-25 09:01:31 +
@@ -197,7 +197,7 @@
 AC_FUNC_FSEEKO
 AC_CHECK_FUNCS(strptime timegm vsnprintf vasprintf drand48)
 AC_CHECK_FUNCS(strtoll usleep ftello sigblock sigsetjmp memrchr wcwidth mbtowc)
-AC_CHECK_FUNCS(sleep symlink)
+AC_CHECK_FUNCS(sleep symlink utime)
 
 if test x$ENABLE_OPIE = xyes; then
   AC_LIBOBJ([ftp-opie])

=== modified file 'src/utils.c'
--- src/utils.c 2011-08-11 12:23:39 +
+++ src/utils.c 2011-08-25 09:22:03 +
@@ -42,15 +42,23 @@
 #ifdef HAVE_PROCESS_H
 # include <process.h>  /* getpid() */
 #endif
-#ifdef HAVE_UTIME_H
-# include <utime.h>
-#endif
 #include <errno.h>
 #include <fcntl.h>
 #include <assert.h>
 #include <stdarg.h>
 #include <locale.h>
 
+#if HAVE_UTIME
+# include <sys/types.h>
+# ifdef HAVE_UTIME_H
+#  include <utime.h>
+# endif
+
+# ifdef HAVE_SYS_UTIME_H
+#  include <sys/utime.h>
+# endif
+#endif
+
 #include <sys/stat.h>
 
 /* For TIOCGWINSZ and friends: */
@@ -487,6 +495,20 @@
 void
 touch (const char *file, time_t tm)
 {
+#if HAVE_UTIME
+# ifdef HAVE_STRUCT_UTIMBUF
+  struct utimbuf times;
+# else
+  struct {
+    time_t actime;
+    time_t modtime;
+  } times;
+# endif
+  times.modtime = tm;
+  times.actime = time (NULL);
+  if (utime (file, &times) == -1)
+    logprintf (LOG_NOTQUIET, "utime(%s): %s\n", file, strerror (errno));
+#else
   struct timespec timespecs[2];
   int fd;
 
@@ -506,6 +528,7 @@
     logprintf (LOG_NOTQUIET, "futimens(%s): %s\n", file, strerror (errno));
 
   close (fd);
+#endif
 }
 
 /* Checks if FILE is a symbolic link, and removes it if it is.  Does




Re: [Bug-wget] Download files and preserve their data and time

2011-08-20 Thread Giuseppe Scrivano
David H. Lipman dlip...@verizon.net writes:

 I don't know when it happened, probably when I upgraded WGET, but when I 
 download files 
 they inherit the date and time from when they were downloaded.

 It used to be that when the file was downloaded, it retained the date and 
 time of the file 
 it had on the server.  Not when it was downloaded.

 How can I force WGET to return to that condition ?

it should work in the same way as it used to.  It seems to work well
here, using the last revision from the source repository:

$ LANG=C ./wget -q -d http://www.gnu.org/graphics/gnu-head-mini.png 2>&1 | grep \
^Last-Modified
Last-Modified: Sun, 05 Dec 2010 20:58:51 GMT

$ LANG=C stat gnu-head-mini.png  | grep ^Modify
Modify: 2010-12-05 21:58:51.0 +0100

Can you please provide more information?  What version of wget (wget
--version)?  What operating system?  Do you get a different output using
that two commands?

This is also useful for debugging, do you see something different?

$ LANG=C strace -e utimensat ./wget -q \
http://www.gnu.org/graphics/gnu-head-mini.png
utimensat(4, NULL, {{1313833704, 0}, {1291582731, 0}}, 0) = 0

Thanks,
Giuseppe



Re: [Bug-wget] Download files and preserve their data and time

2011-08-20 Thread Giuseppe Scrivano
David H. Lipman dlip...@verizon.net writes:

 WinXP/Vista -- Win32

 Y:\wget --version
 GNU Wget 1.12-2504 built on mingw32.

the change introduced by the revision
gscriv...@gnu.org-20110419103346-cctazi0zxt2770wt could be the cause of
the problem you reported.

If it is possible for you to compile wget, could you try to revert this
patch?  Does it solve the problem for you?
If you have problems rebuilding wget, I'll try to set up the
environment here.

Thanks,
Giuseppe



=== modified file 'bootstrap.conf'
--- bootstrap.conf  2011-04-19 09:31:25 +
+++ bootstrap.conf  2011-04-19 10:33:46 +
@@ -30,9 +30,11 @@
 announce-gen
 bind
 c-ctype
+clock-time
 close
 connect
 fcntl
+futimens
 getaddrinfo
 getopt-gnu
 getpass-gnu

=== modified file 'src/Makefile.am'
--- src/Makefile.am 2011-04-03 22:13:53 +
+++ src/Makefile.am 2011-04-19 10:33:46 +
@@ -37,7 +37,7 @@
 
 # The following line is losing on some versions of make!
 DEFS = @DEFS@ -DSYSTEM_WGETRC=\$(sysconfdir)/wgetrc\ 
-DLOCALEDIR=\$(localedir)\
-LIBS = @LIBICONV@ @LIBINTL@ @LIBS@
+LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
 bin_PROGRAMS = wget
 wget_SOURCES = cmpt.c connect.c convert.c cookies.c ftp.c\

=== modified file 'src/utils.c'
--- src/utils.c 2011-04-18 12:37:42 +
+++ src/utils.c 2011-04-19 10:33:46 +
@@ -51,8 +51,7 @@
 #include <stdarg.h>
 #include <locale.h>
 
-#include <sys/time.h>
-
+#include <sys/stat.h>
 
 /* For TIOCGWINSZ and friends: */
 #ifdef HAVE_SYS_IOCTL_H
@@ -488,15 +487,25 @@
 void
 touch (const char *file, time_t tm)
 {
-  struct timeval timevals[2];
-
-  timevals[0].tv_sec = time (NULL);
-  timevals[0].tv_usec = 0L;
-  timevals[1].tv_sec = tm;
-  timevals[1].tv_usec = 0L;
-
-  if (utimes (file, timevals) == -1)
-    logprintf (LOG_NOTQUIET, "utimes(%s): %s\n", file, strerror (errno));
+  struct timespec timespecs[2];
+  int fd;
+
+  fd = open (file, O_WRONLY);
+  if (fd < 0)
+    {
+      logprintf (LOG_NOTQUIET, "open(%s): %s\n", file, strerror (errno));
+      return;
+    }
+
+  timespecs[0].tv_sec = time (NULL);
+  timespecs[0].tv_nsec = 0L;
+  timespecs[1].tv_sec = tm;
+  timespecs[1].tv_nsec = 0L;
+
+  if (futimens (fd, timespecs) == -1)
+    logprintf (LOG_NOTQUIET, "futimens(%s): %s\n", file, strerror (errno));
+
+  close (fd);
 }
 
 /* Checks if FILE is a symbolic link, and removes it if it is.  Does

=== modified file 'tests/Makefile.am'
--- tests/Makefile.am   2011-04-03 22:13:53 +
+++ tests/Makefile.am   2011-04-19 10:33:46 +
@@ -34,7 +34,7 @@
 PERL = perl
 PERLRUN = $(PERL) -I$(srcdir)
 
-LIBS = @LIBICONV@ @LIBINTL@ @LIBS@
+LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
 .PHONY: test run-unit-tests run-px-tests
 




Re: [Bug-wget] Support of non-linux OS's going down the drain?

2011-08-19 Thread Giuseppe Scrivano
H.Merijn Brand h.m.br...@xs4all.nl writes:

 That is bad. Why? GNU TLS /might/ be safer than OpenSSL in some
 aspects, but it is for sure not available on (older) versions of AIX
 and/or HP-UX. It is already quite a bit of work to get OpenSSL and
 OpenSSH to be rather actual/recent on those boxes, but you can simply
 forget getting gnutls to be available on those. The dependency chain
 is a straight hell.

but in that case, a --with-ssl=openssl will fix this problem, as you did.



 With HP-UX 11.00 and HP C-ANSI-C it doesn't even *compile* anymore!

 $ ./configure --prefix=/pro --disable-nls --with-ssl=openssl 
 --without-libiconv-prefix --without-libintl-prefix --without-libgnutls-prefix
 :
 $ make
 :
 cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include
 -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include
 -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include
 -I/usr/contrib/X11R6/include -c -o c-ctype.o c-ctype.c
 source='cloexec.c' object='cloexec.o' libtool=no \
 DEPDIR=.deps depmode=hp /bin/sh ../build-aux/depcomp \
 cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include
 -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include
 -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include
 -I/usr/contrib/X11R6/include -c -o cloexec.o cloexec.c
 cpp: ./, line 4: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed 
 to space.
 cpp: ./, line 7: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed 
 to space.
 cpp: ./, line 13: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. 
 Changed to space.
 cpp: ./, line 21: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. 
 Changed to space.


can you please send me your cloexec.c file (or any other file causing
it)?  My cloexec.c doesn't have such fancy characters, and unfortunately
I don't have access to any HP-UX machine where I can test it myself.

Have you compiled wget 1.12 on your machine?

Thanks,
Giuseppe



[Bug-wget] wget fails to build under HP-UX 11.00

2011-08-19 Thread Giuseppe Scrivano
Hello,

The following bug report was sent to the wget mailing list.  I am not
sure why it happens; it seems related to gnulib.  Does anyone have an idea
about it?

I don't have access to any HP-UX box to test it by myself.

Thanks,
Giuseppe

 With HP-UX 11.00 and HP C-ANSI-C it doesn't even *compile* anymore!

 $ ./configure --prefix=/pro --disable-nls --with-ssl=openssl 
 --without-libiconv-prefix --without-libintl-prefix --without-libgnutls-prefix
 :
 $ make
 :
 cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include
 -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include
 -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include
 -I/usr/contrib/X11R6/include -c -o c-ctype.o c-ctype.c
 source='cloexec.c' object='cloexec.o' libtool=no \
 DEPDIR=.deps depmode=hp /bin/sh ../build-aux/depcomp \
 cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include
 -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include
 -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include
 -I/usr/contrib/X11R6/include -c -o cloexec.o cloexec.c
 cpp: ./, line 4: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed 
 to space.
 cpp: ./, line 7: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed 
 to space.
 cpp: ./, line 13: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. 
 Changed to space.
 cpp: ./, line 21: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. 
 Changed to space.
 cc: , line 2: error 1000: Unexpected symbol: �.
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: �.
 cc: , line 4: error 1000: Unexpected symbol: �.
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: |.
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: .
 cc: , line 4: error 1000: Unexpected symbol: `.
 cc: , line 6: error 1000: Unexpected symbol: .
 cc: , line 7: error 1000: Unexpected symbol: p.
 cc: , line 13: error 1000: Unexpected symbol: �.
 cc: , line 16: error 1000: Unexpected symbol: .
 cc: , line 18: error 1000: Unexpected symbol: .
 cc: , line 20: error 1000: Unexpected symbol: .
 cc: , line 21: error 1000: Unexpected symbol: $float.
 cc: panic 2017: Cannot recover from earlier errors, terminating.
 make[4]: *** [cloexec.o] Error 1
 make[4]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib'
 make[3]: *** [all-recursive] Error 1
 make[3]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib'
 make[2]: *** [all] Error 2
 make[2]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/pro/3gl/GNU/wget-1.13.1'
 make: *** [all] Error 2
 Exit 2



Re: [Bug-wget] getopt/'struct options' build error in 1.13.1

2011-08-17 Thread Giuseppe Scrivano
ops...

Thanks for reporting it.  I am sure it depends on a fix for a
similar error Perry had on AIX.

At this point, it seems the only way to fix the problem is to include
config.h at the very beginning of css.c.  I have looked at the flex
documentation but I can't find anything useful to prevent other files
from being included before the C code snippet.

Does anybody have an idea?  Should I go for a hack?

Cheers,
Giuseppe



Jack Nagel jackna...@gmail.com writes:

 I have encountered an issue building wget 1.13.1 on Mac OS X 10.6.8.
 It fails during 'make' with gcc 4.2 here:

 /usr/bin/cc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
-DLOCALEDIR=\"/usr/local/share/locale\" -I.  -I../lib -I../lib -c css.c
 In file included from ../lib/unistd.h:113:0,
  from css.c:4738:
 ../lib/getopt.h:196:8: error: redefinition of 'struct option'
 /usr/include/getopt.h:54:8: note: originally defined here
 ../lib/getopt.h:245:12: error: conflicting types for 'getopt_long'
 /usr/include/getopt.h:69:5: note: previous declaration of 'getopt_long' was 
 here
 ../lib/getopt.h:249:12: error: conflicting types for 'getopt_long_only'
 /usr/include/getopt.h:71:5: note: previous declaration of
 'getopt_long_only' was here

 However, I can successfully build wget 1.13 on the same system under
 the same conditions. (Please CC as I am not subscribed to the list).

 Thanks in advance for the help.

 Jack



Re: [Bug-wget] wget 1.13.1 hangs on redirected url

2011-08-17 Thread Giuseppe Scrivano
Hello,

I have tried the command you suggested but I wasn't able to let it hang.

Are you able to reproduce this problem every time?  If so, can you
please include the debug information generated by --debug?

Thanks,
Giuseppe



Axel Reinhold a...@freakout.de writes:

 Hi,

 wget 1.13.1 hangs on redirected site foreever - this url also has digest 
 authorization!
 works fine with wget 1.12 .

  [wpack@pie ~]$ /tmp/wget-1.13.1-1/bin/wget -O- 
 http://calea.wpack.de/sites/active
 --2011-08-17 08:34:39--  http://calea.wpack.de/sites/active
 Resolving calea.wpack.de (calea.wpack.de)... 188.138.34.37
 Connecting to calea.wpack.de (calea.wpack.de)|188.138.34.37|:80... connected.
 HTTP request sent, awaiting response... 401 Authorization Required
 Reusing existing connection to calea.wpack.de:80.
 HTTP request sent, awaiting response... 200
 Length: 66 [text/html]
 Saving to: `STDOUT'

  0% [ 
   ] 0 K/s

 Regards
 Axel



Re: [Bug-wget] getopt/'struct options' build error in 1.13.1

2011-08-17 Thread Giuseppe Scrivano
Yes, but it seems to create another problem under Mac OS X 10.6.8.

In any case, this is the hack I was talking about, does it work for both
of you?

Thanks,
Giuseppe



=== modified file 'src/Makefile.am'
--- src/Makefile.am 2011-08-11 08:26:43 +
+++ src/Makefile.am 2011-08-17 14:15:58 +
@@ -39,9 +39,12 @@
 DEFS = @DEFS@ -DSYSTEM_WGETRC=\"$(sysconfdir)/wgetrc\" \
-DLOCALEDIR=\"$(localedir)\"
 LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
+noinst_LIBRARIES = libcss.a
+libcss_a_SOURCES = css.l
+
 bin_PROGRAMS = wget
 wget_SOURCES = cmpt.c connect.c convert.c cookies.c ftp.c\
-  css.l css-url.c \
+  css_.c css-url.c \
   ftp-basic.c ftp-ls.c hash.c host.c html-parse.c html-url.c \
   http.c init.c log.c main.c netrc.c progress.c ptimer.c \
   recur.c res.c retr.c spider.c url.c\
@@ -57,6 +60,7 @@
 LDADD = $(LIBOBJS) ../lib/libgnu.a
 AM_CPPFLAGS = -I$(top_builddir)/lib -I$(top_srcdir)/lib
 
+
 ../lib/libgnu.a:
	cd ../lib && $(MAKE) $(AM_MAKEFLAGS)
 
@@ -78,6 +82,10 @@
$(AM_LDFLAGS) $(LDFLAGS) $(LIBS) $(wget_LDADD)';' \
	| $(ESCAPEQUOTE) > $@
 
+css_.c: css.c
+	echo '#include "wget.h"' > $@
+	cat css.c >> $@
+
 check_LIBRARIES = libunittest.a
 libunittest_a_SOURCES = $(wget_SOURCES) test.c build_info.c test.h
 nodist_libunittest_a_SOURCES = version.c



Perry Smith pedz...@gmail.com writes:

 I thought you were just going to remove the include of wget.h ?

 On Aug 17, 2011, at 9:09 AM, Giuseppe Scrivano wrote:

 ops...
 
 Thanks for reporting it.  I am sure it depends on a fix for a
 similar error Perry had on AIX.
 
 At this point, it seems the only way to fix the problem is to include
 config.h at the very beginning of css.c.  I have looked at the flex
 documentation but I can't find anything useful to prevent other files
 from being included before the C code snippet.
 
 Does anybody have an idea?  Should I go for a hack?
 
 Cheers,
 Giuseppe
 
 
 
 Jack Nagel jackna...@gmail.com writes:
 
 I have encountered an issue building wget 1.13.1 on Mac OS X 10.6.8.
 It fails during 'make' with gcc 4.2 here:
 
 /usr/bin/cc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
   -DLOCALEDIR=\"/usr/local/share/locale\" -I.  -I../lib -I../lib -c css.c
 In file included from ../lib/unistd.h:113:0,
 from css.c:4738:
 ../lib/getopt.h:196:8: error: redefinition of 'struct option'
 /usr/include/getopt.h:54:8: note: originally defined here
 ../lib/getopt.h:245:12: error: conflicting types for 'getopt_long'
 /usr/include/getopt.h:69:5: note: previous declaration of 'getopt_long' was 
 here
 ../lib/getopt.h:249:12: error: conflicting types for 'getopt_long_only'
 /usr/include/getopt.h:71:5: note: previous declaration of
 'getopt_long_only' was here
 
 However, I can successfully build wget 1.13 on the same system under
 the same conditions. (Please CC as I am not subscribed to the list).
 
 Thanks in advance for the help.
 
 Jack



Re: [Bug-wget] getopt/'struct options' build error in 1.13.1

2011-08-17 Thread Giuseppe Scrivano
to facilitate the testing, I have uploaded a tarball here:

http://it.gnu.org/~gscrivano/files/wget-1.13.1-dirty.tar.bz2

a263e18bc121d6195b1cf7c78b0ff0ba62ac09c3  wget-1.13.1-dirty.tar.bz2
2ee94ef1011dfea2c98615df0d59b7d1  wget-1.13.1-dirty.tar.bz2

Thanks,
Giuseppe



Perry Smith pedz...@gmail.com writes:

 Do I need all the autoconf stuff for this?  I made the change but the 
 Makefile didn't reflect the changes.

 On Aug 17, 2011, at 9:29 AM, Giuseppe Scrivano wrote:

 Yes, but it seems to create another problem under Mac OS X 10.6.8.
 
 In any case, this is the hack I was talking about, does it work for both
 of you?
 
 Thanks,
 Giuseppe
 
 
 
 === modified file 'src/Makefile.am'
 --- src/Makefile.am  2011-08-11 08:26:43 +
 +++ src/Makefile.am  2011-08-17 14:15:58 +
 @@ -39,9 +39,12 @@
 DEFS = @DEFS@ -DSYSTEM_WGETRC=\"$(sysconfdir)/wgetrc\" \
 -DLOCALEDIR=\"$(localedir)\"
 LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
 +noinst_LIBRARIES = libcss.a
 +libcss_a_SOURCES = css.l
 +
 bin_PROGRAMS = wget
 wget_SOURCES = cmpt.c connect.c convert.c cookies.c ftp.c
   \
 -   css.l css-url.c \
 +   css_.c css-url.c \
 ftp-basic.c ftp-ls.c hash.c host.c html-parse.c html-url.c \
 http.c init.c log.c main.c netrc.c progress.c ptimer.c \
 recur.c res.c retr.c spider.c url.c\
 @@ -57,6 +60,7 @@
 LDADD = $(LIBOBJS) ../lib/libgnu.a
 AM_CPPFLAGS = -I$(top_builddir)/lib -I$(top_srcdir)/lib
 
 +
 ../lib/libgnu.a:
 	cd ../lib && $(MAKE) $(AM_MAKEFLAGS)
 
 @@ -78,6 +82,10 @@
  $(AM_LDFLAGS) $(LDFLAGS) $(LIBS) $(wget_LDADD)';' \
 	| $(ESCAPEQUOTE) > $@
 
 +css_.c: css.c
 +	echo '#include "wget.h"' > $@
 +	cat css.c >> $@
 +
 check_LIBRARIES = libunittest.a
 libunittest_a_SOURCES = $(wget_SOURCES) test.c build_info.c test.h
 nodist_libunittest_a_SOURCES = version.c
 
 
 
 Perry Smith pedz...@gmail.com writes:
 
 I thought you were just going to remove the include of wget.h ?
 
 On Aug 17, 2011, at 9:09 AM, Giuseppe Scrivano wrote:
 
 ops...
 
  Thanks for reporting it.  I am sure it depends on a fix for a
  similar error Perry had on AIX.
 
 At this point, it seems the only way to fix the problem is to include
 config.h at the very beginning of css.c.  I have looked at the flex
  documentation but I can't find anything useful to prevent other files
  from being included before the C code snippet.
 
  Does anybody have an idea?  Should I go for a hack?
 
 Cheers,
 Giuseppe
 
 
 
 Jack Nagel jackna...@gmail.com writes:
 
 I have encountered an issue building wget 1.13.1 on Mac OS X 10.6.8.
 It fails during 'make' with gcc 4.2 here:
 
  /usr/bin/cc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
   -DLOCALEDIR=\"/usr/local/share/locale\" -I.  -I../lib -I../lib -c css.c
 In file included from ../lib/unistd.h:113:0,
from css.c:4738:
 ../lib/getopt.h:196:8: error: redefinition of 'struct option'
 /usr/include/getopt.h:54:8: note: originally defined here
 ../lib/getopt.h:245:12: error: conflicting types for 'getopt_long'
 /usr/include/getopt.h:69:5: note: previous declaration of 'getopt_long' 
 was here
 ../lib/getopt.h:249:12: error: conflicting types for 'getopt_long_only'
 /usr/include/getopt.h:71:5: note: previous declaration of
 'getopt_long_only' was here
 
 However, I can successfully build wget 1.13 on the same system under
 the same conditions. (Please CC as I am not subscribed to the list).
 
 Thanks in advance for the help.
 
 Jack



Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available

2011-08-14 Thread Giuseppe Scrivano
Perry Smith pedz...@gmail.com writes:

 I took a stab at installing GNUTLS and gave up.  The beauty of wget is
 I can get it going with very few things needed.  I compiled without ssl
 at all but getting openssl going is fairly easy too.  GNUTLS is asking
 for nettle, zlib, and something else (according to the web page) but
 then it snuck up and started asking for pkg-config.  That is way down
 the list in my bring up sequence.

Is there no GnuTLS package for your system?  Do you really need to
compile everything yourself?



 I guess... I don't get what is wrong with openssl. Why do we need GNUTLS
 at all? (we being the open source community.)

Here[1] you can find a good explanation.  OpenSSL is still supported.
As the GnuTLS backend is not yet as mature as the OpenSSL one, my hope
is that making it the default will help it improve in the future.
If GnuTLS gives you so many problems, what is difficult about passing
--with-ssl=openssl to configure?

Thanks,
Giuseppe



1) http://people.gnome.org/~markmc/openssl-and-the-gpl.html



Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available

2011-08-13 Thread Giuseppe Scrivano
Jochen Roderburg roderb...@uni-koeln.de writes:

 And in general they seem to want to steer away the users from openssl
 to gnutls and in order to do that the configure script doesn't even
 mention this option any longer.  :-(

 And in the same vein the option --with-libssl-prefix has completely
 disappeared, which used to be helpful when you had your preferred ssl
 library in a non-standard place. Now you have to trick around with
 compiler options to achieve that.

it is fixed in the current development version, and the fix will be
included in the wget release I am going to do in the next few days.

It was already reported on this mailing list some days ago, and it was
the reason why wget 1.13 wasn't released :-)

Cheers,
Giuseppe



Re: [Bug-wget] wget-1.13 on AIX

2011-08-12 Thread Giuseppe Scrivano
Hello Perry,

Thanks for reporting it.  Does it work correctly if you drop the
'#include "wget.h"' line from css.l?

=== modified file 'src/css.l'
--- src/css.l   2011-01-01 12:19:37 +
+++ src/css.l   2011-08-12 15:18:23 +
@@ -36,7 +36,6 @@
 
 #define YY_NO_INPUT
 
-#include "wget.h"
 #include "css-tokens.h"
 
 %}


Thanks,
Giuseppe



Perry Smith pedz...@gmail.com writes:

 Hi,

 I've tried this on AIX 5.3 and 6.1.

 The problem is with src/css.c.  In essence it is doing this:

 #include stdio.h
 #include string.h
 #include errno.h
 #include stdlib.h
 #include inttypes.h
 #define _LARGE_FILES
 #include unistd.h


 The #define of _LARGE_FILES is actually done in config.h via wget.h.

 I understand that AIX is very hard to deal with but this seems like a
 bad idea for any platform.  If you are going to declare that you want
 _LARGE_FILE support, you need to do that before any system includes.
 What this causes is both _LARGE_FILES and _LARGE_FILE_API both get
 defined and that causes one place to declare (for example)

 #define ftruncate   ftruncate64


 (this is in unistd.h around line 733)

 and then later we have:

 extern int  ftruncate(int, off_t);
 #ifdef _LARGE_FILE_API
 extern int  ftruncate64(int, off64_t);
 #endif


 (around line 799) which the compiler complains about with:

 /usr/include/unistd.h:801: error: conflicting types for 'ftruncate64'
 /usr/include/unistd.h:799: error: previous declaration of 'ftruncate64' was 
 here


 There are actually several pairs of these.

 With the above code snippet, if you move the #define to the top (or
 completely remove it), the compile works fine.

 It just seems like it would be prudent to declare things like
 _LARGE_FILES in config.h (like you do) but put config.h as the first
 include of each file so that the entire code base knows which
 interface the program wants to use.

 What I did was to move css.c to _css.c.  I put an #ifndef _CONFIG_H wrapper 
 inside config.h and then the new css.c was simply:

 #include config.h
 #include _css.c

 and that worked for my 5.3 system.  I have not tried it on my 6.1 system yet.

 I hope this helps someone.

 Thank you,
 pedz



Re: [Bug-wget] WARC output

2011-08-10 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 It would be cool if Wget could become one of these tools. Already the
 Swiss army knife for mirroring websites, the one thing that Wget is
 missing is a good way to store these mirrors. The current output of
 --mirror is not sufficient for archival purposes:

Sure we do!



 With some help from others, I've added WARC functions to Wget. With
 the --warc-file option you can specify that the mirror should also be
 written to a WARC archive. Wget will then keep everything, including

Can you please track all contributors?  Any contribution to GNU wget
requires copyright assignments to the FSF.



 Do you think this is something that could be included in the main Wget
 version? If that's the case, what should be the next step?

Sure, I will take a look at the code in the next days.  In the
meanwhile, can you check if you are following the GNU Coding Standards
for the new code[1]?



 The implementation makes use of the open source WARC Tools library
 (Apache License 2.0):
  http://code.google.com/p/warc-tools/

how much code is really needed from that library?  I wonder if we can
avoid this dependency altogether.

Cheers,
Giuseppe



1) http://www.gnu.org/prep/standards/



Re: [Bug-wget] gnutls link failure, ssl

2011-08-10 Thread Giuseppe Scrivano
Hello Karl,

Thanks for reporting it.  It looks like a very ugly one; I think it
stems from the last change:

revno: 2517
committer: Giuseppe Scrivano gscriv...@gnu.org
branch nick: wget
timestamp: Fri 2011-08-05 21:36:08 +0200
message:
  gnutls: do not use a deprecated function.

I'll roll back to the deprecated function when
`gnutls_priority_set_direct' is not available.

I will incorporate your comments into the NEWS file and configure --help.

I think it is too late now to replace packages, and to avoid
synchronization problems with mirrors, I'll go for 1.13.1.  I had the
feeling that 1.13 wasn't going to be released :-)

Thanks,
Giuseppe



k...@freefriends.org (Karl Berry) writes:

 My initial build of wget failed due to gnutls version problems.
 configure said:
 ..
 checking for main in -lgnutls... yes
 configure: compiling in support for SSL via GnuTLS

 But then the link failed with:
 gcc -O2 -Wall -o wget cmpt.o connect.o convert.o cookies.o ftp.o css.o
 css-url.o ftp-basic.o ftp-ls.o hash.o host.o html-parse.o html-url.o
 http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o res.o
 retr.o spider.o url.o utils.o exits.o build_info.o iri.o version.o
 ftp-opie.o gnutls.o ../lib/libgnu.a -lgnutls -lgcrypt -lgpg-error -lz
 -lidn -lrt
 gnutls.o: In function `ssl_connect_wget':
 gnutls.c:(.text+0x4b0): undefined reference to `gnutls_priority_set_direct'
 gnutls.c:(.text+0x528): undefined reference to `gnutls_priority_set_direct'
 collect2: ld returned 1 exit status

 Evidently configure should check for gnutls_priority_set_direct also.
 And if it fails, hopefully it will fall back to openssl.
 (This was on CentOS 5.6, but presumably that doesn't especially matter.)

 Related, there used to be an option --with-libssl-prefix.  I'm not sure
 when it was removed, but it was useful.

 Also, configure --help does not mention the possibility of
 --with-ssl=openssl.

 Finally, the NEWS file doesn't say anything about either of these:
 preferring tls to openssl or the --with-ssl=openssl option.  I didn't
 look to see if there were other configure options that didn't make it
 to the --help and/or NEWS.

 Thanks,
 Karl



Re: [Bug-wget] Bug in processing url query arguments that have '/'

2011-08-08 Thread Giuseppe Scrivano
Peng Yu pengyu...@gmail.com writes:

 I was looking at the patched version.  (See the patch posted in bug
 #31147.)  So I think that the bug is in the patch (see the relevant
 code below, where full_file has the query string).  I guess a
 different 'acceptable' function should be used for full_file.

   if (opt.match_query_string)
     full_file = concat_strings (u->file, "?", u->query, (char *) 0);

   if (!acceptable (full_file))
     {
       DEBUGP (("%s (%s) does not match acc/rej rules.\n",
                url, full_file));
       goto out;
     }
   }

I am inclined not to add more options to the current Accept/Reject
rules, as I think they are not flexible enough and quite tricky.

It is better to support a more generic way to specify these rules.

Cheers,
Giuseppe



Re: [Bug-wget] Bug in processing url query arguments that have '/'

2011-08-07 Thread Giuseppe Scrivano
Hello Peng,

AFAICS, `s' is a path, so '/' in the query string is escaped and
`acceptable' doesn't see it.

As for your example:

http://xxx.org/somescript?arg1=/xxy

`s' in this case will be something like:

xxx.org/somescript?arg1=%2Fxxy

Do you have any example where it doesn't work?

Cheers,
Giuseppe



Peng Yu pengyu...@gmail.com writes:

 Hi,

 The following line is in utils.c.

 # in acceptable (const char *s)

   while (l && s[l] != '/')
     --l;
   if (s[l] == '/')
     s += (l + 1);

 It essentially gets the substring after the last '/'.  However, when a
 query has a '/', this is problematic.  For example, the above code
 snippet will extract '/xxy' instead of 'somescript?arg1=/xxy'.  I think
 the code should take the position of '?' into account: if there is
 a '?', it should look for the last '/' before the '?'.  Is that the case?

 http://xxx.org/somescript?arg1=/xxy



Re: [Bug-wget] next wget release?

2011-08-06 Thread Giuseppe Scrivano
Noël Köthe n...@debian.org writes:

 I don't want to pester you with this question, but when is the next wget
 release planned?  1.12 was released 2009-09-22, and since then some
 bugfixes and patches have been integrated in the VCS, but they do not
 reach the users.

I have just uploaded another test version.

  ftp://alpha.gnu.org/gnu/wget/wget-1.12-2523.tar.bz2

and the detached GPG signature (using the key C03363F4):

  ftp://alpha.gnu.org/gnu/wget/wget-1.12-2523.tar.bz2.sig

Unless there are reports like "I have lost my home directory when I
specified a recursive download", I will release it in the next few days.

Have fun!
Giuseppe



Re: [Bug-wget] next wget release?

2011-08-06 Thread Giuseppe Scrivano
Jochen Roderburg roderb...@uni-koeln.de writes:

 --- ./src/host.c.orig   2011-08-06 16:45:59.0 +
 +++ ./src/host.c2011-08-06 19:49:41.0 +
 @@ -829,7 +829,7 @@
   int printmax = al->count;

if (! opt.show_all_dns_entries)
 -    printmax = 3;
 +    if (printmax > 3) printmax = 3;

Thanks, applied!

Regards,
Giuseppe



Re: [Bug-wget] Quotes get striped in cookie values

2011-08-02 Thread Giuseppe Scrivano
Hello Nirgal,

Thanks for reporting it.  I am not sure it is really wrong to omit the
quotes, but in any case I am going to apply this patch:

=== modified file 'src/cookies.c'
--- src/cookies.c   2011-01-01 12:19:37 +
+++ src/cookies.c   2011-08-02 20:53:42 +
@@ -350,6 +350,13 @@
 goto error;
   if (!value.b)
 goto error;
+
+  /* If the value is quoted, do not modify it.  */
+  if (*(value.b - 1) == '"')
+    value.b--;
+  if (*value.e == '"')
+    value.e++;
+
   cookie->attr = strdupdelim (name.b, name.e);
   cookie->value = strdupdelim (value.b, value.e);
 

Cheers,
Giuseppe



Nirgal Vourgère jmv_...@nirgal.com writes:

 Hello

 When server sends header:
 Set-Cookie: 
 SSOCOOKIECC="L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw==";
  Version=1; Path=/
 wget sends afterward:
 Cookie: 
 SSOCOOKIECC=L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw==
 while it should be sending:
 Cookie: 
 SSOCOOKIECC="L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw=="

 Curl and Iceweasel works fine with that kind of cookies.

 That problem was originally reported on the Debian bug tracking system at:
 http://bugs.debian.org/587033

 I am no longer using that web site, and I had switched to curl anyways when I 
 did, so I don't really need a fix.
 But I lost many hours on that problem, and if someone could have a look, it 
 might save other people some time in the future.



Re: [Bug-wget] How to just download cookies?

2011-08-01 Thread Giuseppe Scrivano
Peng Yu pengyu...@gmail.com writes:

 Hi,

 I use the following code to download the cookies. But it will always
 download some_page. Is there a way to just download the cookies?

 wget --post-data='something' --directory-prefix=/tmp
 --save-cookies=cookies_file --keep-session-cookies
 http://xxx.com/some_page > /dev/null

Probably what you want in your command is -O/dev/null, or -O- > /dev/null.

Cheers,
Giuseppe



Re: [Bug-wget] How to download all the links on a webpage which are in some directory?

2011-08-01 Thread Giuseppe Scrivano
Peng Yu pengyu...@gmail.com writes:

 Suppose I want download  www.xxx.org/somefile/aaa.sfx and the links
 therein (but restricted to the directory www.xxx.org/somefile/aaa/)

 I tried the option  '--mirror -I /somefile/aaa', but it only download
 www.xxx.org/somefile/aaa.sfx. I'm wondering what is the correct option
 to do so?

it looks like the right command.  Can you check using -d what is going
wrong?

Cheers,
Giuseppe



Re: [Bug-wget] next wget release?

2011-07-25 Thread Giuseppe Scrivano
Hi Jan,

$ ldd ./wget
linux-gate.so.1 =  (0xb781d000)
libssl.so.1.0.0 = /usr/lib/i686/cmov/libssl.so.1.0.0 (0xb77b7000)
libcrypto.so.1.0.0 = /usr/lib/i686/cmov/libcrypto.so.1.0.0 (0xb7609000)
libdl.so.2 = /lib/i386-linux-gnu/i686/cmov/libdl.so.2 (0xb7604000)
libz.so.1 = /usr/lib/libz.so.1 (0xb75f)
libidn.so.11 = /usr/lib/i386-linux-gnu/libidn.so.11 (0xb75be000)
librt.so.1 = /lib/i386-linux-gnu/i686/cmov/librt.so.1 (0xb75b5000)
libc.so.6 = /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xb745b000)
/lib/ld-linux.so.2 (0xb781e000)
libpthread.so.0 = /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 
(0xb7441000)

Please note that by default the new wget version will use the GNU TLS
backend instead of OpenSSL; the long-term plan is to drop OpenSSL
completely.  That error no longer appears with either backend.

Cheers,
Giuseppe



Jan Thomas jatho...@redhat.com writes:

 Hey Giuseppe,

 That's great. Can you do a 'ldd wget' and tell me which libs it's linked 
 against?

 I built the latest wget against openssl-devel on Fedora 14, and it's
 working, but built against RHEL 5 it still fails.

 [Fedora]$ ldd wget
   linux-vdso.so.1 =  (0x7fffc2cd7000)
   libssl.so.10 = /usr/lib64/libssl.so.10 (0x00393380)
   libcrypto.so.10 = /lib64/libcrypto.so.10 (0x00394d00)
   libdl.so.2 = /lib64/libdl.so.2 (0x003eb120)
   librt.so.1 = /lib64/librt.so.1 (0x003eb220)
   libc.so.6 = /lib64/libc.so.6 (0x003eb0e0)
   libgssapi_krb5.so.2 = /lib64/libgssapi_krb5.so.2 (0x00393300)
   libkrb5.so.3 = /lib64/libkrb5.so.3 (0x00393340)
   libcom_err.so.2 = /lib64/libcom_err.so.2 (0x003ebce0)
   libk5crypto.so.3 = /lib64/libk5crypto.so.3 (0x00393280)
   libz.so.1 = /lib64/libz.so.1 (0x003eb260)
   /lib64/ld-linux-x86-64.so.2 (0x003eb0a0)
   libpthread.so.0 = /lib64/libpthread.so.0 (0x003eb160)
   libkrb5support.so.0 = /lib64/libkrb5support.so.0 (0x003932c0)
   libkeyutils.so.1 = /lib64/libkeyutils.so.1 (0x003ebe60)
   libresolv.so.2 = /lib64/libresolv.so.2 (0x003eb3e0)
   libselinux.so.1 = /lib64/libselinux.so.1 (0x003eb2e0)


 [rhel5]# ldd wget
   linux-vdso.so.1 =  (0x7fffc4377000)
   libssl.so.6 = /lib64/libssl.so.6 (0x003f9220)
   libcrypto.so.6 = /lib64/libcrypto.so.6 (0x003f8fe0)
   libdl.so.2 = /lib64/libdl.so.2 (0x003f8500)
   librt.so.1 = /lib64/librt.so.1 (0x003f85c0)
   libc.so.6 = /lib64/libc.so.6 (0x003f8480)
   libgssapi_krb5.so.2 = /usr/lib64/libgssapi_krb5.so.2 
 (0x003f9020)
   libkrb5.so.3 = /usr/lib64/libkrb5.so.3 (0x003f91a0)
   libcom_err.so.2 = /lib64/libcom_err.so.2 (0x003f8e60)
   libk5crypto.so.3 = /usr/lib64/libk5crypto.so.3 (0x003f90a0)
   libz.so.1 = /usr/lib64/libz.so.1 (0x003f8580)
   /lib64/ld-linux-x86-64.so.2 (0x003f8440)
   libpthread.so.0 = /lib64/libpthread.so.0 (0x003f8540)
   libkrb5support.so.0 = /usr/lib64/libkrb5support.so.0 
 (0x003f9060)
   libkeyutils.so.1 = /lib64/libkeyutils.so.1 (0x003f90e0)
   libresolv.so.2 = /lib64/libresolv.so.2 (0x003f8ac0)
   libselinux.so.1 = /lib64/libselinux.so.1 (0x003f8640)
   libsepol.so.1 = /lib64/libsepol.so.1 (0x003f8600)


 So, I think the bug is in the older version of openssl and not in wget.


 regards, s pozdravem,

 Jan G Thomas
 jatho...@redhat.com

 - Original Message -
 From: Giuseppe Scrivano gscriv...@gnu.org
 To: Jan Thomas jatho...@redhat.com
 Cc: bug-wget@gnu.org
 Sent: Monday, July 25, 2011 12:24:44 PM
 Subject: Re: [Bug-wget] next wget release?
 hey Jan,
 
 this is what I get using the last development version of wget.
 
 $ LANG=en ./wget -O/dev/null
 https://github.com/rg3/youtube-dl/raw/2011.01.30/youtube-dl
 --2011-07-25 12:23:29--
 https://github.com/rg3/youtube-dl/raw/2011.01.30/youtube-dl
 Resolving github.com (github.com)... 207.97.227.239
 Connecting to github.com (github.com)|207.97.227.239|:443...
 connected.
 HTTP request sent, awaiting response... 302 Found
 Location: https://raw.github.com/rg3/youtube-dl/2011.01.30/youtube-dl
 [following]
 --2011-07-25 12:23:30--
 https://raw.github.com/rg3/youtube-dl/2011.01.30/youtube-dl
 Resolving raw.github.com (raw.github.com)... 207.97.227.243
 Connecting to raw.github.com (raw.github.com)|207.97.227.243|:443...
 connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 93827 (92K) [text/plain]
 Saving to: `/dev/null'
 
 100%[==]
 93,827 305K/s in 0.3s
 
 2011-07-25 12:23:32 (305 KB/s) - `/dev/null' saved [93827/93827]
 
 Cheers,
 Giuseppe
 
 
 
 Jan Thomas jatho...@redhat.com writes:
 
  Ciao Giuseppe,
 
  Great

Re: [Bug-wget] Bug in WGET?

2011-07-24 Thread Giuseppe Scrivano
Patrick Steil patr...@churchbuzz.org writes:

 Also, if I use wget in spider mode, it will at the end of the log
 output tell me about all the broken links... but I also need to know
 what page those broken links appear on (if the broken link is on
 the site I am getting)... this will help me find the 404s on my site... 

 I have a vision for how this should work to make it awesome... 

 Any way to do that, or anyone want to add this functionality?

I don't think it is possible at the moment, but adding this feature
shouldn't take much time.

The feature seems interesting but I don't think it is going to be
implemented before the next release.

You can wait until someone implements it, or you can take advantage of
the fact that wget is Free software and implement it yourself or hire
someone to do it for you.

Cheers,
Giuseppe



Re: [Bug-wget] Bug in WGET?

2011-07-23 Thread Giuseppe Scrivano
Hello,

Patrick Steil patr...@churchbuzz.org writes:

 If I run this command:

 wget www.domain.org/news?page=1 options= -r --no-clobber --html-extension
 --convert-links -np --include-directories=news

 Here is what it does today:

 1.  When --html-extension is turned on, --no-clobber does not change the
 names of the downloaded files, but it DOES rewrite each file, as the
 date/time stamp changes every time I run the above command.

I couldn't reproduce it.  I have `strace'd it but I can't see any
syscall which could modify the time stamp.  Can you please attach the
strace and the wget debug logs?  You can get them with:

strace -o strace.log wget <args> -d -o wget.log



 2.  If I turn off --html-extension, then as soon as WGET sees that the first
 file has already been downloaded it stops and does not continue to
 spider/download any further pages.

AFAICS, the behaviour you get using --no-clobber and -r is documented,
and it should work exactly as you described (a newer version is
ignored).  The old version is still traversed for links.

Cheers,
Giuseppe



Re: [Bug-wget] wget 1.12 generates duplicated contents

2011-07-20 Thread Giuseppe Scrivano
Hello,

I couldn't reproduce the problem here, I get the same content I get with
the browser.

Does it behave differently if you use a recursive download or if you
request a single page?  Does it happen every time?

If you are able to reproduce it, can you please post the output you get
running wget with --debug, otherwise please attach the content of
index.html.

Thanks,
Giuseppe



Anh Ta a...@squiz.co.uk writes:

 Hi,

 I ran the following command with wget 1.12:

 wget -r -l 1 -E -k -nv --wait=0.5 --random-wait http://www.beds.ac.uk

 The downloaded file www.beds.ac.uk/index.html (zip file attached)
 contained a duplicated footer.  When I ran with a greater depth level,
 e.g. -l 15 and the -p option, there were more pages with duplicated
 footers.

 The problem disappeared when I ran the same command with wget 1.11.4.
 However, I need version 1.12 to have links in CSS downloaded and
 replaced.

 Could someone please help or give me some advices?

 Many Thanks,
 Anh



Re: [Bug-wget] Wget and missing cookies

2011-07-19 Thread Giuseppe Scrivano
Hello,

how are you invoking wget?  Do you see something different in the http
headers when you use --debug?

Thanks,
Giuseppe



Richard van Katwijk rich...@three6five.com writes:

 Hi,

 I am using the firefox plugin 'httpfox' to trace the sending and receiving
 of cookies between my browser and the web server. I can see cookies
 initially being received by the browser, and then subsequently being sent
 back to the server on further page requests.

 However, simulating the same, simple request with wget (using -S and -d) I
 do *not* see these cookies being received.

 I have tested several sites that I know well - some do send the cookies to
 wget, but others don't, even though tools such as 'httpfox' do always show
 them, as expected.

 Is there any reason why either wget wouldn't see cookies being sent, or
 maybe why the server would not send cookies to the wget user-agent?

 Thanks,
 Richard



Re: [Bug-wget] Wget authorization failed with --spider option

2011-07-06 Thread Giuseppe Scrivano
Can it be that the server allows GET but not HEAD?

Can you attach the debug log without --spider as well?  You can drop the
payload if it is confidential :-)  The request and the response headers
matter.

Thanks,
Giuseppe



Avinash pavin...@gmail.com writes:

 Hi ,

 I am getting an 'Authorization Failed' error on the following URL with the
 --spider option,

 whereas it works and also downloads the file when I remove the --spider
 option.

 My requirement is not to download it, but to read the server response only.
 Does anybody have any idea why this is happening?

 

 /usr/bin/wget --debug --server-response --spider
 http://172.20.241.55:/9/Acceptable%20Use/Confidential_Internal_Memos.docx
 --http-user=test --http-password=password
 Setting --server-response (serverresponse) to 1
 Setting --spider (spider) to 1
 Setting --http-user (httpuser) to test
 Setting --http-password (httppassword) to Recnex#1
 DEBUG output created by Wget 1.10.2 (Red Hat modified) on linux-gnu.

 --11:11:43--
 http://172.20.241.55:/9/Acceptable%20Use/Confidential_Internal_Memos.docx
= `Confidential_Internal_Memos.docx'
 Connecting to 172.20.241.55:... connected.
 Created socket 3.
 Releasing 0x005416f0 (new refcount 0).
 Deleting unused 0x005416f0.

 ---request begin---
 HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0
 User-Agent: Wget/1.10.2 (Red Hat modified)
 Accept: */*
 Authorization: Basic dGVzdDpSZWNuZXgjMQ==
 Host: 172.20.241.55:
 Connection: Keep-Alive

 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 401 Unauthorized
 Content-Length: 1656
 Content-Type: text/html
 Server: Microsoft-IIS/6.0
 WWW-Authenticate: Negotiate
 WWW-Authenticate: NTLM
 X-Powered-By: ASP.NET
 Date: Wed, 06 Jul 2011 05:55:12 GMT
 Connection: keep-alive

 ---response end---

   HTTP/1.1 401 Unauthorized
   Content-Length: 1656
   Content-Type: text/html
   Server: Microsoft-IIS/6.0
   WWW-Authenticate: Negotiate
   WWW-Authenticate: NTLM
   X-Powered-By: ASP.NET
   Date: Wed, 06 Jul 2011 05:55:12 GMT
   Connection: keep-alive
 Registered socket 3 for persistent reuse.
 Disabling further reuse of socket 3.
 Closed fd 3
 Empty NTLM message, starting transaction.
 Creating a type-1 NTLM message.
 Connecting to 172.20.241.55:... connected.
 Created socket 3.
 Releasing 0x005415a0 (new refcount 0).
 Deleting unused 0x005415a0.

 ---request begin---
 HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0
 User-Agent: Wget/1.10.2 (Red Hat modified)
 Accept: */*
 Authorization: NTLM TlRMTVNTUAABAgIgACA=
 Host: 172.20.241.55:
 Connection: Keep-Alive

 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 401 Unauthorized
 Content-Length: 1539
 Content-Type: text/html
 Server: Microsoft-IIS/6.0
 WWW-Authenticate: NTLM
 TlRMTVNTUAACADgCAgACAugUTcnBdbk4BQLODg8=
 X-Powered-By: ASP.NET
 Date: Wed, 06 Jul 2011 05:55:12 GMT
 Connection: keep-alive

 ---response end---

   HTTP/1.1 401 Unauthorized
   Content-Length: 1539
   Content-Type: text/html
   Server: Microsoft-IIS/6.0
   WWW-Authenticate: NTLM
 TlRMTVNTUAACADgCAgACAugUTcnBdbk4BQLODg8=
   X-Powered-By: ASP.NET
   Date: Wed, 06 Jul 2011 05:55:12 GMT
   Connection: keep-alive
 Registered socket 3 for persistent reuse.
 Disabling further reuse of socket 3.
 Closed fd 3
 Received a type-2 NTLM message.
 Creating a type-3 NTLM message.
 Connecting to 172.20.241.55:... connected.
 Created socket 3.
 Releasing 0x005432a0 (new refcount 0).
 Deleting unused 0x005432a0.

 ---request begin---
 HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0
 User-Agent: Wget/1.10.2 (Red Hat modified)
 Accept: */*
 Authorization: NTLM
 TlRMTVNTUAADGAAYAEQYABgAXABABAAEAEAARAB0AYIAAHRlc3TAT6OiQKrO+dHjEjlknU5AyFpl7cOFhxbwn8z4gcxySH43C9uoPx96OryCmJ3OKAU=
 Host: 172.20.241.55:
 Connection: Keep-Alive

 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 401 Unauthorized
 Content-Length: 1539
 Content-Type: text/html
 Server: Microsoft-IIS/6.0
 WWW-Authenticate: Negotiate
 WWW-Authenticate: NTLM
 X-Powered-By: ASP.NET
 Date: Wed, 06 Jul 2011 05:55:12 GMT
 Connection: keep-alive

 ---response end---

   HTTP/1.1 401 Unauthorized
   Content-Length: 1539
   Content-Type: text/html
   Server: Microsoft-IIS/6.0
   WWW-Authenticate: Negotiate
   WWW-Authenticate: NTLM
   X-Powered-By: ASP.NET
   Date: Wed, 06 Jul 2011 05:55:12 GMT
   Connection: keep-alive
 Registered socket 3 for persistent reuse.
 Disabling further reuse of socket 3.
 Closed fd 3
 Authorization failed.



Re: [Bug-wget] Question regarding WGET

2011-06-29 Thread Giuseppe Scrivano
are you executing wget from the c:\Windows\system32 directory?

To prevent the file from being written to the disk, you can specify
-O NUL on the command line.  I never tried it myself, but I remember it
works under Windows.

Giuseppe



Itay Levin itay.le...@onsettechnology.com writes:

 I'm using it with the following notation:

 WGET http://www.mysite.com/a.aspx

  

 And I noticed that it downloads this page to c:\Windows\system32 folder
 that is being filling up with a.aspx, a.aspx.1 a.aspx.2 and so on...

 Are there any command line flags that I can use to prevent this files to
 being written to disk? 

  

  

 Thanks,

 Itay Levin

 


  



Re: [Bug-wget] wget IDN support

2011-06-29 Thread Giuseppe Scrivano
Thanks to have reported these problems.  I'll take a look at them in the
next few days.

Cheers,
Giuseppe



Merinov Nikolay kim.roa...@gmail.com writes:

 The current implementation of IDN support in wget does not work when the
 system uses a UTF-8 locale.

 The current implementation of the `url_parse' function in src/url.c calls
 `remote_to_utf8' from src/iri.c and sets `iri->utf8_encode' to the returned
 value.

 The function `remote_to_utf8' can return false in two cases:
 1. It cannot convert the string to UTF-8
 2. The source text is the same as the result text

 The second case appears when the system uses a UTF-8 encoding.

 This can be fixed in several places:
 In src/url.c (url_parse), by comparing iri->orig_url with the UTF-8 result.
 In src/iri.c (remote_to_utf8), by removing the `if (!strcmp (str, *new))'
   test at the end of the function.
 Or in src/iri.c (remote_to_utf8), by changing the return status when the
   result is the same as the input string.

 The last variant can be written like this:

 === modified file 'src/iri.c'
 --- src/iri.c 2011-01-01 12:19:37 +
 +++ src/iri.c 2011-06-23 16:34:10 +
 @@ -277,7 +277,7 @@
if (!strcmp (str, *new))
  {
xfree ((char *) *new);
 -  return false;
 +  *new = NULL;
  }
  
return ret;


 It could also be a good idea to fix src/host.c (lookup_host) by replacing
 the use of `gethostbyname_with_timeout' with `getaddrinfo_with_timeout' and
 using the AI_IDN flag, if wget is compiled with glibc version 2.3.4 or
 newer. That can be helpful when wget is compiled without IRI support.



Re: [Bug-wget] wget without http

2011-06-29 Thread Giuseppe Scrivano
David H. Lipman dlip...@verizon.net writes:

 If you are using Mapped Drives, there is NO NEED to use WGET as there are
 plenty of OS utilities from XXCOPY to RoboCopy.

though these tools have two problems, first of all they are not free.
Second, as already reported, they don't follow HTML links.

It could be a good idea to handle file:// as well.

Giuseppe



Re: [Bug-wget] Question regarding WGET

2011-06-29 Thread Giuseppe Scrivano
Itay Levin sit...@gmail.com writes:

 no, I didn't specify any output dir - so by default it created the
 files in c:\windows\system32

but still it could be the working directory where wget is executed.

Giuseppe



Re: [Bug-wget] Question to a specific situation

2011-06-25 Thread Giuseppe Scrivano
Hello,

d113803_0-m m...@rtinlochner.de writes:

 150 Opened data connection.
 fertig.
 113803.webhosting42.1blu.de/www/demos/.listing: Permission denied

can you check your permissions on the /www/demos/ directory?  Can you
browse it?

Cheers,
Giuseppe



Re: [Bug-wget] wget to a folder

2011-06-12 Thread Giuseppe Scrivano
what version are you using?

It seems to work well here:

$ wget -q -P testdir ftp://alpha.gnu.org/gnu/wget/wget-1.12-2504.tar.bz2 && ls testdir/
wget-1.12-2504.tar.bz2

Giuseppe



Michele Prendin mich...@micheleprendin.com writes:

 Hello there,

 I'm facing issues to use wget -P to download to a folder

 e.g.

 wget -P testfolder http://www.google.com/downloadme.zip

 wget only works when I download the file into a folder after reaching it
 with cd folder

 I tried all I could: testfolder, testfolder/, the full path; still facing
 the same problem

 any suggestion?

 thanks

 best regards

 MP



Re: [Bug-wget] wget to a folder

2011-06-12 Thread Giuseppe Scrivano
Michele Prendin mich...@micheleprendin.com writes:

 Thanks Giuseppe for the help,

 I fixed the issue by upgrading wget.

 Despite now being able to save in the folder I want, I have another issue.

 With the older wget, when I was using

 wget www.google.com/popupfile.php

 the php file forwarded the file to be downloaded with the header (if
 the forwarded file was abc.gz, wget was able to store /abc.gz)

 with the new wget the download works, but instead of saving it with
 the name abc.gz it uses popupfile.php

 any solution to this? I couldn't find anything useful in --help

You should specify: --trust-server-names.

You can find the reason why we have added it here:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2010-2252

Giuseppe



Re: [Bug-wget] wget fails to encode spaces in URLs

2011-06-08 Thread Giuseppe Scrivano
Hi Volker,

I see it now, thanks.  This small patch makes sure the url is parsed in
any case.

Cheers,
Giuseppe



=== modified file 'src/retr.c'
--- src/retr.c  2011-06-05 12:31:24 +
+++ src/retr.c  2011-06-08 09:29:20 +
@@ -1005,9 +1005,7 @@
   break;
 }
 
-  /* Need to reparse the url, since it didn't have iri information. */
-  if (opt.enable_iri)
-  parsed_url = url_parse (cur_url->url->url, NULL, tmpiri, true);
+  parsed_url = url_parse (cur_url->url->url, NULL, tmpiri, true);
 
   if ((opt.recursive || opt.page_requisites)
(cur_url->url->scheme != SCHEME_FTP || getproxy (cur_url->url)))



Volker Kuhlmann list0...@paradise.net.nz writes:

 Hi Giuseppe,

 Thanks!

 I compiled it with libproxy: same problem.

 I then compiled it with just 
   ./configure --prefix=/tmp/.../
   make

 ./src/wget -i-
 http://downloads.sourceforge.net/project/bandwidthd/bandwidthd/bandwidthd 
 2.0.1/bandwidthd-2.0.1.tgz?r=ts=1307308092use_mirror=transact
 ^D

 (note the space after bandwidthd) and wireshark gives me:

 GET /project/bandwidthd/bandwidthd/bandwidthd 
 2.0.1/bandwidthd-2.0.1.tgz?r=ts=1307308092use_mirror=transact HTTP/1.1
 User-Agent: Wget/1.12-2504 (linux-gnu)
 Accept: */*
 Host: downloads.sourceforge.net
 Connection: Keep-Alive


 Sorry NOT FIXED.


 My system and user wgetrc contain

 prefer-family = none

 use_proxy = off
 dirstruct = on
 timestamping = on
 dot_bytes = 64k
 dot_spacing = 10
 dots_in_line = 50
 backup_converted = on


 Volker



Re: [Bug-wget] Issue with TOMCAT SSL server wget

2011-06-08 Thread Giuseppe Scrivano
please keep the mailing list CC'ed in your replies.

It seems the server doesn't accept the client certificate.  Are you sure
the cert.pem certificate is included in keystore.jks?

Giuseppe



brad bruggemann bradley.bruggem...@gmail.com writes:

 Giuseppe,

 There's a correction to my original post. The output that I get when I
 run the original command (with --secure-protocol) is:

 OpenSSL: error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad
 certificate

 When I run it without --secure-protocol I get:

 OpenSSL: error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert
 unexpected message

 On Wed, Jun 8, 2011 at 7:08 AM, Giuseppe Scrivano gscriv...@gnu.org
 wrote:

 brad bruggemann bradley.bruggem...@gmail.com writes:
 
       Use wget to grab file:
       wget --secure-protocol=TLSv1 --certificate-type=PEM
         --certificate=/path.to/cert.pem --password=
         https://IP_ADDRESS:1234/file.txt -o /tmp/file.txt
 
 
 what happens if you don't specify --secure-protocol?
 
 Cheers,
 Giuseppe
 



Re: [Bug-wget] wget fails to encode spaces in URLs

2011-06-05 Thread Giuseppe Scrivano
Hi Volker,

thanks to have reported this bug but it was fixed in the development
version of wget and the fix will be included in the next release.

Can you please confirm if it works for you?

You can fetch a source tarball here:
  ftp://alpha.gnu.org/gnu/wget/wget-1.12-2504.tar.bz2

Thanks,
Giuseppe



Volker Kuhlmann list0...@paradise.net.nz writes:

   wget --version
 GNU Wget 1.12 built on linux-gnu.

 To reproduce:

 Go to any sourceforge project and download a file whos URL contains a
 space. Copy the direct link from the download page into wget -i-

 Run wireshark and press ^D in the wget input stream.

 If the upstream strips spaces (e.g. squid, default setting in pfsense)
 the download goes round in circles.

 The bug does not exist in wget when passing the URL on the command line.
 I always use -i- because of all the shell crud in URLs.

 I am using the openSUSE 11.4 version, but the only source code change is
 additional support for libproxy.


 Problem:

 Looking at the source, in main.c url_parse() is called for each URL from
 the command line. For -i, it calls retrieve_from_file().

 retrieve_from_file() (in retr.c) reads a list of URLs from the given
 file. It then calls url_parse() only if IRI is enabled (which in my
 version of wget is not even compiled in).
 Hence the URL is never parsed and never encoded before being downloaded
 with retrieve_url().
 That's a bug.

 The fix is probably to always call url_parse() in retrieve_from_file(),
 and not only when IRI is turned on.


 If this goes to a mailing list, please cc me on replies, I am not
 subscribed.

 Thanks,

 Volker



Re: [Bug-wget] wget

2011-05-20 Thread Giuseppe Scrivano
I doubt it will work with a recent version of wget.  Anyway, I suggest
you take an older version (something like 1.10.2) and apply the patch
using GNU patch; once the source code is patched you can build it and
get the wget executable.

Cheers,
Giuseppe


 
Dale Egan d...@leemyles.com writes:

 How can I delete a file on a remote server after I download it? I am
 downloading a couple of files and need to delete them right after download,
 before more files are added. The command I am using is (wget -r -nd
 ftp://name:passw...@something.com). I am using (GNU Wget 1.11.4 Red Hat
 modified) on centos 5.5. I have found a patch at
 (http://osdir.com/ml/web.wget.patches/2005-09/msg5.html). It looks like it
 would work, but I do not understand how to install it.

   

 Thanks Dale

  

  



Re: [Bug-wget] Recursive wget with URL filter/under certain (non-parent) directory?

2011-05-09 Thread Giuseppe Scrivano
Yang Zhang yanghates...@gmail.com writes:

 I mentioned --include-directories in my original email. I couldn't
 figure out how to use it to this effect. Could you demonstrate?

have you already tried the following one?

wget -r -I /host/foo/ http://host/foo/bar/baz/index.cgi?page=1

Giuseppe



Re: [Bug-wget] Recursive wget with URL filter/under certain (non-parent) directory?

2011-05-09 Thread Giuseppe Scrivano
Micah Cowan mi...@cowan.name writes:

 have you already tried the following one?
 
 wget -r -I /host/foo/ http://host/foo/bar/baz/index.cgi?page=1

 Shouldn't that be just -I /foo/  ?

Yeah, sure :-)

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH] set exit code to 1 if invalid host name specified

2011-04-24 Thread Giuseppe Scrivano
Hi Daniel,

thanks for your contribution!  I have pushed your first patch.  I will
wait for your copyright assignments before push the patch with the new
tests.

Thanks again,
Giuseppe



Daniel Manrique dan...@tomechangosubanana.com writes:

 Hi Giuseppe,

 I've started the assignment process, to at least get the ball rolling,
 even if it's not complete in time for the new release.

 I've also made the changes you suggested to coding style, and split
 the changes into two patches.

 Thanks so much for your help and suggestions! Do let me know if more
 changes are needed.

 Regards,
 - Daniel


 On Sat, Apr 23, 2011 at 10:45 AM, Giuseppe Scrivano gscriv...@gnu.org wrote:
 Thanks for the patch.  It looks ok but in order to apply it, you need to
 complete the copyright assignments process to the FSF.  We are very
 quite close to have a wget release and I doubt the FSF will receive your
 assignments before it.  Can you please divide your patch in two?  Keep
 changes to the source code in one patch and the new tests in another.

 Please keep the GNU coding style:


 Daniel Manrique dan...@tomechangosubanana.com writes:

 === modified file 'src/html-url.c'
 --- src/html-url.c    2011-01-01 12:19:37 +
 +++ src/html-url.c    2011-04-23 00:48:22 +
 @@ -810,6 +810,7 @@
                       file, url_text, error);
            xfree (url_text);
            xfree (error);
 +          inform_exit_status(URLERROR);

 Please maintain the GNU coding style:

  inform_exit_status (URLERROR);

 Cheers,
 Giuseppe


 # Bazaar merge directive format 2 (Bazaar 0.90)
 # revision_id: roa...@tomechangosubanana.com-20110423193141-\
 #   iaihkimpxowwm0gh
 # target_branch: file:///home/roadmr/wget/trunk/
 # testament_sha1: 3f2bdd4370318611a56293444fe3f320d8e39961
 # timestamp: 2011-04-23 15:31:47 -0400
 # base_revision_id: gscriv...@gnu.org-20110419124021-fi310a2hc7mz2j9y
 # 
 # Begin patch
 === modified file 'src/ChangeLog'
 --- src/ChangeLog 2011-04-19 12:40:21 +
 +++ src/ChangeLog 2011-04-23 19:31:41 +
 @@ -1,3 +1,9 @@
 +2011-04-21  Daniel Manrique roa...@tomechangosubanana.com
 + * main.c (main): Set exit status when invalid host name given in
 + command line.
 + * html-url.c (get_urls_file): Set exit status when invalid host
 + name given in input file.
 +
  2011-04-19  Giuseppe Scrivano  gscriv...@gnu.org
  
   * gnutls.c: Do not include fcntl.h.

 === modified file 'src/html-url.c'
 --- src/html-url.c2011-01-01 12:19:37 +
 +++ src/html-url.c2011-04-23 19:31:41 +
 @@ -810,6 +810,7 @@
   file, url_text, error);
xfree (url_text);
xfree (error);
 +  inform_exit_status (URLERROR);
continue;
  }
xfree (url_text);

 === modified file 'src/main.c'
 --- src/main.c2011-03-21 12:14:20 +
 +++ src/main.c2011-04-23 19:31:41 +
 @@ -1347,6 +1347,7 @@
char *error = url_error (*t, url_err);
logprintf (LOG_NOTQUIET, "%s: %s.\n", *t, error);
xfree (error);
 +  inform_exit_status (URLERROR);
  }
else
  {
 @@ -1387,7 +1388,9 @@
if (opt.input_filename)
  {
int count;
-  retrieve_from_file (opt.input_filename, opt.force_html, &count);
+  int status;
+  status = retrieve_from_file (opt.input_filename, opt.force_html, &count);
 +  inform_exit_status (status);
if (!count)
logprintf (LOG_NOTQUIET, _("No URLs found in %s.\n"),
 opt.input_filename);




Re: [Bug-wget] CNET download links not working with WGET

2011-04-23 Thread Giuseppe Scrivano
hello,

the & character in the url is interpreted by your shell.

Try using something like:

wget "URL"

Cheers,
Giuseppe



Jeff Givens j...@sds.net writes:

 Hello, I am having an issue downloading files via download links from
 CNET.  It appears to locate some of the URL but stops at the first
 siteId part.  I have included the debug information as well.  Thanks
 in advance for your help.

 C:\DOWNLOAD\wget http://dw.com.com/redir?edId=3&siteId=4&oId=300
 0-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag
 =tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLw
 oOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572
 .html%3Fspi%3D077d9109e846975d0db9532bd610588f
 --2011-04-19 11:30:35-- http://dw.com.com/redir?edId=3
 Resolving dw.com.com... 216.239.113.95
 Connecting to dw.com.com|216.239.113.95|:80... connected.
 HTTP request sent, awaiting response... 302 Found
 Location: http://dw.com.com/redir/redx/?edId=3 [following]
 --2011-04-19 11:30:36-- http://dw.com.com/redir/redx/?edId=3
 Reusing existing connection to dw.com.com:80.
 HTTP request sent, awaiting response... 404 Not Found
 2011-04-19 11:30:36 ERROR 404: Not Found.

 'siteId' is not recognized as an internal or external command,
 operable program or batch file.
 'oId' is not recognized as an internal or external command,
 operable program or batch file.
 'ontId' is not recognized as an internal or external command,
 operable program or batch file.
 'spi' is not recognized as an internal or external command,
 operable program or batch file.
 'lop' is not recognized as an internal or external command,
 operable program or batch file.
 'tag' is not recognized as an internal or external command,
 operable program or batch file.
 'ltype' is not recognized as an internal or external command,
 operable program or batch file.
 'pid' is not recognized as an internal or external command,
 operable program or batch file.
 'mfgId' is not recognized as an internal or external command,
 operable program or batch file.
 'merId' is not recognized as an internal or external command,
 operable program or batch file.
 'pguid' is not recognized as an internal or external command,
 operable program or batch file.
 'destUrl' is not recognized as an internal or external command,
 operable program or batch file.

 DEBUG output created by Wget 1.11.4 on Windows-MSVC.

 --2011-04-19 11:27:09-- http://dw.com.com/redir?edId=3
 Resolving dw.com.com... seconds 0.00, 64.30.224.42
 Caching dw.com.com = 64.30.224.42
 Connecting to dw.com.com|64.30.224.42|:80... seconds 0.00, connected.
 Created socket 340.
 Releasing 0x01411158 (new refcount 1).

 ---request begin---
 GET /redir?edId=3 HTTP/1.0

 User-Agent: Wget/1.11.4

 Accept: */*

 Host: dw.com.com

 Connection: Keep-Alive



 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 302 Found

 Date: Tue, 19 Apr 2011 15:27:26 GMT

 Server: Apache/2.0

 Pragma: no-cache

 Cache-control: no-cache, must-revalidate, no-transform

 Vary: *

 Expires: Fri, 23 Jan 1970 12:12:12 GMT

 Set-Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg; expires=Sun, 18-Apr-2021
 15:27:26 GMT; domain=.com.com; path=/

 Location: http://dw.com.com/redir/redx/?edId=3

 Content-Length: 0

 P3P: CP=CAO DSP COR CURa ADMa DEVa PSAa PSDa IVAi IVDi CONi OUR OTRi
 IND PHY ONL UNI FIN COM NAV INT DEM STA

 Keep-Alive: timeout=363, max=760

 Connection: Keep-Alive

 Content-Type: text/plain



 ---response end---
 302 Found
 Registered socket 340 for persistent reuse.
 cdm: 1 2 3 4 5 6 7 8
 Stored cookie com.com -1 (ANY) / permanent insecure [expiry
 2021-04-18 11:27:26] XCLGFbrowser Cg5iVk2tqd6J8Sg
 Location: http://dw.com.com/redir/redx/?edId=3 [following]
 Skipping 0 bytes of body: [] done.
 --2011-04-19 11:27:09-- http://dw.com.com/redir/redx/?edId=3
 Reusing existing connection to dw.com.com:80.
 Reusing fd 340.

 ---request begin---
 GET /redir/redx/?edId=3 HTTP/1.0

 User-Agent: Wget/1.11.4

 Accept: */*

 Host: dw.com.com

 Connection: Keep-Alive

 Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg



 ---request end---
 HTTP request sent, awaiting response...
 ---response begin---
 HTTP/1.1 404 Not Found

 Date: Tue, 19 Apr 2011 15:27:26 GMT

 Server: Apache/2.0

 Content-Length: 209

 Keep-Alive: timeout=363, max=779

 Connection: Keep-Alive

 Content-Type: text/html; charset=iso-8859-1



 ---response end---
 404 Not Found
 Skipping 209 bytes of body: [<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML
 2.0//EN">
 <html><head>
 <title>404 Not Found</title>
 </head><body>
 <h1>Not Found</h1>
 <p>The requested URL /redir/redx/ was not found on this server.</p>
 </body></html>
 ] done.
 2011-04-19 11:27:09 ERROR 404: Not Found.



Re: [Bug-wget] --mirror sometimes ignores -np

2011-04-20 Thread Giuseppe Scrivano
Hi Mojca,

it was already reported here:

http://savannah.gnu.org/bugs/index.php?20519

On the same page you can find an explanation why it behaves this way.

Cheers,
Giuseppe



Mojca Miklavec mojca.miklavec.li...@gmail.com writes:

 Dear list,

 when I try to run
 wget -np --mirror --progress=bar -nH --cut-dirs=1 -erobots=off
 --reject=index.html* http://www.w32tex.org/docs
 the command will try to fetch files from /icons and all other folders
 despite the -np switch. The behaviour gets fixed if I use
 "http://www.w32tex.org/docs/" (if I add a trailing slash), but I still
 find it very weird and I think that this should not happen.

 Indeed it doesn't happen on some other websites that I tried, but I'm
 not sure about the exact recipe to reproduce the behaviour.

 Mojca



Re: [Bug-wget] [PATCH] Allow openSSL compiled without SSLv2

2011-04-11 Thread Giuseppe Scrivano
Thanks for the patch.  Committed and pushed.

Cheers,
Giuseppe



Cristian Rodríguez crrodrig...@opensuse.org writes:

 Hi:

 the attached patch adds support to an openSSL library compiled without
 SSlv2 , in which case, wget will behave like if it was using
 the GNUTLS backend, that is, doing sslv3 only.


 # Bazaar merge directive format 2 (Bazaar 0.90)
 # revision_id: cristian@linux-us4g-20110411021140-k71ctv0bcygv05mj
 # target_branch: bzr://bzr.savannah.gnu.org/wget/trunk/
 # testament_sha1: 0b8aab4ce061b99614d52e9fa063e5f604cd0124
 # timestamp: 2011-04-10 23:25:17 -0300
 # base_revision_id: gscriv...@gnu.org-20110407105651-ofq3ntt3w0h6zkq9
 # 
 # Begin patch
 === modified file 'src/openssl.c'
 --- src/openssl.c 2011-04-04 14:56:51 +
 +++ src/openssl.c 2011-04-11 02:11:40 +
 @@ -187,8 +187,10 @@
meth = SSLv23_client_method ();
break;
  case secure_protocol_sslv2:
 +#ifndef OPENSSL_NO_SSL2
meth = SSLv2_client_method ();
break;
 +#endif
  case secure_protocol_sslv3:
meth = SSLv3_client_method ();
break;




Re: [Bug-wget] Wget segfaults on malformed HTTP status line

2011-04-09 Thread Giuseppe Scrivano
thanks for the bug report, it is already fixed in the development
version.  The fix will be included in the next wget release.

Cheers,
Giuseppe



Vitaly Minko vitaly.mi...@gmail.com writes:

 Hi all,

 I get segmentation fault when HTTP server returns malformed status line
 (without a status code). Use the following command to reproduce the issue:
 `wget vminko.org:8081/test`
 Wget crashes because the HTTP daemon returns just "HTTP/1.0\n\n" (see
 wget-test.pl).
 The proposed fix is attached (wget-1.12-http-status-line.patch).

 Best regards,
 Vitaly



Re: [Bug-wget] How do I tell wget not to follow links in a file?

2011-04-07 Thread Giuseppe Scrivano
David Skalinder da...@skalinder.net writes:

 I want to mirror part of a website that contains two links pages, each of
 which contains links to many root-level directories and also to the other
 links page.  I want to download recursively all the links from one links
 page, but not from the other: that is, I want to tell wget "download
 links1 and follow all of its links, but do not download or follow links
 from links2".

 I've put a demo of this problem up at http://fangjaw.com/wgettest -- there
 is a diagram there that might state the problem more clearly.

 This functionality seems so basic that I assume I must be overlooking
 something.  Clearly wget has been designed to give users control over
 which files they download; but all I can find is that -X controls both
 saving and link-following at the directory level, while -R controls saving
 at the file level but still follows links from unsaved files.

why doesn't -X work in the scenario you have described?  If all links
from `links2' are under /B, you can exclude them using something like:

wget -r -Xwgettest/B http://fangjaw.com/wgettest

Cheers,
Giuseppe



Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2

2011-04-04 Thread Giuseppe Scrivano
Ray Satiro raysat...@yahoo.com writes:

 Hi,

 It is still an issue that wget/openssl combo is broken in windows.

I have uploaded a new tarball:

  ftp://alpha.gnu.org/gnu/wget/wget-1.12-2474.tar.bz2

Can you please check if it works well for you now?  OpenSSL should work
well now under Windows, but I am not sure about the configure stuff.

Thanks,
Giuseppe



Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2

2011-04-03 Thread Giuseppe Scrivano
Ray Satiro raysat...@yahoo.com writes:

 Anything in OpenSSL that tries to write to a socket will fail because it's
 passed a fd and not a socket. For example sock_write() in openssl's
 crypto/bio/bss_sock.c:153 calling send() and passing a fd will cause an
 error of WSAENOTSOCK.

It shouldn't happen.  If you look at openssl.c:401, we register the
socket on Windows, not the fd.  I am just guessing it should work, but I
don't have a Windows machine where I can check it myself.



 Another thing: the configure test for OpenSSL is still using the ssl and
 crypto libs:
 configure:22076: gcc -o conftest.exe  -O2 -Wall   conftest.c  -lssl -lcrypto >&5
 but on Windows you want
 -lssl -lcrypto -lws2_32 -lgdi32
 As I mentioned at some other point in time, what you'd expect is shared
 libs when building. Unfortunately a similar test for that will fail if the
 actual DLL is not in the path. Would it be better to just do an
 AC_CHECK_LIB on eay32 and ssl32?

I have pushed some patches to do it.  Can you please try the development
version and see if anything has improved?

I have cross-compiled to MinGW without problems; I obtained OpenSSL
for MinGW using mingw-cross-env[1], which saved me the burden of
cross-compiling it.



 Another thing re ipv6 support:
 host.c: In function 'getaddrinfo_with_timeout_callback':
 host.c:383:3: warning: implicit declaration of function 'getaddrinfo'
 host.c: In function 'lookup_host':
 host.c:787:5: warning: implicit declaration of function 'freeaddrinfo'

 On Windows, ws2tcpip.h should be included in addition to winsock2.h. Some
 versions of ws2tcpip.h pull in winsock2.h themselves, some don't. The order is
 # include <winsock2.h>
 # include <ws2tcpip.h>

 When IPv6 is enabled, _WIN32_WINNT should be defined >= 0x0501 (WinXP)
 before the includes. This means wget with IPv6 will not work on Windows
 2000. There's a solution for this, but it requires rewriting code that is
 copyrighted by Microsoft for a getaddrinfo wrapper, unless someone has
 already done this. Is Windows 2000 support still wanted? I have one request
 from last year, but other than that I don't hear about it anymore.

I think the gnulib getaddrinfo does it.
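For reference, here is the portable calling sequence that the gnulib wrapper provides (on Windows it additionally requires winsock2.h/ws2tcpip.h in that order and _WIN32_WINNT >= 0x0501). This is a minimal POSIX sketch, not wget's host.c code; it uses a numeric host so no resolver is involved:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Resolve a numeric address with getaddrinfo(); returns 0 on success.
   AI_NUMERICHOST keeps the lookup purely local (no DNS involved).  */
static int
resolve_numeric (const char *host, struct addrinfo **res)
{
  struct addrinfo hints;
  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_NUMERICHOST;
  return getaddrinfo (host, NULL, &hints, res);
}

int
main (void)
{
  struct addrinfo *res = NULL;
  assert (resolve_numeric ("127.0.0.1", &res) == 0);
  assert (res != NULL && res->ai_family == AF_INET);
  freeaddrinfo (res);          /* pair every successful call with a free */
  assert (resolve_numeric ("not-an-address", &res) != 0);
  return 0;
}
```

Every successful getaddrinfo() must be paired with freeaddrinfo(), which is exactly the pair of symbols the warnings above say were implicitly declared.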

Have you tried the gnutls version of wget?  Does it work for you?

Thanks,
Giuseppe

1) http://mingw-cross-env.nongnu.org/



Re: [Bug-wget] mirroring one sourceforge package?

2011-03-30 Thread Giuseppe Scrivano
Micah Cowan mi...@cowan.name writes:

 So it looks like wget is correctly blocking the http URL, but
 incorrectly permitting the https URL.

We check whether the two schemes are similar, but at the same time we
require the ports to be identical.

I have relaxed this condition: now the two ports must be identical only
when the same protocol is used.

I have pushed this patch:

=== modified file 'src/recur.c'
--- src/recur.c 2011-01-01 12:19:37 +0000
+++ src/recur.c 2011-03-30 23:36:05 +0000
@@ -563,7 +563,8 @@
   if (opt.no_parent
       && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
       && 0 == strcasecmp (u->host, start_url_parsed->host)
-      && u->port == start_url_parsed->port
+      && (u->scheme != start_url_parsed->scheme
+          || u->port == start_url_parsed->port)
       && !(opt.page_requisites && upos->link_inline_p))
     {
       if (!subdir_p (start_url_parsed->dir, u->dir))

Applying it and launching wget using the same arguments used by Karl, I
get:

$ find sourceforge.net/ -maxdepth 3
sourceforge.net/
sourceforge.net/projects
sourceforge.net/projects/biblatex-biber
sourceforge.net/projects/biblatex-biber/files
sourceforge.net/robots.txt

Just in time before the release :-)

Cheers,
Giuseppe



Re: [Bug-wget] Re: Maintainer needs updating in man page

2011-03-21 Thread Giuseppe Scrivano
Micah Cowan mi...@cowan.name writes:

 Since the manpage is automatically generated from the info manual, this
 needs to be fixed in wget.texinfo, too.

thanks, I am going to fix it in the documentation too.

Giuseppe



Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2

2011-03-18 Thread Giuseppe Scrivano
Steven M. Schweda s...@antinode.info writes:

I know that all the serious folks in the world have all the GNU
 infrastructure in place, but wouldn't a clever repository-access system
 be able to grind out a ready-to-use distribution kit upon user request? 
 Just a thought.

we make a distinction between people who use the source tarball and
developers who do a checkout from the source repository.  The latter
need some additional programs.  The bootstrap script ensures the gnulib
files are always up to date without our having to track them in our
repository (we don't really want to care about that), and, most
importantly, it avoids duplicating the same files across different
repositories.  I think these advantages are worth the additional costs
introduced by the bootstrap procedure.

Cheers,
Giuseppe



[Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2

2011-03-16 Thread Giuseppe Scrivano
Hello,

I have prepared a new alpha release containing the last changes:

ftp://alpha.gnu.org/gnu/wget/wget-1.12-2460.tar.bz2

To verify it, here the detached GPG signature using the key C03363F4:

ftp://alpha.gnu.org/gnu/wget/wget-1.12-2460.tar.bz2.sig



Hopefully the next release is close now.

Please report any problem you may experience using it.

Thanks,
Giuseppe



Re: [Bug-wget] wget 1.11.4 windows compile

2011-03-15 Thread Giuseppe Scrivano
Hello Ethan,

can you please try again using the last development version?

You can fetch it from the Bazaar repository how explained here:

  https://savannah.gnu.org/bzr/?group=wget

The branch is trunk.

Thanks,
Giuseppe



Ethan Zheng legen...@hotmail.com writes:

 Absolute newbie here.
 I could not compile 1.11.4, but 1.10 compiled (without SSL).
 But I have the precompiled GnuWin32 wget working on my system.
 I am curious why I am not able to compile 1.11.4 myself on XP (also Win7)
 with MSVC Pro 2005.
 Thanks,

 When I try to build 1.11.4 with nmake, it complains: fatal error C1083:
 Cannot open include file: 'windows/config-compiler.h': No such file or directory
 Manually adding /I.. to src/Makefile CFLAGS got me past that path issue.
 Then errors in compiling init.c:

 c:\wget> nmake
 Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
 Copyright (C) Microsoft Corporation.  All rights reserved.
         cd src
         C:\Program Files\Microsoft Visual Studio 8\VC\BIN\nmake.exe
 Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
 Copyright (C) Microsoft Corporation.  All rights reserved.
         cl /nologo /MT /O2 /I. /I.. /DWINDOWS /D_CONSOLE /DHAVE_CONFIG_H /c init.c
 init.c
 init.c(61) : error C2061: syntax error : identifier 'relocate'
 init.c(61) : error C2059: syntax error : ';'
 init.c(72) : error C2085: 'enable_tilde_expansion' : not in formal parameter list
 init.c(77) : error C2085: 'cmd_boolean' : not in formal parameter list
 init.c(78) : error C2085: 'cmd_bytes' : not in formal parameter list
 init.c(79) : error C2085: 'cmd_bytes_sum' : not in formal parameter list
 init.c(83) : error C2085: 'cmd_directory_vector' : not in formal parameter list
 init.c(84) : error C2085: 'cmd_number' : not in formal parameter list
 init.c(85) : error C2085: 'cmd_number_inf' : not in formal parameter list
 init.c(86) : error C2085: 'cmd_string' : not in formal parameter list
 init.c(87) : error C2085: 'cmd_file' : not in formal parameter list
 init.c(88) : error C2085: 'cmd_directory' : not in formal parameter list
 init.c(89) : error C2085: 'cmd_time' : not in formal parameter list
 init.c(90) : error C2085: 'cmd_vector' : not in formal parameter list
 init.c(92) : error C2085: 'cmd_spec_dirstruct' : not in formal parameter list
 init.c(93) : error C2085: 'cmd_spec_header' : not in formal parameter list
 init.c(94) : error C2085: 'cmd_spec_htmlify' : not in formal parameter list
 init.c(95) : error C2085: 'cmd_spec_mirror' : not in formal parameter list
 init.c(96) : error C2085: 'cmd_spec_prefer_family' : not in formal parameter list
 init.c(97) : error C2085: 'cmd_spec_progress' : not in formal parameter list
 init.c(98) : error C2085: 'cmd_spec_recursive' : not in formal parameter list
 init.c(99) : error C2085: 'cmd_spec_restrict_file_names' : not in formal parameter list
 init.c(103) : error C2085: 'cmd_spec_timeout' : not in formal parameter list
 init.c(104) : error C2085: 'cmd_spec_useragent' : not in formal parameter list
 init.c(105) : error C2085: 'cmd_spec_verbose' : not in formal parameter list
 init.c(118) : error C2085: 'commands' : not in formal parameter list
 init.c(118) : error C2143: syntax error : missing ';' before '='
 init.c(268) : error C2065: 'commands' : undeclared identifier
 init.c(268) : error C2109: subscript requires array or pointer type
 init.c(273) : error C2109: subscript requires array or pointer type
 init.c(273) : error C2198: 'stricmp' : too few arguments for call
 init.c(463) : error C2065: 'enable_tilde_expansion' : undeclared identifier
 init.c(532) : error C2065: 'syswgetrc' : undeclared identifier
 init.c(533) : warning C4022: 'free' : pointer mismatch for actual parameter 1
 init.c(653) : error C2109: subscript requires array or pointer type
 init.c(654) : error C2109: subscript requires array or pointer type
 init.c(655) : error C2109: subscript requires array or pointer type
 init.c(655) : error C2109: subscript requires array or pointer type
 init.c(655) : warning C4033: 'setval_internal' must return a value
 NMAKE : fatal error U1077: 'C:\Program Files\Microsoft Visual Studio 8\VC\BIN\cl.EXE' : return code '0x2'
 Stop.
 NMAKE : fatal error U1077: 'C:\Program Files\Microsoft Visual Studio 8\VC\BIN\nmake.exe' : return code '0x2'
 Stop.

 



Re: [Bug-wget] some memory leaks in wget-1.12 release source

2011-03-11 Thread Giuseppe Scrivano
Hi Zhenbo,

thanks for reporting them.  I have committed a patch (commit #2460)
which should fix these memory leaks.

Cheers,
Giuseppe



Zhenbo Xu zhenbo1...@gmail.com writes:

 Hi, everybody!
 I found some memory leaks in the wget-1.12 source code. The following lists
 the bugs:

 bug 1:
 File: ftp-ls.c
 Location: line 456
 Description:
 In function ftp_parse_winnt_ls,
 ...
 while ((line = read_whole_line (fp)) != NULL) {
   len = clean_line (line);
   if (len < 40) continue;  // Leak occurs here: line is not released.
   ...
   ...
 }

 bug 2:
 File: ftp.c
 Location: line 304
 Description:
 In function getftp(),

 getftp(...) {
 ...
 ...
 if (con->proxy) {
     logname = concat_strings(...);   // line 295: allocates a heap region
                                      // for logname
 }
 ...
 csock = connect_to_host (host, port);
 if (csock == E_HOST)
   return HOSTERR;  // returns without free(logname)
 ...
 ...
 }

 I'd be glad to get your replies on whether these are real bugs.

 Best Wishes!

 --

  from Zhenbo Xu



Re: [Bug-wget] Use stderr instead of stdout for --ask-password

2011-02-24 Thread Giuseppe Scrivano
Micah Cowan mi...@cowan.name writes:

 Changing the prompt to stderr seems like a simple, single step forward
 towards proper usage. It's not perfect, but it strikes me as a good
 sight better than using stdout, which really ought to be reserved for
 program results-type output, IMO.

I have applied the original patch, which prompts to stderr instead of
stdout.  I agree it is not the ideal usage, but the current decision is
between using stderr and inhibiting the message altogether; considering
the diagnostic nature of stderr, the former seems the better choice.

Thanks,
Giuseppe



Re: [Bug-wget] Use stderr instead of stdout for --ask-password

2011-02-23 Thread Giuseppe Scrivano
Hello Gilles,

thanks for your patch.  I am not sure it is a good idea to use stderr
to prompt a message to the user.  I would just inhibit the message when
-O- is used.

Cheers,
Giuseppe



Gilles Carry gilles.ca...@st.com writes:

 Hello,

 Here is a small patch to change the ask-password behaviour.
 You may find the explanation in the patch's changelog.
 I confess I did not test this patch much.

 Best regards,
 Thank-you,
 Gilles.

 diff --git a/src/ChangeLog b/src/ChangeLog
 index f37814d..b9bf2d7 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,13 @@
 +2011-02-22  Gilles Carry  gilles dot carry at st dot com
 +
 + * main.c (prompt_for_password): Use stderr instead of stdout
 + to prompt password. This allows to use --output-document=- and
 + --ask-password simultaneously. Without this, redirecting stdout
 + makes password prompt invisible and mucks up payload such as in
 + this example:
 + wget --output-document=- --ask-password --user=foo \
 + http://foo.com/tarball.tgz | tar zxf -
 +
  2009-09-22  Micah Cowan  mi...@cowan.name
  
   * openssl.c (ssl_check_certificate): Avoid reusing the same buffer
 diff --git a/src/main.c b/src/main.c
 index dddc4b2..db1638f 100644
 --- a/src/main.c
 +++ b/src/main.c
  @@ -725,9 +725,9 @@ static char *
   prompt_for_password (void)
   {
     if (opt.user)
  -    printf (_("Password for user %s: "), quote (opt.user));
  +    fprintf (stderr, _("Password for user %s: "), quote (opt.user));
     else
  -    printf (_("Password: "));
  +    fprintf (stderr, _("Password: "));
     return getpass ();
   }
  



Re: [Bug-wget] [PATCH] Move duplicated code in http.c to a function

2011-02-23 Thread Giuseppe Scrivano
Thanks for your contribution.  I have just applied your patch.

Giuseppe



Steven Schubiger s...@member.fsf.org writes:

 Patch attached. 


 === modified file 'src/ChangeLog'
 --- src/ChangeLog 2010-12-10 22:55:54 +0000
 +++ src/ChangeLog 2011-02-22 12:43:23 +0000
 @@ -1,3 +1,9 @@
 +2011-02-22  Steven Schubiger  s...@member.fsf.org
 +
 + * http.c (gethttp, http_loop): Move duplicated code which is run
 + when an existing file is not to be clobbered to a function.
 + (get_file_flags): New static function.
 +
  2010-12-10  Evgeniy Philippov egphilip...@googlemail.com (tiny change)
  
   * main.c (main): Initialize `total_downloaded_bytes'.

 === modified file 'src/http.c'
 --- src/http.c 2011-01-01 12:19:37 +0000
 +++ src/http.c 2011-02-18 18:56:57 +0000
 @@ -1448,6 +1448,20 @@
    hs->error = NULL;
  }
  
 +static void
 +get_file_flags (const char *filename, int *dt)
 +{
 +  logprintf (LOG_VERBOSE, _("\
 +File %s already there; not retrieving.\n\n"), quote (filename));
 +  /* If the file is there, we suppose it's retrieved OK.  */
 +  *dt |= RETROKF;
 +
 +  /*  Bogusness alert.  */
 +  /* If its suffix is "html" or "htm" or similar, assume text/html.  */
 +  if (has_html_suffix_p (filename))
 +    *dt |= TEXTHTML;
 +}
 +
  #define BEGINS_WITH(line, string_constant)                              \
    (!strncasecmp (line, string_constant, sizeof (string_constant) - 1)   \
     && (c_isspace (line[sizeof (string_constant) - 1])                   \
 @@ -2158,16 +2172,7 @@
        /* If opt.noclobber is turned on and file already exists, do not
           retrieve the file. But if the output_document was given, then this
           test was already done and the file didn't exist. Hence the
           !opt.output_document */
 -      logprintf (LOG_VERBOSE, _("\
 -File %s already there; not retrieving.\n\n"), quote (hs->local_file));
 -      /* If the file is there, we suppose it's retrieved OK.  */
 -      *dt |= RETROKF;
 -
 -      /*  Bogusness alert.  */
 -      /* If its suffix is "html" or "htm" or similar, assume text/html.  */
 -      if (has_html_suffix_p (hs->local_file))
 -        *dt |= TEXTHTML;
 -
 +      get_file_flags (hs->local_file, dt);
        xfree (head);
        xfree_null (message);
        return RETRUNNEEDED;
 @@ -2639,24 +2644,12 @@
        got_name = true;
      }
  
 -  /* TODO: Ick! This code is now in both gethttp and http_loop, and is
 -   * screaming for some refactoring. */
    if (got_name && file_exists_p (hstat.local_file) && opt.noclobber
        && !opt.output_document)
      {
        /* If opt.noclobber is turned on and file already exists, do not
           retrieve the file. But if the output_document was given, then this
           test was already done and the file didn't exist. Hence the
           !opt.output_document */
 -      logprintf (LOG_VERBOSE, _("\
 -File %s already there; not retrieving.\n\n"),
 -                 quote (hstat.local_file));
 -      /* If the file is there, we suppose it's retrieved OK.  */
 -      *dt |= RETROKF;
 -
 -      /*  Bogusness alert.  */
 -      /* If its suffix is "html" or "htm" or similar, assume text/html.  */
 -      if (has_html_suffix_p (hstat.local_file))
 -        *dt |= TEXTHTML;
 -
 +      get_file_flags (hstat.local_file, dt);
        ret = RETROK;
        goto exit;
      }


