Re: [Bug-wget] Fix: Large files in WARC
Ángel González keis...@gmail.com writes:

> You would also need #define _FILE_OFFSET_BITS 64, but that seems already handled by configure. I'm not sure if that would work for 32-bit Windows, though.

_FILE_OFFSET_BITS is defined by the AC_SYS_LARGEFILE macro in configure.ac, so we don't have to worry about it.

Cheers, Giuseppe
Re: [Bug-wget] Two fixes: Memory leak with chunked responses / Chunked responses and WARC files
Hey, thanks for your patches. I have pushed them.

Cheers, Giuseppe

Gijs van Tulder gvtul...@gmail.com writes:

> Hi, here are two small patches. I hope they will be useful.
>
> First, a patch that fixes a memory leak in fd_read_body (src/retr.c) and skip_short_body (src/http.c) when retrieving a response with Transfer-Encoding: chunked. Both functions make calls to fd_read_line but never free the result.
>
> Second, a patch to the fd_read_body function that changes the way chunked responses are saved in the WARC file. Until now, wget would write a de-chunked response to the WARC file, which is wrong: the WARC file is supposed to have an exact copy of the HTTP response, so it should also include the chunk headers.
>
> The first patch fixes the memory leaks. The second patch changes fd_read_body to save the full, chunked response in the WARC file.
>
> Regards, Gijs
Re: [Bug-wget] Fwd: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits
Hi, I will take a deeper look after your copyright assignment process is completed. As a suggestion for the future: it would be better if you ask on the mailing list before starting work on a task next time (unless it is a bug fix). Not all new feature requests can be accepted into wget.

Cheers, Giuseppe

Sasikanth sasikanth@gmail.com writes:

> Modified the calc_rate function to calculate bandwidth in powers of ten (SI-prefix format) for the --bits option. Please review the changes.
>
> Thanks, Sasi
>
> -- Forwarded message --
> From: Sasikanth sasikanth@gmail.com
> Date: Wed, Jan 18, 2012 at 5:43 PM
> Subject: Re: [Bug-wget] [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits
> To: Hrvoje Niksic hnik...@xemacs.org
> Cc: bug-wget@gnu.org
>
> On Sun, Jan 15, 2012 at 8:51 PM, Hrvoje Niksic hnik...@xemacs.org wrote:
>
>> Sasikanth sasikanth@gmail.com writes:
>>> No one asked. I had just thought it would be good to display all the output in either bits or bytes to avoid confusion for the user (I had been confused).
>>
>> I understand that, but I have never seen a downloading agent output data length in bits, so displaying the data in bits would likely cause much more confusion and/or be less useful. (Data throughput in bits, on the other hand, is quite common.) With the original implementation of --bits I expect that someone would soon ask for --bits-for-bandwidth-only.
>
> Anyhow, thanks. I will modify the patch.
>
>> Thanks. Note that the patch has another problem: while Wget's K, M, and G refer to (what is now known as) kibibytes, mebibytes, and gibibytes, bandwidth is measured in kilobits, megabits, and gigabits per second. Bandwidth units all refer to powers of ten, not to powers of two, so it is incorrect for calc_rate to simply increase the byte multipliers by 8.
>>
>> Hrvoje
>
> Modified the calc_rate function to calculate bandwidth in powers of ten (SI-prefix format) for the --bits option.
> Please review the changes.
>
> Thanks, Sasi

diff -ur orig/wget-1.13.4/src/init.c wget-1.13.4/src/init.c
--- orig/wget-1.13.4/src/init.c	2011-08-19 15:36:20.0 +0530
+++ wget-1.13.4/src/init.c	2012-01-18 14:42:56.240973950 +0530
@@ -126,6 +126,7 @@
   { "backups",          &opt.backups,           cmd_number },
   { "base",             &opt.base_href,         cmd_string },
   { "bindaddress",      &opt.bind_address,      cmd_string },
+  { "bits",             &opt.bits_fmt,          cmd_boolean},
 #ifdef HAVE_SSL
   { "cacertificate",    &opt.ca_cert,           cmd_file },
 #endif
diff -ur orig/wget-1.13.4/src/main.c wget-1.13.4/src/main.c
--- orig/wget-1.13.4/src/main.c	2011-09-06 19:20:11.0 +0530
+++ wget-1.13.4/src/main.c	2012-01-18 14:42:56.241973599 +0530
@@ -166,6 +166,7 @@
     { "backups", 0, OPT_BOOLEAN, "backups", -1 },
     { "base", 'B', OPT_VALUE, "base", -1 },
     { "bind-address", 0, OPT_VALUE, "bindaddress", -1 },
+    { "bits", 0, OPT_BOOLEAN, "bits", -1 },
     { IF_SSL ("ca-certificate"), 0, OPT_VALUE, "cacertificate", -1 },
     { IF_SSL ("ca-directory"), 0, OPT_VALUE, "cadirectory", -1 },
     { "cache", 0, OPT_BOOLEAN, "cache", -1 },
@@ -704,6 +705,11 @@
       -np, --no-parent          don't ascend to the parent directory.\n"),
     "\n",
+    N_("\
+Output format:\n"),
+    N_("\
+       --bits                   Output bandwidth in bits.\n"),
+    "\n",
     N_("Mail bug reports and suggestions to bug-wget@gnu.org.\n")
   };
diff -ur orig/wget-1.13.4/src/options.h wget-1.13.4/src/options.h
--- orig/wget-1.13.4/src/options.h	2011-08-06 15:54:32.0 +0530
+++ wget-1.13.4/src/options.h	2012-01-18 14:42:56.247982676 +0530
@@ -255,6 +255,7 @@
   bool show_all_dns_entries;	/* Show all the DNS entries when
                                    resolving a name. */
+  bool bits_fmt;		/*Output bandwidth in bits format*/
 };

 extern struct options opt;
diff -ur orig/wget-1.13.4/src/progress.c wget-1.13.4/src/progress.c
--- orig/wget-1.13.4/src/progress.c	2011-01-01 17:42:35.0 +0530
+++ wget-1.13.4/src/progress.c	2012-01-18 14:42:56.249098685 +0530
@@ -861,7 +861,7 @@
   struct bar_progress_hist *hist = &bp->hist;

   /* The progress bar should look like this:
-     xx% [===          ] nn,nnn 12.34K/s  eta 36m 51s
+     xx% [===          ] nn,nnn 12.34KB/s eta 36m 51s

      Calculate the geometry.
The idea is to assign as much room as possible to the progress bar.  The other idea is to never let
@@ -873,7 +873,7 @@
      "xx% " or "100%" - percentage               - 4 chars
      "[]"             - progress bar decorations - 2 chars
      " nnn,nnn,nnn"   - downloaded bytes         - 12 chars or very rarely more
-     " 12.5K/s"       - download rate            - 8 chars
+     " 12.5KB/s"      - download rate            - 9 chars
      " eta 36m 51s"   -
Re: [Bug-wget] Cannot compile current bzr trunk: undefined reference to `gzwrite' / `gzclose' / `gzdopen'
Gijs van Tulder gvtul...@gmail.com writes:

> Hi all, the attached patch should hopefully fix Evgenii's problem. The patch changes the configure script to always use libz, unless it is explicitly disabled. In that case, the patch makes sure that the WARC functions do not use gzip but write to uncompressed files instead.

Thanks for the contribution, the patch looks correct. I am going to apply it.

Cheers, Giuseppe
Re: [Bug-wget] Fwd: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits
Thanks for the patch. Except for some minor aesthetic changes, like an empty space between the function name and '(', which I can fix before applying it, the patch seems OK. Before I can apply it, though, you need to get copyright assignments with the FSF. I am going to send more information to you in private.

Cheers, Giuseppe

Sasikanth sasikanth@gmail.com writes:

> Sorry guys, in my previous mail I attached a .patch extension file instead of a .txt extension. Now correctly attached.
>
> Thanks, Sasi
>
> -- Forwarded message --
> From: Sasikanth sasikanth@gmail.com
> Date: Wed, Jan 11, 2012 at 3:18 PM
> Subject: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits
> To: bug-wget@gnu.org
>
> Hi all, I added a new option --bits as requested in https://savannah.gnu.org/bugs/?33210. This patch will display all data lengths in bits format for the --bits option. I have verified it with HTTP and FTP. Please let me know if I missed anything. Attachments: patch and ChangeLog entry file.
>
> Thanks, Sasi

diff -ru orig/wget-1.13.4/src/ftp.c wget-1.13.4/src/ftp.c
--- orig/wget-1.13.4/src/ftp.c	2012-01-09 14:06:31.273731044 +0530
+++ wget-1.13.4/src/ftp.c	2012-01-11 14:05:33.793990983 +0530
@@ -217,18 +217,18 @@
 static void
 print_length (wgint size, wgint start, bool authoritative)
 {
-  logprintf (LOG_VERBOSE, _("Length: %s"), number_to_static_string (size));
+  logprintf (LOG_VERBOSE, _("Length: %s"), number_to_static_string (convert_to_bits(size)));
   if (size >= 1024)
-    logprintf (LOG_VERBOSE, " (%s)", human_readable (size));
+    logprintf (LOG_VERBOSE, " (%s)", human_readable (convert_to_bits(size)));
   if (start > 0)
     {
       if (size - start >= 1024)
         logprintf (LOG_VERBOSE, _(", %s (%s) remaining"),
-                   number_to_static_string (size - start),
-                   human_readable (size - start));
+                   number_to_static_string (convert_to_bits(size - start)),
+                   human_readable (convert_to_bits(size - start)));
       else
         logprintf (LOG_VERBOSE, _(", %s remaining"),
-                   number_to_static_string (size - start));
+                   number_to_static_string (convert_to_bits(size - start)));
     }
   logputs (LOG_VERBOSE, !authoritative ? _(" (unauthoritative)\n") : "\n");
 }
@@ -1564,7 +1564,7 @@
                  : _("%s (%s) - %s saved [%s]\n\n"),
                  tms, tmrate, write_to_stdout ? "" : quote (locf),
-                 number_to_static_string (qtyread));
+                 number_to_static_string (convert_to_bits(qtyread)));
     }
   if (!opt.verbose && !opt.quiet)
     {
          time. */
       char *hurl = url_string (u, URL_AUTH_HIDE_PASSWD);
       logprintf (LOG_NONVERBOSE, "%s URL: %s [%s] -> \"%s\" [%d]\n",
-                 tms, hurl, number_to_static_string (qtyread), locf, count);
+                 tms, hurl, number_to_static_string (convert_to_bits(qtyread)), locf, count);
       xfree (hurl);
     }
@@ -1792,7 +1792,7 @@
           /* Sizes do not match */
           logprintf (LOG_VERBOSE, _("\
The sizes do not match (local %s) -- retrieving.\n\n"),
-                     number_to_static_string (local_size));
+                     number_to_static_string (convert_to_bits(local_size)));
         }
     }
 } /* opt.timestamping && f->type == FT_PLAINFILE */
@@ -2206,7 +2206,7 @@
         sz = -1;
       logprintf (LOG_NOTQUIET, _("Wrote HTML-ized index to %s [%s].\n"),
-                 quote (filename), number_to_static_string (sz));
+                 quote (filename), number_to_static_string (convert_to_bits(sz)));
     }
   else
     logprintf (LOG_NOTQUIET,
diff -ru orig/wget-1.13.4/src/http.c wget-1.13.4/src/http.c
--- orig/wget-1.13.4/src/http.c	2012-01-09 14:06:31.274730346 +0530
+++ wget-1.13.4/src/http.c	2012-01-11 14:24:02.721099726 +0530
@@ -2423,19 +2423,19 @@
       logputs (LOG_VERBOSE, _("Length: "));
       if (contlen != -1)
         {
-          logputs (LOG_VERBOSE, number_to_static_string (contlen + contrange));
+          logputs (LOG_VERBOSE, number_to_static_string (convert_to_bits (contlen) + contrange));
           if (contlen + contrange >= 1024)
             logprintf (LOG_VERBOSE, " (%s)",
-                       human_readable (contlen + contrange));
+                       human_readable (convert_to_bits(contlen) + contrange));
           if (contrange)
             {
               if (contlen >= 1024)
                 logprintf (LOG_VERBOSE, _(", %s (%s) remaining"),
-
Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not formatted..
Micah Cowan mi...@micah.cowan.name writes:

> I believe hh's suggestion is to have the format reflect the way it would look in a URL; so [ and ] around IPv6, and nothing around IPv4 (since the IPv4 format isn't ambiguous in the way IPv6 is).

I agree. Please rework your patch to use [address]:port just for IPv6. The message "Reusing existing connection to ADDRESS:IP.." should be fixed as well. Please also provide a ChangeLog file entry. Thanks for your contribution!

Giuseppe
Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not formatted..
Thanks. The patch is not complete yet; it doesn't fix the other message I reported before. Can you please check it as well? Can you provide a ChangeLog file entry?

Cheers, Giuseppe

Sasikanth sasikanth@gmail.com writes:

> I have modified the patch as you guys suggested. For IPv6 the display will be [ipv6address]:port, for IPv4 ipv4address:port. The test results:
>
> IPv4
> ---
> [root@Shash wget-1.13.4]# ./src/wget http://10.0.0.1
> --2012-01-07 11:01:23-- http://10.0.0.1/
> Connecting to 10.0.0.1:80...
>
> IPv6
> ---
> [root@Shash wget-1.13.4]# ./src/wget http://[3ffe:b80:17e2::1]
> --2012-01-07 11:01:06-- http://[3ffe:b80:17e2::1]/
> Connecting to [3ffe:b80:17e2::1]:80
>
> Thanks, Sasi
>
> On Sat, Jan 7, 2012 at 3:14 AM, Henrik Holst henrik.ho...@millistream.com wrote:
>> Exactly! That is how at least I have always seen address and port combinations presented (or entered). /hh
>>
>> On 6 Jan 2012 21:27, Micah Cowan mi...@micah.cowan.name wrote:
>>> I believe hh's suggestion is to have the format reflect the way it would look in a URL; so [ and ] around IPv6, and nothing around IPv4 (since the IPv4 format isn't ambiguous in the way IPv6 is). -mjc (Sent from my Kindle Fire)
>>>
>>> From: Sasikanth sasikanth@gmail.com
>>> Sent: Fri Jan 06 01:56:34 PST 2012
>>> To: henrik.ho...@millistream.com
>>> Cc: bug-wget@gnu.org
>>> Subject: Re: [Bug-wget] [PATCH] [wget-bug #32357], IPv6 addresses not formatted..
>>>
>>>> Currently we are not checking the family type of the address before printing the message. Do we have to print the message as [3ffe:b80:17e2::1]:80 for IPv6 and |10.0.0.1|:80 for IPv4? Please confirm and I will resubmit the patch. Thanks, Sasi
>>>> Note: I didn't get the reply to my mail; to keep track of the discussion I copied the mail content from the mailing list.
>>>
>>>>> Shouldn't IPv6 addresses be displayed like this instead: [3ffe:b80:17e2::1]:80 /hh
>>>>>
>>>>> On 5 Jan 2012 14:15, Sasikanth address@hidden wrote:
>>>>>> Hi, this is a very small change related to a display issue.
> The bug id is 32357: https://savannah.gnu.org/bugs/index.php?32357
>
> When we run wget with an IP address alone (wget 10.0.0.1 or wget http://10.0.0.1/ or wget http://[3ffe:b80:17e2::1]) the display shows as:
>
> IPv4: Connecting to 10.0.0.1:80...
> IPv6: Connecting to 3ffe:b80:17e2::1:80
>
> (Because of the IPv6 format (ff::01) it is a little hard to tell the IPv6 address apart from the port number.) This patch will show the display as:
>
> IPv4: Connecting to |10.0.0.1|:80...
> IPv6: Connecting to |3ffe:b80:17e2::1|:80
>
> Thanks, Sasi

--- src/connect.c.orig	2012-01-07 09:39:55.965324001 +0530
+++ src/connect.c	2012-01-07 10:54:08.295324000 +0530
@@ -293,7 +293,12 @@
       xfree (str);
     }
   else
-    logprintf (LOG_VERBOSE, _("Connecting to %s:%d... "), txt_addr, port);
+    {
+      if (ip->family == AF_INET)
+        logprintf (LOG_VERBOSE, _("Connecting to %s:%d... "), txt_addr, port);
+      else if (ip->family == AF_INET6)
+        logprintf (LOG_VERBOSE, _("Connecting to [%s]:%d... "), txt_addr, port);
+    }
 }

 /* Store the sockaddr info to SA. */
Re: [Bug-wget] feature suggestion: host spanning depth limit (absolute)
Naxa anaxagra...@gmail.com writes:

> I suggest a feature for limiting the recursion depth level specifically on different hosts, when spanning hosts. This way I wouldn't need to know and list the different hosts when, for example, a page links to multiple image hosting sites. An option like `-H 1` would then limit host spanning the same way that `--limit` works for all recursion. It would count the needed spanning steps from the original domain as the distance.

It is something that can be implemented without changing the current semantics of -H. Feel free to submit a patch :-)

Thanks, Giuseppe
Re: [Bug-wget] Wget 1.13.4 test suite on Windows/MinGW
Eli Zaretskii e...@gnu.org writes:

> Sorry, I don't understand this comment. fd is indeed a file descriptor, but ioctlsocket's first argument is a SOCKET object, which is an unsigned int, and we get it from a call to `socket' or some such. So where do you see a potential problem? And anyway, I think wget calls ioctlsocket for every connection; if so, then most of those calls succeed, because the binary I built works and is quite capable of fetching via HTTP. So these problems seem to be triggered by something specific in those 3 tests.

Sorry, I wasn't clear. We use gnulib replacements for the socket functions, so internally wget knows only about file descriptors. On Windows this abstraction is obtained through _open_osfhandle on a SOCKET object. When we use a native function, like ioctlsocket, we have to be sure the file descriptor is converted back to a SOCKET object (by using _get_osfhandle). I am afraid this conversion is not done correctly; the value you have observed (fd = 3) makes me think so. The w32sock.h file from gnulib defines these two macros for such conversions:

#define FD_TO_SOCKET(fd)   ((SOCKET) _get_osfhandle ((fd)))
#define SOCKET_TO_FD(fh)   (_open_osfhandle ((long) (fh), O_RDWR | O_BINARY))

Cheers, Giuseppe
Re: [Bug-wget] empty VERSION in 1.13.4
Elan Ruusamäe g...@pld-linux.org writes:

> hi, i reported on irc, but apparently nobody listens there:
>
> Day changed to 09 Dec 2011
> 22:07:16 <glen> 1.13.4 tarball is buggy. builds from it lack version id in user agent header
> 22:07:36 <glen> $ wget -q -O - ifconfig.me/ua
> 22:07:44 <glen> this prints: Wget/ (linux-gnu)
> 22:08:03 <glen> seems the problem is that tarball does not contain this file: sh: build-aux/bzr-version-gen: not found
> 22:08:12 <glen> so regenerating autoconf creates empty @VERSION@
> 22:08:28 <glen> $ grep version.string src/version.c
> 22:08:28 <glen> const char *version_string = ;
>
> in pld linux i worked around it: http://cvs.pld-linux.org/cgi-bin/viewvc.cgi/cvs/packages/wget/wget.spec?r1=1.158&r2=1.159

Thanks for the report, this one-line patch should fix the problem:

Cheers, Giuseppe

=== modified file 'ChangeLog'
--- ChangeLog	2011-12-11 14:18:11 +0000
+++ ChangeLog	2011-12-12 20:24:25 +0000
@@ -1,3 +1,8 @@
+2011-12-12  Giuseppe Scrivano  gscriv...@gnu.org
+
+	* Makefile.am (EXTRA_DIST): Add build-aux/bzr-version-gen.
+	Reported by: Elan Ruusamäe g...@pld-linux.org.
+
 2011-12-11  Giuseppe Scrivano  gscriv...@gnu.org

 	* util/trunc.c (main): Call `close' on the fd and check for errors.

=== modified file 'Makefile.am'
--- Makefile.am	2011-01-01 12:19:37 +0000
+++ Makefile.am	2011-12-12 20:14:16 +0000
@@ -46,7 +46,7 @@
 EXTRA_DIST = ChangeLog.README MAILING-LIST \
 	     msdos/ChangeLog msdos/config.h msdos/Makefile.DJ \
 	     msdos/Makefile.WC ABOUT-NLS \
-	     build-aux/build_info.pl .version
+	     build-aux/build_info.pl build-aux/bzr-version-gen .version

 CLEANFILES = *~ *.bak $(DISTNAME).tar.gz
Re: [Bug-wget] Disable progress display when log output to file?
Paul Wratt paul.wr...@gmail.com writes:

> this works but no size in output: wget -nv --output-file=wget.txt _url_
>
> I found a reference to a 2007 post asking for: 3) add support for turning off the progress bar with --progress=none

I think I am going to add this support myself. I have written a small patch to make wget parallel, but until I have a clear idea of how the progress bar should look (and how to implement it), a --progress=none will be fine.

Giuseppe
Re: [Bug-wget] error message
david painter ddpain...@bigpond.com writes:

> Help. After installing and trying to get my DVD and CD drives to work I now have an error message stating: E: Type '2011-12-04' is not known on line 1 in source list /etc/apt/source.list.d/medibuntu.list

You have reached the GNU wget mailing list. Your problem doesn't seem related to wget, at least from the information you have provided. I think you will have a better chance of finding help by writing to an Ubuntu-related mailing list. Be sure to provide more information (what you were trying to do, what system you are using, ...); messages like yours are often ignored otherwise.

Giuseppe
Re: [Bug-wget] --page-requisites and robot exclusion issue
Paul Wratt paul.wr...@gmail.com writes:

> if it does not obey - server admins will ban it
>
> the workaround:
> 1) get single html file first - edit out meta tag - re-get with --no-clobber (usually only in landing pages)
> 2) empty robots.txt (or allow all - search net)
>
> possible solutions:
> A) command line option
> B) ./configure --disable-robots-check

You can specify -e robots=off to wget at runtime.

Giuseppe
Re: [Bug-wget] Bug or feature: --continue and --content-disposition don't work together
Hello Alex, sorry for the late reply. Correct: when you specify --content-disposition, the destination file name is not known in advance. You can see it by specifying the destination file using -O, as:

wget -c --content-disposition --debug http://www.dubovskoy.net/CANTER/01.mp3 -O 01.mp3

That command is pretty useless though; just skip --content-disposition.

Cheers, Giuseppe

Alex gnfa...@rambler.ru writes:

> Greetings. Sorry for my bad English. If --content-disposition is enabled, then --continue will not work. Example:
>
> wget -c --content-disposition --debug http://www.dubovskoy.net/CANTER/01.mp3
>
> Every time the request is made without the range field (Range: bytes=57791-), and receives HTTP/1.1 200 OK.
>
> wget -c --debug http://www.dubovskoy.net/CANTER/01.mp3
>
> This makes the request with the range field and receives HTTP/1.1 206 Partial Content.
>
> wget -c --content-disposition --debug http://www.dubovskoy.net/CANTER/01.mp3
> DEBUG output created by Wget 1.13.4 on mingw32.
> URI encoding = `ASCII'
> --2011-11-07 09:13:25-- http://www.dubovskoy.net/CANTER/01.mp3
> Resolving www.dubovskoy.net (www.dubovskoy.net)... seconds 0,00, 93.180.40.15
> Caching www.dubovskoy.net => 93.180.40.15
> Connecting to www.dubovskoy.net (www.dubovskoy.net)|93.180.40.15|:80... seconds 0,00, connected.
> Created socket 4.
> Releasing 0x00caa008 (new refcount 1).
> ---request begin---
> GET /CANTER/01.mp3 HTTP/1.1
> User-Agent: Wget/1.13.4 (mingw32)
> Accept: */*
> Host: www.dubovskoy.net
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 200 OK
> Content-Length: 2221830
> Content-Type: audio/mpeg
> Last-Modified: Mon, 25 Jan 2010 16:24:03 GMT
> Accept-Ranges: bytes
> ETag: b2a3caceda9dca1:329
> Server: Microsoft-IIS/6.0
> MicrosoftOfficeWebServer: 5.0_Pub
> X-Powered-By: ASP.NET
> Date: Mon, 07 Nov 2011 07:14:14 GMT
> ---response end---
> 200 OK
> Registered socket 4 for persistent reuse.
> Length: 2221830 (2,1M) [audio/mpeg]
> Saving to: `01.mp3'
>
> 0K .. .. .. .. .. 2% 25,2K 84s
> 50K ..
> wget -c --debug http://www.dubovskoy.net/CANTER/01.mp3
> DEBUG output created by Wget 1.13.4 on mingw32.
> URI encoding = `ASCII'
> --2011-11-07 09:18:06-- http://www.dubovskoy.net/CANTER/01.mp3
> Resolving www.dubovskoy.net (www.dubovskoy.net)... seconds 0,00, 93.180.40.15
> Caching www.dubovskoy.net => 93.180.40.15
> Connecting to www.dubovskoy.net (www.dubovskoy.net)|93.180.40.15|:80... seconds 0,00, connected.
> Created socket 4.
> Releasing 0x00cfa008 (new refcount 1).
> ---request begin---
> GET /CANTER/01.mp3 HTTP/1.1
> Range: bytes=57791-
> User-Agent: Wget/1.13.4 (mingw32)
> Accept: */*
> Host: www.dubovskoy.net
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 206 Partial Content
> Content-Length: 2164039
> Content-Type: audio/mpeg
> Content-Range: bytes 57791-2221829/2221830
> Last-Modified: Mon, 25 Jan 2010 16:24:03 GMT
> Accept-Ranges: bytes
> ETag: b2a3caceda9dca1:329
> Server: Microsoft-IIS/6.0
> MicrosoftOfficeWebServer: 5.0_Pub
> X-Powered-By: ASP.NET
> Date: Mon, 07 Nov 2011 07:18:56 GMT
> ---response end---
> 206 Partial Content
> Registered socket 4 for persistent reuse.
> Length: 2221830 (2,1M), 2164039 (2,1M) remaining [audio/mpeg]
> Saving to: `01.mp3'
>
> [ skipping 50K ]
> 50K ,, ..
Re: [Bug-wget] Missing gnulib files in development version
Jochen Roderburg roderb...@uni-koeln.de writes:

> I have some problems compiling recent development versions (with the WARC additions) on my Linux. First it was missing a tmpdir.h. Looking around I saw some tmpdir files in the gnulib directories, but obviously they were not where the build process was looking for them. I tried a rerun of the bootstrap script, which updated a lot of gnulib stuff, and now the tmpdir.h was found. Next it was missing a base32.h. A base32 is also listed in bootstrap.conf, but the base32 files did not show up despite repeated reruns of the bootstrap script. What to try next?

Something is going wrong with the bootstrap script. Can you please include what the bootstrap script prints? Do you get any error? Does it happen from a clean checkout too? Usually I keep the gnulib development tree in a different directory and then pass --gnulib-srcdir=/path/to/gnulib to the bootstrap script; it saves some time and bandwidth.

Cheers, Giuseppe
Re: [Bug-wget] Trouble saving the graphs on a page
What happens if you specify -H?

Cheers, Giuseppe

Randy Kramer rhkra...@gmail.com writes:

> I just joined the list and I'm jumping the gun a little bit (because I usually lurk on a list for a little while before posting), but... I'm trying to save a local copy of this page with all the graphs:
>
> http://www.businessinsider.com/what-wall-street-protesters-are-so-angry-about-2011-10?op=1
>
> After finally finding the wget manual and the examples there, I thought I found the right command--I tried:
>
> wget -p --convert-links -nH -nd -Pdownload http://www.businessinsider.com/what-wall-street-protesters-are-so-angry-about-2011-10?op=1
>
> That saves the page, but not the graphs. Can anybody give me a clue as to what I need to do to also save the graphs?
>
> Thanks! Randy Kramer
Re: [Bug-wget] wget doesnt work but curl works !
Hi Vishwanath, is it possible to use the latest released version of wget? You can find it here: ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz

I have no clue what changes are contained in the so-called "Red Hat modified" version of wget; I highly suggest using the latest upstream version when reporting bugs, rather than a modified version. What system are you using?

Thanks, Giuseppe

Vishwanath Reddy Beemidi bvishwana...@gmail.com writes:

> Hi, I have trouble getting wget to work when downloading a file using HTTP. curl works fine for the same URL. Both commands are being run from the same server at the command line. OS: RH Linux mdc1pr012 2.6.18-238.9.1.el5. Following are the commands and the debug messages; any insights into what the problem could be are appreciated.
>
> [dsop@mdc1pr012]$ wget -d -S http://www.preprod.abc.com/tools/90067660.csv
> Setting --server-response (serverresponse) to 1
> DEBUG output created by Wget 1.11.4 Red Hat modified on linux-gnu.
> --2011-10-14 18:00:58-- http://www.preprod.abc.com/tools/90067660.csv
> Resolving www.preprod.abc.com... 184.31.131.61
> Caching www.preprod.abc.com => 184.31.131.61
> Connecting to www.preprod.abc.com|184.31.131.61|:80... connected.
> Created socket 3.
> Releasing 0x07683bc0 (new refcount 1).
> ---request begin---
> GET /tools/90067660.csv HTTP/1.0
> User-Agent: Wget/1.11.4 Red Hat modified
> Accept: */*
> Host: www.preprod.abc.com
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 302 Found
> Location: http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=
> Cache-Control: no-cache
> Pragma: no-cache
> Content-Type: text/html; charset=utf-8
> Connection: close
> Content-Length: 925
> ---response end---
> HTTP/1.1 302 Found
> Location: http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=
> Cache-Control: no-cache
> Pragma: no-cache
> Content-Type: text/html; charset=utf-8
> Connection: close
> Content-Length: 925
> Location: http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y= [following]
> Closed fd 3
> --2011-10-14 18:00:58-- http://fd000xnchegrn02/?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y=
> Resolving fd000xnchegrn02... 11.48.43.72
> Caching fd000xnchegrn02 => 11.48.43.72
> Connecting to fd000xnchegrn02|11.48.43.72|:80... connected.
> Created socket 3.
> Releasing 0x076806e0 (new refcount 1).
> ---request begin---
> GET /?cfru=aHR0cDovL3d3dy5wcmVwcm9kbWFjeXMuZmRzLmNvbS90b29scy85MDA2NzY2MC5jc3Y= HTTP/1.0
> User-Agent: Wget/1.11.4 Red Hat modified
> Accept: */*
> Host: fd000xnchegrn02
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 401 Unauthorized
> Cache-Control: no-cache
> Pragma: no-cache
> WWW-Authenticate: NTLM
> WWW-Authenticate: BASIC realm=Federated_Department_Stores
> Content-Type: text/html; charset=utf-8
> Proxy-Connection: close
> Set-Cookie: BCSI-CS-3d1fe99b15515258=2; Path=/
> Connection: close
> Content-Length: 1114
> ---response end---
> HTTP/1.1 401 Unauthorized
> Cache-Control: no-cache
> Pragma: no-cache
> WWW-Authenticate: NTLM
> WWW-Authenticate: BASIC realm=Federated_Department_Stores
> Content-Type: text/html; charset=utf-8
> Proxy-Connection: close
> Set-Cookie: BCSI-CS-3d1fe99b15515258=2; Path=/
> Connection: close
> Content-Length: 1114
> Closed fd 3
> Authorization failed.
> Following is the curl trace info for the same URL:
>
> [dsop@mdc1pr012 ~]$ curl --trace-ascii tr.out -o out.dat http://www.preprodmacys.fds.com/tools/90067660.csv
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  155k  100  155k    0     0   449k      0 --:--:-- --:--:-- --:--:--  550k
> [dsop@mdc1pr012 ~]$ more tr.out
> == Info: About to connect() to www.preprod.abc.com port 80
> == Info:   Trying 184.31.131.61...
> == Info: connected
> == Info: Connected to www.preprod.abc.com (184.31.131.61) port 80
> => Send header, 186 bytes (0xba)
> : GET /tools/90067660.csv HTTP/1.1
> 0022: User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5
> 0062:  OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> 008b: Host: www.preprod.abc.com
> 00ab: Accept: */*
> 00b8:
> <= Recv header, 17 bytes (0x11)
> : HTTP/1.1 200 OK
> <= Recv header, 25 bytes (0x19)
> : Server: IBM_HTTP_Server
> <= Recv header, 46 bytes (0x2e)
> : Last-Modified: Fri, 18 Mar 2011 22:42:46 GMT
> <= Recv header, 30 bytes (0x1e)
> : ETag: b19b9-26f41-7f2b4580
> <= Recv header, 22 bytes (0x16)
> : Accept-Ranges: bytes
> <= Recv header, 26 bytes (0x1a)
> : Content-Type: text/plain
> <= Recv header, 37 bytes (0x25)
> : Date: Fri, 14 Oct 2011 22:06:15 GMT
> <= Recv header, 24 bytes
Re: [Bug-wget] WARC, new version
Hey Gijs, I have added a ChangeLog entry and pushed the change. Thanks!

Giuseppe

Gijs van Tulder gvtul...@gmail.com writes:

>> lovely. I am going to push it soon with some small adjustments.
>
> That's good to hear. There's one other small adjustment that you may want to make, see the attached patch. One of the WARC functions uses the basename function, which causes problems on OS X. Including libgen.h and strdup-ing the output of basename seems to solve this problem.
>
> Thanks, Gijs
>
> On 04-11-11 22:27, Giuseppe Scrivano wrote:
>> Gijs van Tulder gvtul...@gmail.com writes:
>>> Hi Giuseppe,
>>> * I've changed the configure.ac and src/Makefile.am.
>>> * I've added a ChangeLog entry.
>> lovely. I am going to push it soon with some small adjustments. Thanks for the great work. Whenever we happen to be in the same place, I'll buy you a beer :-)
>> Cheers, Giuseppe
Re: [Bug-wget] WARC, new version
Gijs van Tulder gvtul...@gmail.com writes:

> Hi Giuseppe,
> * I've changed the configure.ac and src/Makefile.am.
> * I've added a ChangeLog entry.

Lovely. I am going to push it soon with some small adjustments. Thanks for the great work. Whenever we happen to be in the same place, I'll buy you a beer :-)

Cheers, Giuseppe
Re: [Bug-wget] Memory leak when using GnuTLS
Committed with a ChangeLog entry and a small change. Another beer? :-)

Thanks! Giuseppe

Gijs van Tulder gvtul...@gmail.com writes:

> Hi, I think there is a memory leak in the GnuTLS part of wget. When downloading multiple files from an HTTPS server, wget with GnuTLS uses a lot of memory. Perhaps an explanation for this can be found in src/http.c. The gethttp function calls ssl_init for each download:
>
>   /* Initialize the SSL context.  After this has once been done,
>      it becomes a no-op.  */
>   if (!ssl_init ())
>
> The OpenSSL version of ssl_init, in src/openssl.c, checks if SSL has already been initialized and doesn't repeat the work. But the GnuTLS version doesn't:
>
>   bool
>   ssl_init ()
>   {
>     const char *ca_directory;
>     DIR *dir;
>
>     gnutls_global_init ();
>     gnutls_certificate_allocate_credentials (&credentials);
>
> GnuTLS is initialized again and again, but there is never a call to gnutls_global_deinit. I've attached a small patch to add a check to ssl_init in src/gnutls.c, similar to the check already in src/openssl.c. With it, wget can still download over HTTPS and the memory usage stays within reasonable limits.
>
> Thanks, Gijs
Re: [Bug-wget] WARC, new version
Gijs van Tulder gvtul...@gmail.com writes:

> === modified file 'bootstrap.conf'
> --- bootstrap.conf	2011-08-11 12:23:39 +0000
> +++ bootstrap.conf	2011-10-21 19:24:18 +0000
> @@ -28,6 +28,7 @@
>  accept
>  alloca
>  announce-gen
> +base32
>  bind
>  c-ctype
>  clock-time
> @@ -49,6 +50,7 @@
>  mbtowc
>  mkdir
>  crypto/md5
> +crypto/sha1
>  pipe
>  quote
>  quotearg
> @@ -63,6 +65,7 @@
>  stdbool
>  strcasestr
>  strerror_r-posix
> +tmpdir
>  unlocked-io
>  update-copyright
>  vasprintf
>
> === modified file 'configure.ac'
> --- configure.ac	2011-09-04 12:19:12 +0000
> +++ configure.ac	2011-10-23 21:21:49 +0000
> @@ -511,7 +511,22 @@
>    fi
>  fi
>
> +# Warc
> +AC_CHECK_HEADER(uuid/uuid.h, UUID_FOUND=yes, UUID_FOUND=no)
> +if test x$UUID_FOUND = xno; then
> +  AC_MSG_ERROR([libuuid is required])
> +fi
> +
> +AC_CHECK_LIB(uuid, uuid_generate, UUID_FOUND=yes, UUID_FOUND=no)
> +if test x$UUID_FOUND = xno; then
> +  AC_MSG_ERROR([libuuid is required])
> +fi
> +LIBUUID=-luuid
> +AC_SUBST(LIBUUID)
> +LDFLAGS="${LDFLAGS} -L$libuuid/lib"
> +CPPFLAGS="${CPPFLAGS} -I$libuuid/include"

I think we shouldn't change the value of LDFLAGS and CPPFLAGS, as they are user variables. Also, where is $libuuid defined? We can just drop these lines.

>    if (hs->res >= 0)
>      CLOSE_FINISH (sock);
>    else
> -    {
> -      if (hs->res < 0)
> -        hs->rderrmsg = xstrdup (fd_errstr (sock));
> -      CLOSE_INVALIDATE (sock);
> -    }
> +    CLOSE_INVALIDATE (sock);

Why?

The rest seems OK; if you also provide a ChangeLog I can proceed to merge it.

Thanks, Giuseppe
Re: [Bug-wget] parallel wget...
Hrvoje Niksic hnik...@xemacs.org writes: I expect the biggest changes to be required in progress.c. :) anyone has some ideas? :-) How should it look? Cheers, Giuseppe
Re: [Bug-wget] WARC, new version
Gijs van Tulder gvtul...@gmail.com writes: Hi all, Based on the comments by Giuseppe and Ángel I've revised the implementation of the wget WARC extension. I've attached a patch. 1. It's no longer based on the warctools library. Instead, I've written a couple of new WARC-writing functions, using zlib for the gzip compression. The new implementation is much smaller. 2. I extracted a small part of the gethttp method in http.c and moved it to a new function, read_response_body, which is responsible for downloading the response body and writing it to a file. The WARC extension needs to save the response in multiple cases: when the response is successful, but also when the response is a redirect, a 401 unauthorized, or an error. Moving the response-saving to a separate method makes it possible to reuse this part in all four situations. Any thoughts?

WOW, great work! It is much better now. I wonder if it is possible to remove the dependency on libuuid, maybe by providing replacements for uuid_generate and uuid_unparse when libuuid is not found? Even a simple implementation based on rand? Besides that, there are only very small adjustments which need to be made to the code in order to include it into wget, like lines no longer than 80 characters or using foo *bar instead of foo * bar; in any case these are not important and I can go through them before committing your changes. Thanks, Giuseppe
Re: [Bug-wget] [PATCH] paramcheck: Use + quantifier and return copy
Thanks. Pushed. Cheers, Giuseppe

Steven Schubiger s...@member.fsf.org writes:

=== modified file 'ChangeLog'
--- ChangeLog	2011-09-04 12:19:12 +0000
+++ ChangeLog	2011-10-16 18:18:34 +0000
@@ -1,3 +1,8 @@
+2011-10-16  Steven Schubiger  s...@member.fsf.org
+
+	* util/paramcheck.pl: Match 1 or more times where applicable.
+	(extract_entries): Return a copy instead of reference.
+
 2011-09-04  Alan Hourihane  al...@fairlite.co.uk  (tiny change)
 	* configure.ac: Check for libz when gnutls is used.

=== modified file 'util/paramcheck.pl'
--- util/paramcheck.pl	2011-01-01 12:19:37 +0000
+++ util/paramcheck.pl	2011-10-16 02:36:40 +0000
@@ -33,11 +33,11 @@
 my @args = ([
   $main_content,
-  qr/static \s+? struct \s+? cmdline_option \s+? option_data\[\] \s+? = \s+? \{ (.*?) \}\;/sx,
+  qr/static \s+? struct \s+? cmdline_option \s+? option_data\[\] \s+? = \s+? \{ (.+?) \}\;/sx,
   [ qw(long_name short_name type data argtype) ],
 ], [
   $init_content,
-  qr/commands\[\] \s+? = \s+? \{ (.*?) \}\;/sx,
+  qr/commands\[\] \s+? = \s+? \{ (.+?) \}\;/sx,
   [ qw(name place action) ],
 ]);
@@ -78,18 +78,18 @@
 my (@entries, %index, $i);
 foreach my $chunk (@$chunks) {
-  my ($args) = $chunk =~ /\{ \s+? (.*?) \s+? \}/sx;
+  my ($args) = $chunk =~ /\{ \s+? (.+?) \s+? \}/sx;
   next unless defined $args;
   my @args = map {
     tr/'//d; $_
   } map {
-    /\((.*?)\)/ ? $1 : $_
+    /\((.+?)\)/ ? $1 : $_
   } split /\,\s+/, $args;
   my $entry = { map { $_ => shift @args } @$names };
-  ($entry->{line}) = $chunk =~ /^ \s+? (\{.*)/mx;
+  ($entry->{line}) = $chunk =~ /^ \s+? (\{.+)/mx;
   if ($chunk =~ /deprecated/i) {
     $entries[-1]->{deprecated} = true;
   }
@@ -103,9 +103,9 @@
   push @entries, $entry;
 }
-push @entries, \%index;
+push @entries, { %index };
-return \@entries;
+return [ @entries ];
 }

 sub output_results
@@ -281,7 +281,7 @@
 while ($tex =~ /^\@item\w*? \s+? --([-a-z0-9]+)/gmx) {
   $tex_items{$1} = true;
 }
-my ($help) = $main =~ /\n print_help .*? \{\n (.*) \n\} \n/sx;
+my ($help) = $main =~ /\n print_help .*? \{\n (.+) \n\} \n/sx;
 while ($help =~ /--([-a-z0-9]+)/g) {
   $main_items{$1} = true;
 }
[Bug-wget] parallel wget...
hello, The winter is coming, not much to do outside, and I have spent the day working on something I had had in mind already for too long. Unfortunately I couldn't start the implementation as I had thought would be possible: there are too many nested `select' points in the code, and implementing an event-driven, single-threaded parallel wget seems like too much work. I have used different threads, spawned by retrieve_tree in recur.c. I haven't published the code yet[1] since it is just an ugly hack for now and it still sometimes segfaults; it will take a while before I can go through the code and ensure it is reentrant and can be used by different threads without problems. But I would like to share some results with you:

$ LANG=C wget --version | head -n 1
GNU Wget 1.13 built on linux-gnu.
$ LANG=C ./wget --version | head -n 1
GNU Wget 1.13.4-2567-dirty built on linux-gnu.

$ rm -rf it.gnu.org/
$ time wget -q --no-http-keep-alive -r -np http://it.gnu.org/~gscrivano/files/parallel/
real	0m2.808s
user	0m0.008s
sys	0m0.020s

$ rm -rf it.gnu.org/
$ time ./wget --jobs=2 -q --no-http-keep-alive -r -np http://it.gnu.org/~gscrivano/files/parallel/
real	0m1.291s
user	0m0.004s
sys	0m0.016s

$ rm -rf it.gnu.org/
$ time ./wget --jobs=4 -q --no-http-keep-alive -r -np http://it.gnu.org/~gscrivano/files/parallel/
real	0m0.521s
user	0m0.008s
sys	0m0.012s

$ rm -rf it.gnu.org/
$ time ./wget --jobs=8 -q --no-http-keep-alive -r -np http://it.gnu.org/~gscrivano/files/parallel/
real	0m0.395s
user	0m0.008s
sys	0m0.004s

Nice eh? :-) Any comment? Suggestion? Insult? Cheers, Giuseppe 1) but the brave can find the current ugly hack here: http://it.gnu.org/~gscrivano/files/parallel_wget.patch
Re: [Bug-wget] WARC output
Hi Gijs, Gijs van Tulder gvtul...@gmail.com writes: can you please send a complete diff against the current development tree version? Here's the diff of the WARC additions (1.9MB zipped) to revision 2565: http://dl.dropbox.com/u/365100/wget_warc-20110926-complete.patch.bz2

the patch is huge and I think we don't want to add so many files into the wget tree. Can't we assume the user will install the warc tools by herself and let configure check whether they are installed or not? This will require some more work but the result will be much less intrusive. What do you think? Thanks, Giuseppe
Re: [Bug-wget] Wget 1.13.4 v. VMS -- Various problems
Steven M. Schweda s...@antinode.info writes: [Various other changes/fixes affecting VMS] Still wondering. For the curious, a set of patches should be available at: http://antinode.info/ftp/wget/wget-1_13_4/1_13_4_1.dru can you please include a ChangeLog entry for each of them? Thanks, Giuseppe
Re: [Bug-wget] Patch: new option --content-on-error: do not skip content on http server error
Henrik Holst henrik.ho...@millistream.com writes: No problem, I'll give it a try, yell at me if I do something wrong: Good job! I have applied the patch and pushed it. Cheers, Giuseppe
Re: [Bug-wget] Patch: new option --content-on-error: do not skip content on http server error
Hi Henrik, Henrik Holst henrik.ho...@millistream.com writes: This patch adds an option to not skip the content sent by the HTTP server when the server responds with a status code in the 4xx and 5xx range. thanks for the patch, I am quite inclined to include it. Can you please provide the ChangeLog file entry? Thanks! Giuseppe
Re: [Bug-wget] Recursive wget: change in handling of file permissions?
Hi Micah, Micah Cowan mi...@cowan.name writes: So, from where I'm sitting, it looks like --preserve-permissions was an implemented feature for two major releases (1.10 and 1.11 series), and has now been missing from the last two major releases (1.12 and 1.13). Probably, it should be reinstated, and documentation added, to restore previous behavior. Giuseppe? thanks for the detailed analysis and sorry for my late reply. If this is the case, then I think --preserve-permissions has to be restored as it used to work in the 1.10.* series. Cheers, Giuseppe
Re: [Bug-wget] texinfo @dir information
k...@freefriends.org (Karl Berry) writes: Tiny change for the manual to make its dir entry consistent with others, ok? Ok. Pushed. Thanks, Giuseppe
Re: [Bug-wget] WARC output
Gijs van Tulder gvtul...@gmail.com writes: Hi. It's been a while since we've discussed the WARC addition to Wget. Is there anything I can help with? can you please send a complete diff against the current development tree version? I'll take a look at it ASAP. Thanks, Giuseppe
Re: [Bug-wget] Introduction
Manuel José Muñoz Calero manuelj.mu...@gmail.com writes: These days I've been reading as much as I could: manual, wiki, code and bazaar usage. If you agree, I'm beginning with... #21439: Support for FTP proxy authentication

It sounds great!

... planned release 1.15, status confirmed, assigned to none. One question: should I work with the main branch?

Yes, please. Cheers, Giuseppe
Re: [Bug-wget] Introduction
Daniel Stenberg dan...@haxx.se writes: On Mon, 26 Sep 2011, Giuseppe Scrivano wrote: #21439: Support for FTP proxy authentication It sounds great! Since there's no FTP proxy standard or spec, how exactly is this going to work?

oops, thanks for pointing that out. I wasn't aware of it and took it for granted. The bug report redirects to this discussion: http://article.gmane.org/gmane.comp.web.wget.general/7300 Giuseppe
Re: [Bug-wget] --version copyright year stale
k...@freefriends.org (Karl Berry) writes: Hi Giuseppe, The copyright year in the wget --version output should be 2011, not 2009. As seen in 1.13.4.

thanks for reporting it; this patch fixes it:

=== modified file 'src/main.c'
--- src/main.c	2011-09-06 13:53:39 +0000
+++ src/main.c	2011-09-19 15:26:41 +0000
@@ -884,7 +884,7 @@
   /* TRANSLATORS: When available, an actual copyright character
      (circle-c) should be used in preference to (C). */
   if (fputs (_("\
-Copyright (C) 2009 Free Software Foundation, Inc.\n"), stdout) < 0)
+Copyright (C) 2011 Free Software Foundation, Inc.\n"), stdout) < 0)
     exit (3);
   if (fputs (_("\
 License GPLv3+: GNU GPL version 3 or later\n\

Cheers, Giuseppe
[Bug-wget] GNU wget 1.13.4 released
Hello, I am pleased to announce the new version of GNU wget. It fixes some bugs reported in the recent wget 1.13.3 release. It is available for download here: ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz and the GPG detached signatures using the key C03363F4: ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.gz.sig ftp://ftp.gnu.org/gnu/wget/wget-1.13.4.tar.xz.sig To reduce load on the main server, you can use this redirector service which automatically redirects you to a mirror: http://ftpmirror.gnu.org/wget/wget-1.13.4.tar.gz http://ftpmirror.gnu.org/wget/wget-1.13.4.tar.xz * Noteworthy changes in Wget 1.13.4 ** Now --version and --help work again. ** Fix a build error on solaris 10 sparc. ** Now --timestamping and --continue work well together. ** Return a network failure when FTP downloads fail and --timestamping is specified. Please report any problem you may experience to the bug-wget@gnu.org mailing list. Have fun! Giuseppe
Re: [Bug-wget] Wget is not downloading background images.
ma...@inbox.com writes: In these specific tests, I am using GNU Wget 1.11.4 on a Windows platform. CSS support was added in wget 1.12. Cheers, Giuseppe
Re: [Bug-wget] Suggestion: An option for Wget to reset all command-line defaults.
ma...@inbox.com writes: I wonder if Wget needs an option like --resetdefaults=yes to reset any changes that may have been made in the .wgetrc file. I think you can get the same behaviour by using --config=/dev/null. The parameter --config is supported since wget 1.13. Cheers, Giuseppe
Re: [Bug-wget] Wget should not ignore quota specifications for single files.
matt...@creativegraphicsolutions.biz writes: I've tried several work-arounds for this, all with no success. Wget simply refuses to follow quota specifications for single files no matter how Wget is invoked. Respecting quotas for single files would be useful in other situations where Wget is called automatically from within a script. hard-quotas are not supported (yet), so it doesn't really matter how you invoke wget, it will never obey :-) At the moment, as a very ugly workaround, you can use ulimit -f. Cheers, Giuseppe
Re: [Bug-wget] wget 1.13: FIONBIO does not exist on solaris
Christian Jullien eli...@orange.fr writes: When compiling gnutls.c on Solaris 10 SPARC with gcc 4.6.1 I get an error on:

  ret = ioctl (fd, FIONBIO, &one);

because FIONBIO is undefined. Adding:

  #include <sys/fcntl.h>

lets:

  #ifdef F_GETFL
    ret = fcntl (fd, F_SETFL, flags | O_NONBLOCK);

be used instead. It then compiles and works correctly. Please see how to include sys/fcntl.h conditionally; I checked, but it is not clear to me when and why you decide to include this system file. I'll be glad to test new versions for you.

Thanks for reporting it. We can assume sys/fcntl.h is always present, as gnulib will provide a replacement on systems where this file is missing. The change I am going to commit is simply:

=== modified file 'src/gnutls.c'
--- src/gnutls.c	2011-08-30 14:43:25 +0000
+++ src/gnutls.c	2011-09-04 10:43:35 +0000
@@ -48,6 +48,8 @@
 #include "ptimer.h"
 #include "ssl.h"
 
+#include <sys/fcntl.h>
+
 #ifdef WIN32
 # include "w32sock.h"
 #endif
Re: [Bug-wget] A bug with wget 1.13.3
Hi Vladimir, thanks, it has been fixed in the source repository. Cheers, Giuseppe Vladimir Lomov lomov...@gmail.com writes: Hello, I'm on Archlinux x86_64. After updating the system with the help of package manager wget aborts on simple `wget --version' with exit code 3. Seems I found the reason of that behavior, I attached with the message a patch vs. bzr trunk (revno 2555). I checked it on top of wget 1.13.3 (patching release source). --- WBR, Vladimir Lomov
[Bug-wget] GNU wget 1.13.3 released
I am pleased to announce the new version of GNU wget. It is available for download here: ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.gz ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.xz and the GPG detached signatures using the key C03363F4: ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.gz.sig ftp://ftp.gnu.org/gnu/wget/wget-1.13.3.tar.xz.sig To reduce load on the main server, you can use this redirector service which automatically redirects you to a mirror: http://ftpmirror.gnu.org/wget/wget-1.13.3.tar.gz http://ftpmirror.gnu.org/wget/wget-1.13.3.tar.xz

* Noteworthy changes in Wget 1.13.3

** Support HTTP/1.1.
** Use by default the GNU TLS library for secure connections, instead of OpenSSL.
** Fix some portability issues.
** Properly handle a malformed status line in an HTTP response.
** Ignore zero-length domains in $no_proxy.
** Set new cookies after an authorization failure.
** Exit with failure if -k is specified and -O is not a regular file.
** Cope better with unclosed HTML tags.
** Print diagnostic messages to stderr, not stdout.
** Do not use an additional HEAD request when --content-disposition is used, but directly use GET.
** Report the average transfer speed correctly when multiple URLs are specified and -c influences the transferred data amount.
** GNU TLS backend works again.
** Now --timestamping and --continue work well together.
** By default, on server redirects, use the original URL to get the local file name. Closes CVE-2010-2252. This introduces a backward incompatibility; any script that relies on the old behaviour must use --trust-server-names.
** Fix a problem when -k is used and some URLs are specified through CSS.
** Correctly convert URLs that need to be encoded to local files when following links.
** Use persistent connections with proxies supporting them.
** Print the total download time as part of the summary for recursive downloads.
** Now it is possible to specify a different startup configuration file through the --config option.
** Fix an infinite loop with the error 'filename has sprung into existence' when a network error occurs and -nc is used.
** Now --adjust-extension does not modify the file extension if the file ends in .htm.
** Support HTTP/1.1 307 redirects, which keep the request method.
** Now --no-parent doesn't fetch undesired files if HTTP and HTTPS are used by the same host on different pages.
** Do not attempt to remove the file if it is not in the accept rules but it is the output destination file.
** Introduce `show_all_dns_entries' to print all IP addresses corresponding to a DNS name when it is resolved.

Please report any problem you may experience to the bug-wget@gnu.org mailing list. Have fun! Giuseppe
Re: [Bug-wget] Wget 1.12 (macports) has NULLs and stuff appended after --convert-links
Hello Denis, this bug will be fixed in the next release of wget. It hasn't been officially released yet, but you can find newer tarballs here: ftp://ftp.gnu.org/gnu/wget If it still doesn't work for you with 1.13, please report it. Cheers, Giuseppe

Denis Laplante denis.lapla...@ubc.ca writes: Summary: Wget 1.12 (macports) has NULLs and stuff appended after html mirror file. command: wget -r --convert-links --adjust-extension ...

### RESULT ###
- Result: content has many links translated, but junk appended
- Sample: systems.1.html
- mostly good content with left-sidebar links untranslated, but main links translated
- followed by 1307 * NUL
- followed by 60 lines = 4291 characters from the same file (links translated), starting at point=31517 of 41150, in the middle of the left sidebar (links translated).
- All files affected!

I have looked at http://savannah.gnu.org/search/?words=convert-links&type_of_search=bugs&Search=Search&exact=1#options

### COMMAND ###
WG_BASIC="-r --convert-links --adjust-extension --page-requisites --no-verbose"
WG_HOBBLE="--level=2 --limit-rate=100k --quota=10m --wait-seconds=1"
WG_EXCLUDE="--no-parent --reject=*:*,index.php*,Special:*,User:*,Talk:* --exclude-directories=/.../Special:*"
PD_SESS_COOKIE=qwertyuiop
WG_STARTURL="https://wiki..."
/opt/local/bin/wget ${WG_BASIC} ${WG_RESTRICT} ${WG_EXCLUDE} \
  --header "Cookie: wikidb_UserName=...; wikidb__session=${PD_SESS_COOKIE}" \
  ${WG_STARTURL}

### VERSION ###
$ wget -V
GNU Wget 1.12 built on darwin9.8.0.
+digest +ipv6 +nls +ntlm +opie +md5/openssl +https -gnutls +openssl +iri
Wgetrc: /Users/laplante/.wgetrc (user) /opt/local/etc/wgetrc (system)
Locale: /opt/local/share/locale
Compile: /usr/bin/gcc-4.0 -DHAVE_CONFIG_H -DSYSTEM_WGETRC=/opt/local/etc/wgetrc -DLOCALEDIR=/opt/local/share/locale -I. -I../lib -I/opt/local/include -O2 -arch i386
Link: /usr/bin/gcc-4.0 -O2 -arch i386 -L/opt/local/lib -liconv -lintl -arch i386 -lssl -lcrypto -lintl -liconv -lc -Wl,-framework -Wl,CoreFoundation -ldl -lidn ftp-opie.o openssl.o http-ntlm.o gen-md5.o ../lib/libgnu.a
Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://www.gnu.org/licenses/gpl.html. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Originally written by Hrvoje Niksic hnik...@xemacs.org. Currently maintained by Micah Cowan mi...@cowan.name. Please send bug reports and questions to bug-wget@gnu.org.
Re: [Bug-wget] Download files and preserve their data and time
Ray Satiro raysat...@yahoo.com writes: Calling utime() works. You could also use SetFileTime(). Revision 2489 changed utime to utimes, but the CRT doesn't have utimes.

thanks for checking it. I am going to apply the patch below. Cheers, Giuseppe

=== modified file 'configure.ac'
--- configure.ac	2011-08-11 10:20:25 +0000
+++ configure.ac	2011-08-25 09:01:31 +0000
@@ -197,7 +197,7 @@
 AC_FUNC_FSEEKO
 AC_CHECK_FUNCS(strptime timegm vsnprintf vasprintf drand48)
 AC_CHECK_FUNCS(strtoll usleep ftello sigblock sigsetjmp memrchr wcwidth mbtowc)
-AC_CHECK_FUNCS(sleep symlink)
+AC_CHECK_FUNCS(sleep symlink utime)
 
 if test x$ENABLE_OPIE = xyes; then
   AC_LIBOBJ([ftp-opie])

=== modified file 'src/utils.c'
--- src/utils.c	2011-08-11 12:23:39 +0000
+++ src/utils.c	2011-08-25 09:22:03 +0000
@@ -42,15 +42,23 @@
 #ifdef HAVE_PROCESS_H
 # include <process.h>	/* getpid() */
 #endif
-#ifdef HAVE_UTIME_H
-# include <utime.h>
-#endif
 #include <errno.h>
 #include <fcntl.h>
 #include <assert.h>
 #include <stdarg.h>
 #include <locale.h>
 
+#if HAVE_UTIME
+# include <sys/types.h>
+# ifdef HAVE_UTIME_H
+#  include <utime.h>
+# endif
+
+# ifdef HAVE_SYS_UTIME_H
+#  include <sys/utime.h>
+# endif
+#endif
+
 #include <sys/stat.h>
 
 /* For TIOCGWINSZ and friends: */
@@ -487,6 +495,20 @@
 void
 touch (const char *file, time_t tm)
 {
+#if HAVE_UTIME
+# ifdef HAVE_STRUCT_UTIMBUF
+  struct utimbuf times;
+# else
+  struct {
+    time_t actime;
+    time_t modtime;
+  } times;
+# endif
+  times.modtime = tm;
+  times.actime = time (NULL);
+  if (utime (file, &times) == -1)
+    logprintf (LOG_NOTQUIET, "utime(%s): %s\n", file, strerror (errno));
+#else
   struct timespec timespecs[2];
   int fd;
 
@@ -506,6 +528,7 @@
     logprintf (LOG_NOTQUIET, "futimens(%s): %s\n", file, strerror (errno));
 
   close (fd);
+#endif
 }
 
 /* Checks if FILE is a symbolic link, and removes it if it is.  Does
Re: [Bug-wget] Download files and preserve their data and time
David H. Lipman dlip...@verizon.net writes: I don't know when it happened, probably when I upgraded WGET, but when I download files they inherit the date and time of when they were downloaded. It used to be that when a file was downloaded, it retained the date and time it had on the server, not when it was downloaded. How can I force WGET to return to that condition?

It has to work in the same way as it used to. It seems to work well here, using the latest revision from the source repository:

$ LANG=C ./wget -q -d http://www.gnu.org/graphics/gnu-head-mini.png 2>&1 | grep ^Last-Modified
Last-Modified: Sun, 05 Dec 2010 20:58:51 GMT
$ LANG=C stat gnu-head-mini.png | grep ^Modify
Modify: 2010-12-05 21:58:51.0 +0100

Can you please provide more information? What version of wget (wget --version)? What operating system? Do you get a different output using those two commands? This is also useful for debugging: do you see something different?

$ LANG=C strace -e utimensat ./wget -q http://www.gnu.org/graphics/gnu-head-mini.png
utimensat(4, NULL, {{1313833704, 0}, {1291582731, 0}}, 0) = 0

Thanks, Giuseppe
Re: [Bug-wget] Download files and preserve their data and time
David H. Lipman dlip...@verizon.net writes: WinXP/Vista -- Win32

Y:\> wget --version
GNU Wget 1.12-2504 built on mingw32.

the change introduced by the revision gscriv...@gnu.org-20110419103346-cctazi0zxt2770wt could be the reason for the problem you have reported. If it is possible for you to compile wget, could you try to revert this patch? Does it solve the problem for you? If you have problems re-building wget, then I'll try to set up the environment here. Thanks, Giuseppe

=== modified file 'bootstrap.conf'
--- bootstrap.conf	2011-04-19 09:31:25 +0000
+++ bootstrap.conf	2011-04-19 10:33:46 +0000
@@ -30,9 +30,11 @@
 announce-gen
 bind
 c-ctype
+clock-time
 close
 connect
 fcntl
+futimens
 getaddrinfo
 getopt-gnu
 getpass-gnu

=== modified file 'src/Makefile.am'
--- src/Makefile.am	2011-04-03 22:13:53 +0000
+++ src/Makefile.am	2011-04-19 10:33:46 +0000
@@ -37,7 +37,7 @@
 # The following line is losing on some versions of make!
 DEFS = @DEFS@ -DSYSTEM_WGETRC=\"$(sysconfdir)/wgetrc\" -DLOCALEDIR=\"$(localedir)\"
-LIBS = @LIBICONV@ @LIBINTL@ @LIBS@
+LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
 bin_PROGRAMS = wget
 wget_SOURCES = cmpt.c connect.c convert.c cookies.c ftp.c \

=== modified file 'src/utils.c'
--- src/utils.c	2011-04-18 12:37:42 +0000
+++ src/utils.c	2011-04-19 10:33:46 +0000
@@ -51,8 +51,7 @@
 #include <stdarg.h>
 #include <locale.h>
 
-#include <sys/time.h>
-
+#include <sys/stat.h>
 
 /* For TIOCGWINSZ and friends: */
 #ifdef HAVE_SYS_IOCTL_H
@@ -488,15 +487,25 @@
 void
 touch (const char *file, time_t tm)
 {
-  struct timeval timevals[2];
-
-  timevals[0].tv_sec = time (NULL);
-  timevals[0].tv_usec = 0L;
-  timevals[1].tv_sec = tm;
-  timevals[1].tv_usec = 0L;
-
-  if (utimes (file, timevals) == -1)
-    logprintf (LOG_NOTQUIET, "utimes(%s): %s\n", file, strerror (errno));
+  struct timespec timespecs[2];
+  int fd;
+
+  fd = open (file, O_WRONLY);
+  if (fd < 0)
+    {
+      logprintf (LOG_NOTQUIET, "open(%s): %s\n", file, strerror (errno));
+      return;
+    }
+
+  timespecs[0].tv_sec = time (NULL);
+  timespecs[0].tv_nsec = 0L;
+  timespecs[1].tv_sec = tm;
+  timespecs[1].tv_nsec = 0L;
+
+  if (futimens (fd, timespecs) == -1)
+    logprintf (LOG_NOTQUIET, "futimens(%s): %s\n", file, strerror (errno));
+
+  close (fd);
 }
 
 /* Checks if FILE is a symbolic link, and removes it if it is.  Does

=== modified file 'tests/Makefile.am'
--- tests/Makefile.am	2011-04-03 22:13:53 +0000
+++ tests/Makefile.am	2011-04-19 10:33:46 +0000
@@ -34,7 +34,7 @@
 PERL = perl
 PERLRUN = $(PERL) -I$(srcdir)
 
-LIBS = @LIBICONV@ @LIBINTL@ @LIBS@
+LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
 .PHONY: test run-unit-tests run-px-tests
Re: [Bug-wget] Support of non-linux OS's going down the drain?
H.Merijn Brand h.m.br...@xs4all.nl writes: That is bad. Why? GNU TLS /might/ be more safe than OpenSSL in some aspects, but it is for sure not available on (older) versions of AIX and/or HP-UX. It is already quite a bit of work to get OpenSSL and OpenSSH to be rather actual/recent on those boxes, but you can simply forget getting gnutls to be available on those. The dependency chain is a straight hell.

But in that case, a --with-ssl=openssl will fix this problem, as you did.

With HP-UX 11.00 and HP C-ANSI-C it doesn't even *compile* anymore! $ ./configure --prefix=/pro --disable-nls --with-ssl=openssl --without-libiconv-prefix --without-libintl-prefix --without-libgnutls-prefix : $ make : cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include -I/usr/contrib/X11R6/include -c -o c-ctype.o c-ctype.c source='cloexec.c' object='cloexec.o' libtool=no \ DEPDIR=.deps depmode=hp /bin/sh ../build-aux/depcomp \ cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include -I/usr/contrib/X11R6/include -c -o cloexec.o cloexec.c cpp: ./, line 4: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 7: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 13: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 21: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space.

can you please send me your cloexec.c file (or any other file causing it)? My cloexec.c doesn't have such fancy characters, and unfortunately I don't have access to any HP-UX machine where I can test it by myself. Have you compiled wget 1.12 on your machine? Thanks, Giuseppe
[Bug-wget] wget fails to build under HP-UX 11.00
Hello, The following bug report was sent to the wget mailing list, I am not sure why it happens, it seems related to gnulib, has anyone an idea about it? I don't have access to any HP-UX box to test it by myself. Thanks, Giuseppe With HP-UX 11.00 and HP C-ANSI-C it doesn't even *compile* anymore! $ ./configure --prefix=/pro --disable-nls --with-ssl=openssl --without-libiconv-prefix --without-libintl-prefix --without-libgnutls-prefix : $ make : cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include -I/usr/contrib/X11R6/include -c -o c-ctype.o c-ctype.c source='cloexec.c' object='cloexec.o' libtool=no \ DEPDIR=.deps depmode=hp /bin/sh ../build-aux/depcomp \ cc -DHAVE_CONFIG_H -I. -I../src -I/pro/local/include -I/usr/local/include -Ae -O2 +Onolimit +Z -z -I/pro/local/include -I/usr/local/include -I/usr/include/X11R6 -I/usr/local/X11R6/include -I/usr/contrib/X11R6/include -c -o cloexec.o cloexec.c cpp: ./, line 4: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 7: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 13: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cpp: ./, line 21: warning 2014: Illegal ^A, ^B, ^C, or ^D in source. Changed to space. cc: , line 2: error 1000: Unexpected symbol: �. cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: �. cc: , line 4: error 1000: Unexpected symbol: �. cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: |. cc: , line 4: error 1000: Unexpected symbol: . cc: , line 4: error 1000: Unexpected symbol: . 
cc: , line 4: error 1000: Unexpected symbol: `. cc: , line 6: error 1000: Unexpected symbol: . cc: , line 7: error 1000: Unexpected symbol: p. cc: , line 13: error 1000: Unexpected symbol: �. cc: , line 16: error 1000: Unexpected symbol: . cc: , line 18: error 1000: Unexpected symbol: . cc: , line 20: error 1000: Unexpected symbol: . cc: , line 21: error 1000: Unexpected symbol: $float. cc: panic 2017: Cannot recover from earlier errors, terminating. make[4]: *** [cloexec.o] Error 1 make[4]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib' make[3]: *** [all-recursive] Error 1 make[3]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib' make[2]: *** [all] Error 2 make[2]: Leaving directory `/pro/3gl/GNU/wget-1.13.1/lib' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/pro/3gl/GNU/wget-1.13.1' make: *** [all] Error 2 Exit 2
Re: [Bug-wget] getopt/'struct options' build error in 1.13.1
oops... Thanks for reporting it. I am sure it stems from a fix for a similar error Perry had on AIX. At this point, it seems the only way to fix the problem is to include config.h at the very beginning of css.c. I have looked at the flex documentation but I can't find anything useful to prevent other files from being included before the C code snippet. Does anybody have an idea? Should I go for a hack? Cheers, Giuseppe

Jack Nagel jackna...@gmail.com writes: I have encountered an issue building wget 1.13.1 on Mac OS X 10.6.8. It fails during 'make' with gcc 4.2 here:

/usr/bin/cc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -I. -I../lib -I../lib -c css.c
In file included from ../lib/unistd.h:113:0, from css.c:4738:
../lib/getopt.h:196:8: error: redefinition of 'struct option'
/usr/include/getopt.h:54:8: note: originally defined here
../lib/getopt.h:245:12: error: conflicting types for 'getopt_long'
/usr/include/getopt.h:69:5: note: previous declaration of 'getopt_long' was here
../lib/getopt.h:249:12: error: conflicting types for 'getopt_long_only'
/usr/include/getopt.h:71:5: note: previous declaration of 'getopt_long_only' was here

However, I can successfully build wget 1.13 on the same system under the same conditions. (Please CC me, as I am not subscribed to the list.) Thanks in advance for the help. Jack
Re: [Bug-wget] wget 1.13.1 hangs on redirected url
Hello, I have tried the command you suggested but I wasn't able to make it hang. Are you able to reproduce this problem every time? If so, can you please include the debug information generated by --debug? Thanks, Giuseppe

Axel Reinhold a...@freakout.de writes: Hi, wget 1.13.1 hangs on a redirected site forever. This url also has digest authorization! It works fine with wget 1.12.

[wpack@pie ~]$ /tmp/wget-1.13.1-1/bin/wget -O- http://calea.wpack.de/sites/active
--2011-08-17 08:34:39-- http://calea.wpack.de/sites/active
Resolving calea.wpack.de (calea.wpack.de)... 188.138.34.37
Connecting to calea.wpack.de (calea.wpack.de)|188.138.34.37|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Reusing existing connection to calea.wpack.de:80.
HTTP request sent, awaiting response... 200
Length: 66 [text/html]
Saving to: `STDOUT'

 0% [ ] 0 K/s

Regards Axel
Re: [Bug-wget] getopt/'struct options' build error in 1.13.1
Yes, but it seems to create another problem under Mac OS X 10.6.8. In any case, this is the hack I was talking about; does it work for both of you? Thanks, Giuseppe

=== modified file 'src/Makefile.am'
--- src/Makefile.am	2011-08-11 08:26:43 +0000
+++ src/Makefile.am	2011-08-17 14:15:58 +0000
@@ -39,9 +39,12 @@
 DEFS = @DEFS@ -DSYSTEM_WGETRC=\"$(sysconfdir)/wgetrc\" -DLOCALEDIR=\"$(localedir)\"
 LIBS = @LIBICONV@ @LIBINTL@ @LIBS@ $(LIB_CLOCK_GETTIME)
 
+noinst_LIBRARIES = libcss.a
+libcss_a_SOURCES = css.l
+
 bin_PROGRAMS = wget
 wget_SOURCES = cmpt.c connect.c convert.c cookies.c ftp.c \
-	css.l css-url.c \
+	css_.c css-url.c \
 	ftp-basic.c ftp-ls.c hash.c host.c html-parse.c html-url.c \
 	http.c init.c log.c main.c netrc.c progress.c ptimer.c \
 	recur.c res.c retr.c spider.c url.c \
@@ -57,6 +60,7 @@
 LDADD = $(LIBOBJS) ../lib/libgnu.a
 AM_CPPFLAGS = -I$(top_builddir)/lib -I$(top_srcdir)/lib
+
 ../lib/libgnu.a:
 	cd ../lib && $(MAKE) $(AM_MAKEFLAGS)
@@ -78,6 +82,10 @@
 	  $(AM_LDFLAGS) $(LDFLAGS) $(LIBS) $(wget_LDADD) ';' \
 	  | $(ESCAPEQUOTE) > $@
+css_.c: css.c
+	echo '#include "wget.h"' > $@
+	cat css.c >> $@
+
 check_LIBRARIES = libunittest.a
 libunittest_a_SOURCES = $(wget_SOURCES) test.c build_info.c test.h
 nodist_libunittest_a_SOURCES = version.c

Perry Smith pedz...@gmail.com writes: I thought you were just going to remove the include of wget.h?

On Aug 17, 2011, at 9:09 AM, Giuseppe Scrivano wrote: oops... Thanks for reporting it. I am sure it stems from a fix for a similar error Perry had on AIX. At this point, it seems the only way to fix the problem is to include config.h at the very beginning of css.c. I have looked at the flex documentation but I can't find anything useful to prevent other files from being included before the C code snippet. Does anybody have an idea? Should I go for a hack? Cheers, Giuseppe

Jack Nagel jackna...@gmail.com writes: I have encountered an issue building wget 1.13.1 on Mac OS X 10.6.8.
It fails during 'make' with gcc 4.2 here:

/usr/bin/cc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -I. -I../lib -I../lib -c css.c
In file included from ../lib/unistd.h:113:0, from css.c:4738:
../lib/getopt.h:196:8: error: redefinition of 'struct option'
/usr/include/getopt.h:54:8: note: originally defined here
../lib/getopt.h:245:12: error: conflicting types for 'getopt_long'
/usr/include/getopt.h:69:5: note: previous declaration of 'getopt_long' was here
../lib/getopt.h:249:12: error: conflicting types for 'getopt_long_only'
/usr/include/getopt.h:71:5: note: previous declaration of 'getopt_long_only' was here

However, I can successfully build wget 1.13 on the same system under the same conditions. (Please CC as I am not subscribed to the list.) Thanks in advance for the help. Jack
Re: [Bug-wget] getopt/'struct options' build error in 1.13.1
to facilitate the testing, I have uploaded a tarball here:

http://it.gnu.org/~gscrivano/files/wget-1.13.1-dirty.tar.bz2
a263e18bc121d6195b1cf7c78b0ff0ba62ac09c3  wget-1.13.1-dirty.tar.bz2
2ee94ef1011dfea2c98615df0d59b7d1  wget-1.13.1-dirty.tar.bz2

Thanks, Giuseppe

Perry Smith pedz...@gmail.com writes: Do I need all the autoconf stuff for this? I made the change but the Makefile didn't reflect the changes.
Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available
Perry Smith pedz...@gmail.com writes: I took a stab at installing GNUTLS and gave up. The beauty of wget is I can get it going with very few things needed. I compiled without ssl at all but getting openssl going is fairly easy too. GNUTLS is asking for nettle, zlib, and something else (according to the web page) but then it snuck up and started asking for pkg-config. That is way down the list in my bring up sequence. there is no gnu tls package for your system? Is there need to compile everything? I guess... I don't get what is wrong with openssl. Why do we need GNUTLS at all? (we being the open source community.) Here[1] you can find a good explanation. OpenSSL is still supported, as the GNU TLS backend is not as mature as the OpenSSL, my hope is that pushing it by default will make things change in the future. If you have so many problems with GNU TLS, what is difficult about --with-ssl=openssl to configure? Thanks, Giuseppe 1) http://people.gnome.org/~markmc/openssl-and-the-gpl.html
Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available
Jochen Roderburg roderb...@uni-koeln.de writes: And in general they seem to want to steer away the users from openssl to gnutls and in order to do that the configure script doesn't even mention this option any longer. :-( And in the same vein the option --with-libssl-prefix has completely disappeared, which used to be helpful when you had your preferred ssl library in a non-standard place. Now you have to trick around with compiler options to achieve that. it is fixed in the current development version, and the fix will be included in the wget release I am going to do in the next few days. It was already reported on this mailing list some days ago, and it was the reason why wget 1.13 wasn't released :-) Cheers, Giuseppe
Re: [Bug-wget] wget-1.13 on AIX
Hello Perry, thanks for reporting it. Does it work correctly if you drop the #include "wget.h" line from css.l?

=== modified file 'src/css.l'
--- src/css.l 2011-01-01 12:19:37 +0000
+++ src/css.l 2011-08-12 15:18:23 +0000
@@ -36,7 +36,6 @@
 #define YY_NO_INPUT
 
-#include "wget.h"
 #include "css-tokens.h"
 
 %}

Thanks, Giuseppe

Perry Smith pedz...@gmail.com writes: Hi, I've tried this on AIX 5.3 and 6.1. The problem is with src/css.c. In essence it is doing this:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <inttypes.h>
#define _LARGE_FILES
#include <unistd.h>

The #define of _LARGE_FILES is actually done in config.h via wget.h. I understand that AIX is very hard to deal with, but this seems like a bad idea for any platform. If you are going to declare that you want _LARGE_FILE support, you need to do that before any system includes. What this causes is that both _LARGE_FILES and _LARGE_FILE_API get defined, and that causes one place to declare (for example):

#define ftruncate ftruncate64

(this is in unistd.h around line 733) and then later we have:

extern int ftruncate(int, off_t);
#ifdef _LARGE_FILE_API
extern int ftruncate64(int, off64_t);
#endif

(around line 799) which the compiler complains about with:

/usr/include/unistd.h:801: error: conflicting types for 'ftruncate64'
/usr/include/unistd.h:799: error: previous declaration of 'ftruncate64' was here

There are actually several pairs of these. With the above code snippet, if you move the #define to the top (or completely remove it) the compile works fine. It just seems like it would be prudent to declare things like _LARGE_FILES in config.h (like you do) but put config.h as the first include of each file, so that the entire code base knows which interface the program wants to use. What I did was to move css.c to _css.c. I put an #ifndef _CONFIG_H wrapper inside config.h and then the new css.c was simply:

#include "config.h"
#include "_css.c"

and that worked for my 5.3 system.
I have not tried it on my 6.1 system yet. I hope this helps someone. Thank you, pedz
Re: [Bug-wget] WARC output
Gijs van Tulder gvtul...@gmail.com writes: It would be cool if Wget could become one of these tools. Already the Swiss army knife for mirroring websites, the one thing that Wget is missing is a good way to store these mirrors. The current output of --mirror is not sufficient for archival purposes. With some help from others, I've added WARC functions to Wget. With the --warc-file option you can specify that the mirror should also be written to a WARC archive. Wget will then keep everything, including

Can you please track all contributors? Any contribution to GNU wget requires copyright assignments to the FSF.

Do you think this is something that could be included in the main Wget version? If that's the case, what should be the next step?

Sure, I will take a look at the code in the next few days. In the meanwhile, can you check that you are following the GNU Coding Standards for the new code[1]?

The implementation makes use of the open source WARC Tools library (Apache License 2.0): http://code.google.com/p/warc-tools/

How much code is really needed from that library? I wonder if we can avoid this dependency altogether. Cheers, Giuseppe

1) http://www.gnu.org/prep/standards/
Re: [Bug-wget] gnutls link failure, ssl
Hello Karl, thanks for reporting it. It looks like a very ugly one; I think it depends on the last change:

revno: 2517
committer: Giuseppe Scrivano gscriv...@gnu.org
branch nick: wget
timestamp: Fri 2011-08-05 21:36:08 +0200
message: gnutls: do not use a deprecated function.

I'll roll back to the deprecated function when `gnutls_priority_set_direct' is not available. I will amend your comments into the NEWS file and configure --help. I think it is too late now to replace packages, and to avoid synchronization problems with mirrors, I'll go for 1.13.1. I had the feeling that 1.13 wasn't going to be released :-) Thanks, Giuseppe

k...@freefriends.org (Karl Berry) writes: My initial build of wget failed due to gnutls version problems. configure said:

checking for main in -lgnutls... yes
configure: compiling in support for SSL via GnuTLS

But then the link failed with:

gcc -O2 -Wall -o wget cmpt.o connect.o convert.o cookies.o ftp.o css.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o html-parse.o html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o res.o retr.o spider.o url.o utils.o exits.o build_info.o iri.o version.o ftp-opie.o gnutls.o ../lib/libgnu.a -lgnutls -lgcrypt -lgpg-error -lz -lidn -lrt
gnutls.o: In function `ssl_connect_wget':
gnutls.c:(.text+0x4b0): undefined reference to `gnutls_priority_set_direct'
gnutls.c:(.text+0x528): undefined reference to `gnutls_priority_set_direct'
collect2: ld returned 1 exit status

Evidently configure should check for gnutls_priority_set_direct also. And if it fails, hopefully it will fall back to openssl. (This was on CentOS 5.6, but presumably that doesn't especially matter.) Related, there used to be an option --with-libssl-prefix. I'm not sure when it was removed, but it was useful. Also, configure --help does not mention the possibility of --with-ssl=openssl. Finally, the NEWS file doesn't say anything about either of these: preferring GnuTLS to openssl or the --with-ssl=openssl option.
I didn't look to see if there were other configure options that didn't make it to the --help and/or NEWS. Thanks, Karl
Re: [Bug-wget] Bug in processing url query arguments that have '/'
Peng Yu pengyu...@gmail.com writes: I was looking at the patched version. (See the patch posted in bug #31147.) So I think that the bug is in the patch (see the relevant code below, where full_file has the query string). I guess for full_file a different 'acceptable' function should be used.

if (opt.match_query_string)
  {
    full_file = concat_strings (u->file, "?", u->query, (char *) 0);
    if (!acceptable (full_file))
      {
        DEBUGP (("%s (%s) does not match acc/rej rules.\n", url, full_file));
        goto out;
      }
  }

I am inclined not to add more options to the current Accept/Reject rules, as I think they are not flexible enough and quite tricky. It is better to support a more generic way to specify these rules. Cheers, Giuseppe
Re: [Bug-wget] Bug in processing url query arguments that have '/'
Hello Peng, AFAICS, `s' is a path, so '/' in the query string is escaped and `acceptable' doesn't see it. As for your example http://xxx.org/somescript?arg1=/xxy, `s' in this case will be something like: xxx.org/somescript?arg1=%2Fxxy. Do you have any example where it doesn't work? Cheers, Giuseppe

Peng Yu pengyu...@gmail.com writes: Hi, The following code is in utils.c:

/* in acceptable (const char *s) */
while (l && s[l] != '/')
  --l;
if (s[l] == '/')
  s += (l + 1);

It essentially gets the substring after the last '/'. However, when a query has '/', this is problematic. For example, the above code snippet will extract '/xxy' instead of 'somescript?arg1=/xxy'. I think that the above code should add a test for the position of '?': if there is a '?', it should look for the last '/' before the '?'. Is that the case? http://xxx.org/somescript?arg1=/xxy
Re: [Bug-wget] next wget release?
Noël Köthe n...@debian.org writes: I don't want to pester with this question, but when is the next wget release planned? 1.12 was released 2009-09-22, and since then there were some bugfixes and patches integrated in the VCS, but they do not reach the user.

I have just uploaded another test version:

ftp://alpha.gnu.org/gnu/wget/wget-1.12-2523.tar.bz2

and the detached GPG signature (using the key C03363F4):

ftp://alpha.gnu.org/gnu/wget/wget-1.12-2523.tar.bz2.sig

Unless there are reports like "I have lost my home directory when I specify recursive download", I will release it in the next few days. Have fun! Giuseppe
Re: [Bug-wget] next wget release?
Jochen Roderburg roderb...@uni-koeln.de writes:

--- ./src/host.c.orig	2011-08-06 16:45:59.000000000 +0000
+++ ./src/host.c	2011-08-06 19:49:41.000000000 +0000
@@ -829,7 +829,7 @@
       int printmax = al->count;
 
       if (! opt.show_all_dns_entries)
-        printmax = 3;
+        if (printmax > 3) printmax = 3;

Thanks, applied! Regards, Giuseppe
Re: [Bug-wget] Quotes get striped in cookie values
Hello Nirgal, thanks for reporting it. I am not sure it is really wrong to omit the quotes, but in any case I am going to apply this patch:

=== modified file 'src/cookies.c'
--- src/cookies.c 2011-01-01 12:19:37 +0000
+++ src/cookies.c 2011-08-02 20:53:42 +0000
@@ -350,6 +350,13 @@
     goto error;
   if (!value.b)
     goto error;
+
+  /* If the value is quoted, do not modify it. */
+  if (*(value.b - 1) == '"')
+    value.b--;
+  if (*value.e == '"')
+    value.e++;
+
   cookie->attr = strdupdelim (name.b, name.e);
   cookie->value = strdupdelim (value.b, value.e);

Cheers, Giuseppe

Nirgal Vourgère jmv_...@nirgal.com writes: Hello, When the server sends the header:

Set-Cookie: SSOCOOKIECC="L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw=="; Version=1; Path=/

wget sends afterward:

Cookie: SSOCOOKIECC=L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw==

while it should be sending:

Cookie: SSOCOOKIECC="L2ZS6azH5Mc4dwO/601i9QgGInPjnaaCeQWLTQbV3JD+RbT1Ryw/6ahTJS+boW94I86y3k62U1iIOOXv3cqPxw=="

Curl and Iceweasel work fine with that kind of cookie. This was originally reported on the Debian bug tracking system at http://bugs.debian.org/587033. I am no longer using that web site, and I had switched to curl anyway when I did, so I don't really need a fix. But I lost many hours on that problem, and if someone could have a look, it might save other people some time in the future.
Re: [Bug-wget] How to just download cookies?
Peng Yu pengyu...@gmail.com writes: Hi, I use the following command to download the cookies. But it will always download some_page. Is there a way to just download the cookies?

wget --post-data='something' --directory-prefix=/tmp --save-cookies=cookies_file --keep-session-cookies http://xxx.com/some_page > /dev/null

Probably what you want in your command is -O/dev/null, or -O- > /dev/null. Cheers, Giuseppe
Re: [Bug-wget] How to download all the links on a webpage which are in some directory?
Peng Yu pengyu...@gmail.com writes: Suppose I want to download www.xxx.org/somefile/aaa.sfx and the links therein (but restricted to the directory www.xxx.org/somefile/aaa/). I tried the option '--mirror -I /somefile/aaa', but it only downloads www.xxx.org/somefile/aaa.sfx. I'm wondering what the correct option to do so is.

It looks like the right command. Can you check using -d what is going wrong? Cheers, Giuseppe
Re: [Bug-wget] next wget release?
Hi Jan,

$ ldd ./wget
	linux-gate.so.1 => (0xb781d000)
	libssl.so.1.0.0 => /usr/lib/i686/cmov/libssl.so.1.0.0 (0xb77b7000)
	libcrypto.so.1.0.0 => /usr/lib/i686/cmov/libcrypto.so.1.0.0 (0xb7609000)
	libdl.so.2 => /lib/i386-linux-gnu/i686/cmov/libdl.so.2 (0xb7604000)
	libz.so.1 => /usr/lib/libz.so.1 (0xb75f)
	libidn.so.11 => /usr/lib/i386-linux-gnu/libidn.so.11 (0xb75be000)
	librt.so.1 => /lib/i386-linux-gnu/i686/cmov/librt.so.1 (0xb75b5000)
	libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xb745b000)
	/lib/ld-linux.so.2 (0xb781e000)
	libpthread.so.0 => /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 (0xb7441000)

Please note that by default the new wget version will use the GNU TLS backend instead of OpenSSL; the long-term plan is to drop OpenSSL completely. That error doesn't appear with either back-end now. Cheers, Giuseppe

Jan Thomas jatho...@redhat.com writes: Hey Giuseppe, That's great. Can you do a 'ldd wget' and tell me which libs it's linked against? I built the last wget against openssl-devel in Fedora 14, and it's working, but built against RHEL 5 it still fails.
[Fedora]$ ldd wget
	linux-vdso.so.1 => (0x7fffc2cd7000)
	libssl.so.10 => /usr/lib64/libssl.so.10 (0x00393380)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00394d00)
	libdl.so.2 => /lib64/libdl.so.2 (0x003eb120)
	librt.so.1 => /lib64/librt.so.1 (0x003eb220)
	libc.so.6 => /lib64/libc.so.6 (0x003eb0e0)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00393300)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00393340)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x003ebce0)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00393280)
	libz.so.1 => /lib64/libz.so.1 (0x003eb260)
	/lib64/ld-linux-x86-64.so.2 (0x003eb0a0)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x003eb160)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x003932c0)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x003ebe60)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x003eb3e0)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x003eb2e0)

[rhel5]# ldd wget
	linux-vdso.so.1 => (0x7fffc4377000)
	libssl.so.6 => /lib64/libssl.so.6 (0x003f9220)
	libcrypto.so.6 => /lib64/libcrypto.so.6 (0x003f8fe0)
	libdl.so.2 => /lib64/libdl.so.2 (0x003f8500)
	librt.so.1 => /lib64/librt.so.1 (0x003f85c0)
	libc.so.6 => /lib64/libc.so.6 (0x003f8480)
	libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x003f9020)
	libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x003f91a0)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x003f8e60)
	libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x003f90a0)
	libz.so.1 => /lib64/libz.so.1 (0x003f8580)
	/lib64/ld-linux-x86-64.so.2 (0x003f8440)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x003f8540)
	libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x003f9060)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x003f90e0)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x003f8ac0)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x003f8640)
	libsepol.so.1 => /lib64/libsepol.so.1 (0x003f8600)

So, I think the bug is in the older version of openssl and not in wget.
regards, s pozdravem, Jan G Thomas jatho...@redhat.com - Original Message - From: Giuseppe Scrivano gscriv...@gnu.org To: Jan Thomas jatho...@redhat.com Cc: bug-wget@gnu.org Sent: Monday, July 25, 2011 12:24:44 PM Subject: Re: [Bug-wget] next wget release? hey Jan, this is what I get using the last development version of wget. $ LANG=en ./wget -O/dev/null https://github.com/rg3/youtube-dl/raw/2011.01.30/youtube-dl --2011-07-25 12:23:29-- https://github.com/rg3/youtube-dl/raw/2011.01.30/youtube-dl Resolving github.com (github.com)... 207.97.227.239 Connecting to github.com (github.com)|207.97.227.239|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://raw.github.com/rg3/youtube-dl/2011.01.30/youtube-dl [following] --2011-07-25 12:23:30-- https://raw.github.com/rg3/youtube-dl/2011.01.30/youtube-dl Resolving raw.github.com (raw.github.com)... 207.97.227.243 Connecting to raw.github.com (raw.github.com)|207.97.227.243|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 93827 (92K) [text/plain] Saving to: `/dev/null' 100%[==] 93,827 305K/s in 0.3s 2011-07-25 12:23:32 (305 KB/s) - `/dev/null' saved [93827/93827] Cheers, Giuseppe Jan Thomas jatho...@redhat.com writes: Ciao Giuseppe, Great
Re: [Bug-wget] Bug in WGET?
Patrick Steil patr...@churchbuzz.org writes: Also, if I use wget in spider mode, it will at the end of the log output tell me about all the broken links... but I also need to know what page those broken links appear on (if the broken link is on the site I am getting)... this will help me find the 404s on my site... I have a vision for how this should work to make it awesome... Any way to do that, or anyone want to add this functionality?

I don't think it is possible at the moment, but adding this feature shouldn't take much time. The feature seems interesting, but I don't think it is going to be implemented before the next release. You can wait until someone implements it, or you can take advantage of the fact that wget is Free software and implement it yourself, or hire someone to do it for you. Cheers, Giuseppe
Re: [Bug-wget] Bug in WGET?
Hello,

Patrick Steil patr...@churchbuzz.org writes: If I run this command:

wget www.domain.org/news?page=1

with options: -r --no-clobber --html-extension --convert-links -np --include-directories=news

Here is what it does today:

1. When --html-extension is turned on, --no-clobber is not changing the name of the downloaded files, but it DOES rewrite the file, as the date/time stamp changes every time I run the above command.

I couldn't reproduce it. I have strace'd it but I can't see any syscall which could modify the time stamp. Can you please attach the strace and the wget debug log? You can get them with:

strace -o strace.log wget <args> -d -o wget.log

2. If I turn off --html-extension, then as soon as wget sees that the first file has already been downloaded, it stops and does not continue to spider/download any further pages.

AFAICS, the behaviour you get using --no-clobber and -r is documented, and it should work exactly as you described (a newer version is ignored). The old version is still traversed for links. Cheers, Giuseppe
Re: [Bug-wget] wget 1.12 generates duplicated contents
Hello, I couldn't reproduce the problem here; I get the same content I get with the browser. Does it behave differently if you use a recursive download or if you request a single page? Does it happen every time? If you are able to reproduce it, can you please post the output you get running wget with --debug? Otherwise please attach the content of index.html. Thanks, Giuseppe

Anh Ta a...@squiz.co.uk writes: Hi, I ran the following command with wget 1.12:

wget -r -l 1 -E -k -nv --wait=0.5 --random-wait http://www.beds.ac.uk

The downloaded file www.beds.ac.uk/index.html (zip file attached) contained a duplicated footer. When I ran with a greater depth level, e.g. -l 15 and the -p option, there were more pages with duplicated footers. The problem disappeared when I ran the same command with wget 1.11.4. However, I need version 1.12 to have links in CSS downloaded and replaced. Could someone please help or give me some advice? Many Thanks, Anh
Re: [Bug-wget] Wget and missing cookies
Hello, how are you invoking wget? Do you see anything different in the HTTP headers when you use --debug? Thanks, Giuseppe

Richard van Katwijk rich...@three6five.com writes: Hi, I am using the Firefox plugin 'httpfox' to trace the sending and receiving of cookies between my browser and the web server. I can see cookies initially being received by the browser, and then subsequently being sent back to the server on further page requests. However, simulating the same, simple request with wget (using -S and -d), I do *not* see these cookies being received. I have tested several sites that I know well - some do send the cookies to wget, but others don't, even though tools such as 'httpfox' do always show them, as expected. Is there any reason why wget wouldn't see cookies being sent, or maybe why the server would not send cookies to the wget user-agent? Thanks, Richard
Re: [Bug-wget] Wget authorization failed with --spider option
Can it be that the server allows GET but not HEAD? Can you attach the debug log without --spider as well? You can drop the payload if it is confidential :-) The request and the response headers matter. Thanks, Giuseppe

Avinash pavin...@gmail.com writes: Hi, I am getting an 'Authorization Failed' error on the following URL with the --spider option, whereas it works and also downloads the file when I remove the --spider option. My requirement is not to download it, but to read the server response only. Anybody any idea as to why it is happening?

/usr/bin/wget --debug --server-response --spider http://172.20.241.55:/9/Acceptable%20Use/Confidential_Internal_Memos.docx --http-user=test --http-password=password

Setting --server-response (serverresponse) to 1
Setting --spider (spider) to 1
Setting --http-user (httpuser) to test
Setting --http-password (httppassword) to Recnex#1
DEBUG output created by Wget 1.10.2 (Red Hat modified) on linux-gnu.
--11:11:43-- http://172.20.241.55:/9/Acceptable%20Use/Confidential_Internal_Memos.docx
 => `Confidential_Internal_Memos.docx'
Connecting to 172.20.241.55:... connected.
Created socket 3.
Releasing 0x005416f0 (new refcount 0).
Deleting unused 0x005416f0.
---request begin---
HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Authorization: Basic dGVzdDpSZWNuZXgjMQ==
Host: 172.20.241.55:
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin--- HTTP/1.1 401 Unauthorized Content-Length: 1656 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: Negotiate WWW-Authenticate: NTLM X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive ---response end--- HTTP/1.1 401 Unauthorized Content-Length: 1656 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: Negotiate WWW-Authenticate: NTLM X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive Registered socket 3 for persistent reuse. Disabling further reuse of socket 3. Closed fd 3 Empty NTLM message, starting transaction. Creating a type-1 NTLM message. Connecting to 172.20.241.55:... connected. Created socket 3. Releasing 0x005415a0 (new refcount 0). Deleting unused 0x005415a0. ---request begin--- HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0 User-Agent: Wget/1.10.2 (Red Hat modified) Accept: */* Authorization: NTLM TlRMTVNTUAABAgIgACA= Host: 172.20.241.55: Connection: Keep-Alive ---request end--- HTTP request sent, awaiting response... ---response begin--- HTTP/1.1 401 Unauthorized Content-Length: 1539 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: NTLM TlRMTVNTUAACADgCAgACAugUTcnBdbk4BQLODg8= X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive ---response end--- HTTP/1.1 401 Unauthorized Content-Length: 1539 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: NTLM TlRMTVNTUAACADgCAgACAugUTcnBdbk4BQLODg8= X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive Registered socket 3 for persistent reuse. Disabling further reuse of socket 3. Closed fd 3 Received a type-2 NTLM message. Creating a type-3 NTLM message. Connecting to 172.20.241.55:... connected. Created socket 3. Releasing 0x005432a0 (new refcount 0). Deleting unused 0x005432a0. 
---request begin--- HEAD /9/Acceptable%20Use/Confidential_Internal_Memos.docx HTTP/1.0 User-Agent: Wget/1.10.2 (Red Hat modified) Accept: */* Authorization: NTLM TlRMTVNTUAADGAAYAEQYABgAXABABAAEAEAARAB0AYIAAHRlc3TAT6OiQKrO+dHjEjlknU5AyFpl7cOFhxbwn8z4gcxySH43C9uoPx96OryCmJ3OKAU= Host: 172.20.241.55: Connection: Keep-Alive ---request end--- HTTP request sent, awaiting response... ---response begin--- HTTP/1.1 401 Unauthorized Content-Length: 1539 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: Negotiate WWW-Authenticate: NTLM X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive ---response end--- HTTP/1.1 401 Unauthorized Content-Length: 1539 Content-Type: text/html Server: Microsoft-IIS/6.0 WWW-Authenticate: Negotiate WWW-Authenticate: NTLM X-Powered-By: ASP.NET Date: Wed, 06 Jul 2011 05:55:12 GMT Connection: keep-alive Registered socket 3 for persistent reuse. Disabling further reuse of socket 3. Closed fd 3 Authorization failed.
Re: [Bug-wget] Question regarding WGET
are you executing wget from the c:\Windows\system32 directory? To prevent the file from being written to disk, you can specify -O NUL on the command line; I have never tried it myself, but I remember it works under Windows. Giuseppe

Itay Levin itay.le...@onsettechnology.com writes: I'm using it with the following notation:

WGET http://www.mysite.com/a.aspx

And I noticed that it downloads this page to the c:\Windows\system32 folder, which is filling up with a.aspx, a.aspx.1, a.aspx.2 and so on... Are there any command line flags that I can use to prevent these files from being written to disk? Thanks, Itay Levin
Re: [Bug-wget] wget IDN support
Thanks for reporting these problems. I'll take a look at them in the next few days. Cheers, Giuseppe

Merinov Nikolay kim.roa...@gmail.com writes: The current implementation of IDN support in wget does not work when the system uses a UTF-8 locale. The current implementation of the function `url_parse' in src/url.c calls `remote_to_utf8' from src/iri.c and sets `iri->utf8_encode' to the returned value. The function `remote_to_utf8' can return false in two cases:

1. It cannot convert the string to UTF-8.
2. The source text is the same as the result text.

The second case appears when the system uses UTF-8 encoding. This can be fixed in several places: in src/url.c (url_parse), by comparing iri->orig_url with the UTF-8 result; in src/iri.c (remote_to_utf8), by removing the `if (!strcmp (str, *new))' test at the end of the function; or in src/iri.c (remote_to_utf8), by changing the return status when the result is the same as the input string. The last variant can be written like this:

=== modified file 'src/iri.c'
--- src/iri.c 2011-01-01 12:19:37 +0000
+++ src/iri.c 2011-06-23 16:34:10 +0000
@@ -277,7 +277,7 @@
   if (!strcmp (str, *new))
     {
       xfree ((char *) *new);
-      return false;
+      *new = NULL;
     }
 
   return ret;

Also it could be a good idea to fix src/host.c (lookup_host) by replacing the use of `gethostbyname_with_timeout' with `getaddrinfo_with_timeout' and using the AI_IDN flag if wget is compiled with glibc version 2.3.4 or newer. That can be helpful when wget is compiled without IRI support.
Re: [Bug-wget] wget without http
David H. Lipman dlip...@verizon.net writes: If you are using Mapped Drives, there is NO NEED to use WGET as there are plenty of OS utilities from XXCOPY to RoboCopy.

Though these tools have two problems: first of all, they are not free; second, as already reported, they don't follow HTML links. It could be a good idea to handle file:// as well. Giuseppe
Re: [Bug-wget] Question regarding WGET
Itay Levin sit...@gmail.com writes: no i didn't specify any output dir - so it by default created the files in c:\windows\system32 but still it could be the working directory where wget is executed. Giuseppe
Re: [Bug-wget] Question to a specific situation
Hello, d113803_0-m m...@rtinlochner.de writes: 150 Opened data connection. done. 113803.webhosting42.1blu.de/www/demos/.listing: Permission denied Can you check your permissions on the /www/demos/ directory? Can you browse it? Cheers, Giuseppe
Re: [Bug-wget] wget to a folder
what version are you using? It seems to work well here:

$ wget -q -P testdir ftp://alpha.gnu.org/gnu/wget/wget-1.12-2504.tar.bz2
$ ls testdir/
wget-1.12-2504.tar.bz2

Giuseppe

Michele Prendin mich...@micheleprendin.com writes: Hello there, I'm facing issues using wget -P to download to a folder, e.g. wget -P testfolder http://www.google.com/downloadme.zip wget only works when I download the file into a folder after reaching it with cd folder. I tried everything I could (testfolder, testfolder/, the full path) and still face the same problem. Any suggestion? thanks, best regards, MP
Re: [Bug-wget] wget to a folder
Michele Prendin mich...@micheleprendin.com writes: Thanks Giuseppe for the help, I fixed the issue by upgrading wget. Though now I can save in the folder I want, I have another issue. With the older wget, when I was using wget www.google.com/popupfile.php, the PHP file forwarded the file to be downloaded via the header (if the file forwarded was abc.gz, wget was able to store abc.gz). With the new wget the download works, but instead of saving it with the name abc.gz it uses popupfile.php. Any solution to this? I couldn't find anything useful in the help. You should specify: --trust-server-names. You can find the reason why we have added it here: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2010-2252 Giuseppe
Re: [Bug-wget] wget fails to encode spaces in URLs
Hi Volker, I see it now, thanks. This small patch makes sure the URL is parsed in any case. Cheers, Giuseppe

=== modified file 'src/retr.c'
--- src/retr.c	2011-06-05 12:31:24 +0000
+++ src/retr.c	2011-06-08 09:29:20 +0000
@@ -1005,9 +1005,7 @@
           break;
         }
 
-      /* Need to reparse the url, since it didn't have iri information. */
-      if (opt.enable_iri)
-        parsed_url = url_parse (cur_url->url->url, NULL, tmpiri, true);
+      parsed_url = url_parse (cur_url->url->url, NULL, tmpiri, true);
 
       if ((opt.recursive || opt.page_requisites)
          && (cur_url->url->scheme != SCHEME_FTP || getproxy (cur_url->url)))

Volker Kuhlmann list0...@paradise.net.nz writes: Hi Giuseppe, Thanks! I compiled it with libproxy: same problem. I then compiled it with just

./configure --prefix=/tmp/.../
make
./src/wget -i-
http://downloads.sourceforge.net/project/bandwidthd/bandwidthd/bandwidthd 2.0.1/bandwidthd-2.0.1.tgz?r=&ts=1307308092&use_mirror=transact
^D

(note the space after bandwidthd) and wireshark gives me:

GET /project/bandwidthd/bandwidthd/bandwidthd 2.0.1/bandwidthd-2.0.1.tgz?r=&ts=1307308092&use_mirror=transact HTTP/1.1
User-Agent: Wget/1.12-2504 (linux-gnu)
Accept: */*
Host: downloads.sourceforge.net
Connection: Keep-Alive

Sorry, NOT FIXED. My system and user wgetrc contain:

prefer-family = none
use_proxy = off
dirstruct = on
timestamping = on
dot_bytes = 64k
dot_spacing = 10
dots_in_line = 50
backup_converted = on

Volker
Re: [Bug-wget] Issue with TOMCAT SSL server wget
please keep the mailing list CC'ed in your replies. It seems the server doesn't accept the client certificate. Are you sure the cert.pem certificate is included in keystore.jks? Giuseppe brad bruggemann bradley.bruggem...@gmail.com writes: Giuseppe, there's a correction to my original post. The output that I get when I run the original command (with --secure-protocol) is: OpenSSL: error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate When I run it without --secure-protocol I get: OpenSSL: error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert unexpected message On Wed, Jun 8, 2011 at 7:08 AM, Giuseppe Scrivano gscriv...@gnu.org wrote: brad bruggemann bradley.bruggem...@gmail.com writes: Use wget to grab file: wget --secure-protocol=TLSv1 --certificate-type=PEM --certificate=/path.to/cert.pem --password= https://IP_ADDRESS:1234/file.txt -o /tmp/file.txt what happens if you don't specify --secure-protocol? Cheers, Giuseppe
Re: [Bug-wget] wget fails to encode spaces in URLs
Hi Volker, thanks for reporting this bug, but it was already fixed in the development version of wget and the fix will be included in the next release. Can you please confirm whether it works for you? You can fetch a source tarball here: ftp://alpha.gnu.org/gnu/wget/wget-1.12-2504.tar.bz2 Thanks, Giuseppe Volker Kuhlmann list0...@paradise.net.nz writes: wget --version GNU Wget 1.12 built on linux-gnu. To reproduce: Go to any sourceforge project and download a file whose URL contains a space. Copy the direct link from the download page into wget -i- Run wireshark and press ^D in the wget input stream. If the upstream strips spaces (e.g. squid, default setting in pfsense) the download goes round in circles. The bug does not exist in wget when passing the URL on the command line. I always use -i- because of all the shell crud in URLs. I am using the openSUSE 11.4 version, but the only source code change is additional support for libproxy. Problem: Looking at the source, in main.c url_parse() is called for each URL from the command line. For -i, it calls retrieve_from_file(). retrieve_from_file() (in retr.c) reads a list of URLs from the given file. It then calls url_parse() only if IRI is enabled (which in my version of wget is not even compiled in). Hence the URL is never parsed and never encoded before being downloaded with retrieve_url(). That's a bug. The fix is probably to always call url_parse() in retrieve_from_file(), and not only when IRI is turned on. If this goes to a mailing list, please cc me on replies, I am not subscribed. Thanks, Volker
Re: [Bug-wget] wget
I doubt it will work with a recent version of wget. Anyway, I suggest you take an older version (something like 1.10.2) and apply the patch using GNU patch; once the source code is patched you can build it and get the wget executable. Cheers, Giuseppe Dale Egan d...@leemyles.com writes: How can I delete a file on a remote server after I download it? I am downloading a couple of files and need to delete them right after download, before more files are added. The command I am using is (wget -r -nd ftp://name:passw...@something.com). I am using (GNU Wget 1.11.4 Red Hat modified) on CentOS 5.5. I have found a patch at (http://osdir.com/ml/web.wget.patches/2005-09/msg5.html ) It looks like it would work, but I do not understand how to install it. Thanks, Dale
Re: [Bug-wget] Recursive wget with URL filter/under certain (non-parent) directory?
Yang Zhang yanghates...@gmail.com writes: I mentioned --include-directories in my original email. I couldn't figure out how to use it to this effect. Could you demonstrate? have you already tried the following one? wget -r -I /host/foo/ http://host/foo/bar/baz/index.cgi?page=1 Giuseppe
Re: [Bug-wget] Recursive wget with URL filter/under certain (non-parent) directory?
Micah Cowan mi...@cowan.name writes: have you already tried the following one? wget -r -I /host/foo/ http://host/foo/bar/baz/index.cgi?page=1 Shouldn't that be just -I /foo/ ? Yeah, sure :-) Thanks, Giuseppe
Re: [Bug-wget] [PATCH] set exit code to 1 if invalid host name specified
Hi Daniel, thanks for your contribution! I have pushed your first patch. I will wait for your copyright assignment before pushing the patch with the new tests. Thanks again, Giuseppe Daniel Manrique dan...@tomechangosubanana.com writes: Hi Giuseppe, I've started the assignment process, to at least get the ball rolling, even if it's not complete in time for the new release. I've also made the changes you suggested to coding style, and split the changes into two patches. Thanks so much for your help and suggestions! Do let me know if more changes are needed. Regards, - Daniel On Sat, Apr 23, 2011 at 10:45 AM, Giuseppe Scrivano gscriv...@gnu.org wrote: Thanks for the patch. It looks OK, but in order to apply it you need to complete the copyright assignment process with the FSF. We are quite close to a wget release and I doubt the FSF will receive your assignment before it. Can you please divide your patch in two? Keep the changes to the source code in one patch and the new tests in another.
Please keep the GNU coding style: Daniel Manrique dan...@tomechangosubanana.com writes:

=== modified file 'src/html-url.c'
--- src/html-url.c	2011-01-01 12:19:37 +0000
+++ src/html-url.c	2011-04-23 00:48:22 +0000
@@ -810,6 +810,7 @@
                      file, url_text, error);
           xfree (url_text);
           xfree (error);
+          inform_exit_status(URLERROR);

Please maintain the GNU coding style: inform_exit_status (URLERROR); Cheers, Giuseppe

# Bazaar merge directive format 2 (Bazaar 0.90)
# revision_id: roa...@tomechangosubanana.com-20110423193141-\
#   iaihkimpxowwm0gh
# target_branch: file:///home/roadmr/wget/trunk/
# testament_sha1: 3f2bdd4370318611a56293444fe3f320d8e39961
# timestamp: 2011-04-23 15:31:47 -0400
# base_revision_id: gscriv...@gnu.org-20110419124021-fi310a2hc7mz2j9y
#
# Begin patch
=== modified file 'src/ChangeLog'
--- src/ChangeLog	2011-04-19 12:40:21 +0000
+++ src/ChangeLog	2011-04-23 19:31:41 +0000
@@ -1,3 +1,9 @@
+2011-04-21  Daniel Manrique  roa...@tomechangosubanana.com
+	* main.c (main): Set exit status when invalid host name given in
+	command line.
+	* html-url.c (get_urls_file): Set exit status when invalid host
+	name given in input file.
+
 2011-04-19  Giuseppe Scrivano  gscriv...@gnu.org
 
 	* gnutls.c: Do not include fcntl.h.
=== modified file 'src/html-url.c'
--- src/html-url.c	2011-01-01 12:19:37 +0000
+++ src/html-url.c	2011-04-23 19:31:41 +0000
@@ -810,6 +810,7 @@
                      file, url_text, error);
           xfree (url_text);
           xfree (error);
+          inform_exit_status (URLERROR);
           continue;
         }
       xfree (url_text);

=== modified file 'src/main.c'
--- src/main.c	2011-03-21 12:14:20 +0000
+++ src/main.c	2011-04-23 19:31:41 +0000
@@ -1347,6 +1347,7 @@
           char *error = url_error (*t, url_err);
           logprintf (LOG_NOTQUIET, "%s: %s.\n", *t, error);
           xfree (error);
+          inform_exit_status (URLERROR);
         }
       else
         {
@@ -1387,7 +1388,9 @@
   if (opt.input_filename)
     {
       int count;
-      retrieve_from_file (opt.input_filename, opt.force_html, &count);
+      int status;
+      status = retrieve_from_file (opt.input_filename, opt.force_html, &count);
+      inform_exit_status (status);
       if (!count)
         logprintf (LOG_NOTQUIET, _("No URLs found in %s.\n"),
                    opt.input_filename);
Re: [Bug-wget] CNET download links not working with WGET
hello, the & character in the URL is interpreted by your shell. Try quoting it, using something like: wget "URL" Cheers, Giuseppe Jeff Givens j...@sds.net writes: Hello, I am having an issue downloading files via download links from CNET. It appears to locate some of the URL but stops at the first siteId part. I have included the debug information as well. Thanks in advance for your help.

C:\DOWNLOAD\wget http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f

--2011-04-19 11:30:35-- http://dw.com.com/redir?edId=3
Resolving dw.com.com... 216.239.113.95
Connecting to dw.com.com|216.239.113.95|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://dw.com.com/redir/redx/?edId=3 [following]
--2011-04-19 11:30:36-- http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
HTTP request sent, awaiting response... 404 Not Found
2011-04-19 11:30:36 ERROR 404: Not Found.

'siteId' is not recognized as an internal or external command, operable program or batch file.
'oId' is not recognized as an internal or external command, operable program or batch file.
'ontId' is not recognized as an internal or external command, operable program or batch file.
'spi' is not recognized as an internal or external command, operable program or batch file.
'lop' is not recognized as an internal or external command, operable program or batch file.
'tag' is not recognized as an internal or external command, operable program or batch file.
'ltype' is not recognized as an internal or external command, operable program or batch file.
'pid' is not recognized as an internal or external command, operable program or batch file.
'mfgId' is not recognized as an internal or external command, operable program or batch file.
'merId' is not recognized as an internal or external command, operable program or batch file.
'pguid' is not recognized as an internal or external command, operable program or batch file.
'destUrl' is not recognized as an internal or external command, operable program or batch file.

DEBUG output created by Wget 1.11.4 on Windows-MSVC.

--2011-04-19 11:27:09-- http://dw.com.com/redir?edId=3
Resolving dw.com.com... seconds 0.00, 64.30.224.42
Caching dw.com.com => 64.30.224.42
Connecting to dw.com.com|64.30.224.42|:80... seconds 0.00, connected.
Created socket 340.
Releasing 0x01411158 (new refcount 1).
---request begin---
GET /redir?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Pragma: no-cache
Cache-control: no-cache, must-revalidate, no-transform
Vary: *
Expires: Fri, 23 Jan 1970 12:12:12 GMT
Set-Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg; expires=Sun, 18-Apr-2021 15:27:26 GMT; domain=.com.com; path=/
Location: http://dw.com.com/redir/redx/?edId=3
Content-Length: 0
P3P: CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa IVAi IVDi CONi OUR OTRi IND PHY ONL UNI FIN COM NAV INT DEM STA"
Keep-Alive: timeout=363, max=760
Connection: Keep-Alive
Content-Type: text/plain
---response end---
302 Found
Registered socket 340 for persistent reuse.
cdm: 1 2 3 4 5 6 7 8
Stored cookie com.com -1 (ANY) / permanent insecure [expiry 2021-04-18 11:27:26] XCLGFbrowser Cg5iVk2tqd6J8Sg
Location: http://dw.com.com/redir/redx/?edId=3 [following]
Skipping 0 bytes of body: [] done.
--2011-04-19 11:27:09-- http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
Reusing fd 340.
---request begin---
GET /redir/redx/?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive
Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 404 Not Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Content-Length: 209
Keep-Alive: timeout=363, max=779
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
---response end---
404 Not Found
Skipping 209 bytes of body: [<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /redir/redx/ was not found on this server.</p>
</body></html>
] done.
2011-04-19 11:27:09 ERROR 404: Not Found.
Re: [Bug-wget] --mirror sometimes ignores -np
Hi Mojca, it was already reported here: http://savannah.gnu.org/bugs/index.php?20519 On the same page you can find an explanation of why it behaves this way. Cheers, Giuseppe Mojca Miklavec mojca.miklavec.li...@gmail.com writes: Dear list, when I try to run wget -np --mirror --progress=bar -nH --cut-dirs=1 -erobots=off --reject=index.html* http://www.w32tex.org/docs the command will try to fetch files from /icons and all other folders despite the -np switch. The behaviour gets fixed if I use http://www.w32tex.org/docs/ (if I add a trailing slash), but I still find it very weird and I think that this should not happen. Indeed it doesn't happen on some other websites that I tried, but I'm not sure about the exact recipe to reproduce the behaviour. Mojca
Re: [Bug-wget] [PATCH] Allow openSSL compiled without SSLv2
Thanks for the patch. Committed and pushed. Cheers, Giuseppe Cristian Rodríguez crrodrig...@opensuse.org writes: Hi: the attached patch adds support for an OpenSSL library compiled without SSLv2, in which case wget will behave as if it were using the GnuTLS backend, that is, doing SSLv3 only.

# Bazaar merge directive format 2 (Bazaar 0.90)
# revision_id: cristian@linux-us4g-20110411021140-k71ctv0bcygv05mj
# target_branch: bzr://bzr.savannah.gnu.org/wget/trunk/
# testament_sha1: 0b8aab4ce061b99614d52e9fa063e5f604cd0124
# timestamp: 2011-04-10 23:25:17 -0300
# base_revision_id: gscriv...@gnu.org-20110407105651-ofq3ntt3w0h6zkq9
#
# Begin patch
=== modified file 'src/openssl.c'
--- src/openssl.c	2011-04-04 14:56:51 +0000
+++ src/openssl.c	2011-04-11 02:11:40 +0000
@@ -187,8 +187,10 @@
       meth = SSLv23_client_method ();
       break;
     case secure_protocol_sslv2:
+#ifndef OPENSSL_NO_SSL2
       meth = SSLv2_client_method ();
       break;
+#endif
     case secure_protocol_sslv3:
       meth = SSLv3_client_method ();
       break;
Re: [Bug-wget] Wget segfaults on malformed HTTP status line
thanks for the bug report, it is already fixed in the development version. The fix will be included in the next wget release. Cheers, Giuseppe Vitaly Minko vitaly.mi...@gmail.com writes: Hi all, I get a segmentation fault when an HTTP server returns a malformed status line (without a status code). Use the following command to reproduce the issue: `wget vminko.org:8081/test` Wget crashes because the HTTP daemon returns just "HTTP/1.0\n\n" (see wget-test.pl). The proposed fix is attached (wget-1.12-http-status-line.patch). Best regards, Vitaly
Re: [Bug-wget] How do I tell wget not to follow links in a file?
David Skalinder da...@skalinder.net writes: I want to mirror part of a website that contains two links pages, each of which contains links to many root-level directories and also to the other links page. I want to download recursively all the links from one links page, but not from the other: that is, I want to tell wget download links1 and follow all of its links, but do not download or follow links from links2. I've put a demo of this problem up at http://fangjaw.com/wgettest -- there is a diagram there that might state the problem more clearly. This functionality seems so basic that I assume I must be overlooking something. Clearly wget has been designed to give users control over which files they download; but all I can find is that -X controls both saving and link-following at the directory level, while -R controls saving at the file level but still follows links from unsaved files. why doesn't -X work in the scenario you have described? If all links from `links2' are under /B, you can exclude them using something like: wget -r -Xwgettest/B http://fangjaw.com/wgettest Cheers, Giuseppe
Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2
Ray Satiro raysat...@yahoo.com writes: Hi, It is still an issue that wget/openssl combo is broken in windows. I have uploaded a new tarball: ftp://alpha.gnu.org/gnu/wget/wget-1.12-2474.tar.bz2 Can you please check if it works well for you now? OpenSSL should work well now under Windows, but I am not sure about the configure stuff. Thanks, Giuseppe
Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2
Ray Satiro raysat...@yahoo.com writes: Anything in OpenSSL that tries to write to a socket will fail because it's passed a fd and not a socket. For example sock_write() in OpenSSL's crypto/bio/bss_sock.c:153 calling send() and passing a fd will cause an error of WSAENOTSOCK. It shouldn't happen. If you look at openssl.c:401, we register the socket on Windows, not the fd. I am just guessing it should work, but I don't have a Windows machine where I can check it by myself. Another thing is the configure test for OpenSSL is still using the ssl and crypto libs: configure:22076: gcc -o conftest.exe -O2 -Wall conftest.c -lssl -lcrypto >&5 but on Windows you want -lssl -lcrypto -lws2_32 -lgdi32. As I mentioned at some other point in time, what you'd expect is shared libs when building. Unfortunately a similar test for that will fail if the actual DLL is not in the path. Would it be better to just do an AC_CHECK_LIB on eay32 and ssl32? I have pushed some patches to do it. Can you please try with the development version to see if something is improved? I have cross-compiled to MinGW without problems; I obtained OpenSSL for MinGW using mingw-cross-env[1], which saved me from the burden of cross-compiling it. Another thing re ipv6 support:

host.c: In function 'getaddrinfo_with_timeout_callback':
host.c:383:3: warning: implicit declaration of function 'getaddrinfo'
host.c: In function 'lookup_host':
host.c:787:5: warning: implicit declaration of function 'freeaddrinfo'

On Windows, ws2tcpip.h should be included in addition to winsock2.h. Some headers for ws2tcpip.h have the winsock2.h include, some don't. The order is:

#include <winsock2.h>
#include <ws2tcpip.h>

When ipv6 is enabled, _WIN32_WINNT should be defined >= 0x0501 (WinXP) before the includes. This means wget with ipv6 will not work on Win2000. There's a solution for this, but it requires rewriting code that is copyrighted by Microsoft for a getaddrinfo wrapper, unless someone has already done this. Is Windows 2000 support still wanted?
I have one request from last year but other than that I don't hear about it anymore. I think the gnulib getaddrinfo does it. Have you tried the gnutls version of wget? Does it work for you? Thanks, Giuseppe 1) http://mingw-cross-env.nongnu.org/
Re: [Bug-wget] mirroring one sourceforge package?
Micah Cowan mi...@cowan.name writes: So it looks like wget is correctly blocking the http URL, but incorrectly permitting the https URL. We check if the two schemes are similar, but at the same time we require the port to be identical. I have relaxed this condition: now the two ports must be identical only when the same protocol is used. I have pushed this patch:

=== modified file 'src/recur.c'
--- src/recur.c	2011-01-01 12:19:37 +0000
+++ src/recur.c	2011-03-30 23:36:05 +0000
@@ -563,7 +563,8 @@
   if (opt.no_parent
       && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
       && 0 == strcasecmp (u->host, start_url_parsed->host)
-      && u->port == start_url_parsed->port
+      && (u->scheme != start_url_parsed->scheme
+          || u->port == start_url_parsed->port)
       && !(opt.page_requisites && upos->link_inline_p))
     {
       if (!subdir_p (start_url_parsed->dir, u->dir))

Applying it and launching wget using the same arguments used by Karl, I get:

$ find sourceforge.net/ -maxdepth 3
sourceforge.net/
sourceforge.net/projects
sourceforge.net/projects/biblatex-biber
sourceforge.net/projects/biblatex-biber/files
sourceforge.net/robots.txt

Just in time before the release :-) Cheers, Giuseppe
Re: [Bug-wget] Re: Maintainer needs updating in man page
Micah Cowan mi...@cowan.name writes: Since the manpage is automatically generated from the info manual, this needs to be fixed in wget.texinfo, too. thanks, I am going to fix it in the documentation too. Giuseppe
Re: [Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2
Steven M. Schweda s...@antinode.info writes: I know that all the serious folks in the world have all the GNU infrastructure in place, but wouldn't a clever repository-access system be able to grind out a ready-to-use distribution kit upon user request? Just a thought. We make a distinction between people who use the source tarball and developers who do a checkout from the source repository. The latter need some additional programs. The bootstrap script ensures the gnulib files are always up to date without us having to care whether they were updated in our repository (we don't really want to care about it) and, most importantly, it avoids duplicating the same files across different repositories. I think these advantages are worth the additional costs introduced by the bootstrap procedure. Cheers, Giuseppe
[Bug-wget] new alpha tarball wget-1.12-2460.tar.bz2
Hello, I have prepared a new alpha release containing the last changes: ftp://alpha.gnu.org/gnu/wget/wget-1.12-2460.tar.bz2 To verify it, here the detached GPG signature using the key C03363F4: ftp://alpha.gnu.org/gnu/wget/wget-1.12-2460.tar.bz2.sig Hopefully the next release is close now. Please report any problem you may experience using it. Thanks, Giuseppe
Re: [Bug-wget] wget 1.11.4 windows compile
Hello Ethan, can you please try again using the latest development version? You can fetch it from the Bazaar repository as explained here: https://savannah.gnu.org/bzr/?group=wget The branch is trunk. Thanks, Giuseppe

Ethan Zheng legen...@hotmail.com writes: Absolute newbie here. I could not compile 1.11.4, but 1.10 compiled (without SSL). I do have the GnuWin32/wget precompiled binary working on my system; I am just curious why I am not able to compile 1.11.4 myself on XP (also Win7) with MSVC Pro 2005. Thanks. When I try to build 1.11.4 with nmake, it complains:

fatal error C1083: Cannot open include file: 'windows/config-compiler.h': No such file or directory

Manually adding /I.. to src/Makefile CFLAGS got me past that path issue. Then errors in compiling init.c:

c:\wget> nmake
Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.
        cd src
        "C:\Program Files\Microsoft Visual Studio 8\VC\BIN\nmake.exe"
Microsoft (R) Program Maintenance Utility Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.
        cl /nologo /MT /O2 /I. /I..
/DWINDOWS /D_CONSOLE /DHAVE_CONFIG_H /c init.c
init.c
init.c(61) : error C2061: syntax error : identifier 'relocate'
init.c(61) : error C2059: syntax error : ';'
init.c(72) : error C2085: 'enable_tilde_expansion' : not in formal parameter list
init.c(77) : error C2085: 'cmd_boolean' : not in formal parameter list
init.c(78) : error C2085: 'cmd_bytes' : not in formal parameter list
init.c(79) : error C2085: 'cmd_bytes_sum' : not in formal parameter list
init.c(83) : error C2085: 'cmd_directory_vector' : not in formal parameter list
init.c(84) : error C2085: 'cmd_number' : not in formal parameter list
init.c(85) : error C2085: 'cmd_number_inf' : not in formal parameter list
init.c(86) : error C2085: 'cmd_string' : not in formal parameter list
init.c(87) : error C2085: 'cmd_file' : not in formal parameter list
init.c(88) : error C2085: 'cmd_directory' : not in formal parameter list
init.c(89) : error C2085: 'cmd_time' : not in formal parameter list
init.c(90) : error C2085: 'cmd_vector' : not in formal parameter list
init.c(92) : error C2085: 'cmd_spec_dirstruct' : not in formal parameter list
init.c(93) : error C2085: 'cmd_spec_header' : not in formal parameter list
init.c(94) : error C2085: 'cmd_spec_htmlify' : not in formal parameter list
init.c(95) : error C2085: 'cmd_spec_mirror' : not in formal parameter list
init.c(96) : error C2085: 'cmd_spec_prefer_family' : not in formal parameter list
init.c(97) : error C2085: 'cmd_spec_progress' : not in formal parameter list
init.c(98) : error C2085: 'cmd_spec_recursive' : not in formal parameter list
init.c(99) : error C2085: 'cmd_spec_restrict_file_names' : not in formal parameter list
init.c(103) : error C2085: 'cmd_spec_timeout' : not in formal parameter list
init.c(104) : error C2085: 'cmd_spec_useragent' : not in formal parameter list
init.c(105) : error C2085: 'cmd_spec_verbose' : not in formal parameter list
init.c(118) : error C2085: 'commands' : not in formal parameter list
init.c(118) : error C2143: syntax error : missing ';' before '='
init.c(268) : error C2065: 'commands' : undeclared identifier
init.c(268) : error C2109: subscript requires array or pointer type
init.c(273) : error C2109: subscript requires array or pointer type
init.c(273) : error C2198: 'stricmp' : too few arguments for call
init.c(463) : error C2065: 'enable_tilde_expansion' : undeclared identifier
init.c(532) : error C2065: 'syswgetrc' : undeclared identifier
init.c(533) : warning C4022: 'free' : pointer mismatch for actual parameter 1
init.c(653) : error C2109: subscript requires array or pointer type
init.c(654) : error C2109: subscript requires array or pointer type
init.c(655) : error C2109: subscript requires array or pointer type
init.c(655) : error C2109: subscript requires array or pointer type
init.c(655) : warning C4033: 'setval_internal' must return a value
NMAKE : fatal error U1077: 'C:\Program Files\Microsoft Visual Studio 8\VC\BIN\cl.EXE' : return code '0x2'
Stop.
NMAKE : fatal error U1077: 'C:\Program Files\Microsoft Visual Studio 8\VC\BIN\nmake.exe' : return code '0x2'
Stop.
Re: [Bug-wget] some memory leaks in wget-1.12 release source
Hi Zhenbo, thanks for reporting them. I have committed a patch (commit #2460) which should fix these memory leaks. Cheers, Giuseppe Zhenbo Xu zhenbo1...@gmail.com writes: Hi, everybody! I found some memory leaks in the wget-1.12 source code. The following lists the bugs:

Bug 1: File: ftp-ls.c, line 456. In function ftp_parse_winnt_ls:

  while ((line = read_whole_line (fp)) != NULL)
    {
      len = clean_line (line);
      if (len < 40) continue;   /* leak occurs here: line is not released */
      ...
    }

Bug 2: File: ftp.c, line 304. In function getftp:

  getftp (...)
  {
    ...
    if (con->proxy)
      {
        /* line 295: allocates a heap region to logname */
        logname = concat_strings (...);
      }
    ...
    csock = connect_to_host (host, port);
    if (csock == E_HOST)
      return HOSTERR;           /* returns without free (logname) */
    ...
  }

I'm glad to get your replies if these are real bugs. Best wishes! -- from Zhenbo Xu
Re: [Bug-wget] Use stderr instead of stdout for --ask-password
Micah Cowan mi...@cowan.name writes: Changing the prompt to stderr seems like a simple, single step forward towards proper usage. It's not perfect, but it strikes me as a good sight better than using stdout, which really ought to be reserved for program results-type output, IMO. I have applied the original patch, which prompts to stderr instead of stdout. I agree it is not the ideal usage, but the current decision is between using stderr and inhibiting the message altogether; considering the diagnostic nature of stderr, the former seems the better choice. Thanks, Giuseppe
Re: [Bug-wget] Use stderr instead of stdout for --ask-password
Hello Gilles, thanks for your patch. I am not sure it is a good idea to use stderr to prompt a message to the user. I would just inhibit the message when -O- is used. Cheers, Giuseppe

Gilles Carry gilles.ca...@st.com writes: Hello, Here is a small patch to change the --ask-password behaviour. You may find the explanation in the patch's changelog. I confess I did not test this patch much. Best regards, Thank-you, Gilles.

diff --git a/src/ChangeLog b/src/ChangeLog
index f37814d..b9bf2d7 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,13 @@
+2011-02-22  Gilles Carry  <gilles dot carry at st dot com>
+
+	* main.c (prompt_for_password): Use stderr instead of stdout
+	to prompt password. This allows to use --output-document=- and
+	--ask-password simultaneously. Without this, redirecting stdout
+	makes password prompt invisible and mucks up payload such as in
+	this example:
+	wget --output-document=- --ask-password --user=foo \
+	  http://foo.com/tarball.tgz | tar zxf -
+
 2009-09-22  Micah Cowan  <mi...@cowan.name>
 
 	* openssl.c (ssl_check_certificate): Avoid reusing the same buffer
diff --git a/src/main.c b/src/main.c
index dddc4b2..db1638f 100644
--- a/src/main.c
+++ b/src/main.c
@@ -725,9 +725,9 @@ static char *
 prompt_for_password (void)
 {
   if (opt.user)
-    printf (_("Password for user %s: "), quote (opt.user));
+    fprintf (stderr, _("Password for user %s: "), quote (opt.user));
   else
-    printf (_("Password: "));
+    fprintf (stderr, _("Password: "));
   return getpass ();
 }
Re: [Bug-wget] [PATCH] Move duplicated code in http.c to a function
Thanks for your contribution. I have just applied your patch. Giuseppe

Steven Schubiger s...@member.fsf.org writes: Patch attached.

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2010-12-10 22:55:54 +0000
+++ src/ChangeLog	2011-02-22 12:43:23 +0000
@@ -1,3 +1,9 @@
+2011-02-22  Steven Schubiger  <s...@member.fsf.org>
+
+	* http.c (gethttp, http_loop): Move duplicated code which is run
+	when an existing file is not to be clobbered to a function.
+	(get_file_flags): New static function.
+
 2010-12-10  Evgeniy Philippov  <egphilip...@googlemail.com>  (tiny change)
 
 	* main.c (main): Initialize `total_downloaded_bytes'.

=== modified file 'src/http.c'
--- src/http.c	2011-01-01 12:19:37 +0000
+++ src/http.c	2011-02-18 18:56:57 +0000
@@ -1448,6 +1448,20 @@
   hs->error = NULL;
 }
 
+static void
+get_file_flags (const char *filename, int *dt)
+{
+  logprintf (LOG_VERBOSE, _("\
+File %s already there; not retrieving.\n\n"), quote (filename));
+  /* If the file is there, we suppose it's retrieved OK.  */
+  *dt |= RETROKF;
+
+  /* Bogusness alert.  */
+  /* If its suffix is "html" or "htm" or similar, assume text/html.  */
+  if (has_html_suffix_p (filename))
+    *dt |= TEXTHTML;
+}
+
 #define BEGINS_WITH(line, string_constant)                             \
   (!strncasecmp (line, string_constant, sizeof (string_constant) - 1)  \
    && (c_isspace (line[sizeof (string_constant) - 1])                  \
@@ -2158,16 +2172,7 @@
       /* If opt.noclobber is turned on and file already exists, do not
          retrieve the file.  But if the output_document was given, then this
          test was already done and the file didn't exist.  Hence the
          !opt.output_document */
-      logprintf (LOG_VERBOSE, _("\
-File %s already there; not retrieving.\n\n"), quote (hs->local_file));
-      /* If the file is there, we suppose it's retrieved OK.  */
-      *dt |= RETROKF;
-
-      /* Bogusness alert.  */
-      /* If its suffix is "html" or "htm" or similar, assume text/html.  */
-      if (has_html_suffix_p (hs->local_file))
-        *dt |= TEXTHTML;
-
+      get_file_flags (hs->local_file, dt);
       xfree (head);
       xfree_null (message);
       return RETRUNNEEDED;
@@ -2639,24 +2644,12 @@
       got_name = true;
     }
 
-  /* TODO: Ick! This code is now in both gethttp and http_loop, and is
-   * screaming for some refactoring. */
   if (got_name && file_exists_p (hstat.local_file) && opt.noclobber
       && !opt.output_document)
     {
      /* If opt.noclobber is turned on and file already exists, do not
         retrieve the file.  But if the output_document was given, then this
         test was already done and the file didn't exist.  Hence the
         !opt.output_document */
-      logprintf (LOG_VERBOSE, _("\
-File %s already there; not retrieving.\n\n"),
-                 quote (hstat.local_file));
-      /* If the file is there, we suppose it's retrieved OK.  */
-      *dt |= RETROKF;
-
-      /* Bogusness alert.  */
-      /* If its suffix is "html" or "htm" or similar, assume text/html.  */
-      if (has_html_suffix_p (hstat.local_file))
-        *dt |= TEXTHTML;
-
+      get_file_flags (hstat.local_file, dt);
       ret = RETROK;
       goto exit;
     }