Re: [Bug-wget] WARC output
Gijs van Tulder gvtul...@gmail.com writes:

> It would be cool if Wget could become one of these tools. Already the Swiss army knife for mirroring websites, the one thing that Wget is missing is a good way to store these mirrors. The current output of --mirror is not sufficient for archival purposes:

Sure we do!

> With some help from others, I've added WARC functions to Wget. With the --warc-file option you can specify that the mirror should also be written to a WARC archive. Wget will then keep everything, including

Can you please track all contributors? Any contribution to GNU wget requires copyright assignments to the FSF.

> Do you think this is something that could be included in the main Wget version? If that's the case, what should be the next step?

Sure, I will take a look at the code in the next few days. In the meantime, can you check whether the new code follows the GNU Coding Standards[1]?

> The implementation makes use of the open source WARC Tools library (Apache License 2.0): http://code.google.com/p/warc-tools/

How much code is really needed from that library? I wonder if we can avoid this dependency altogether.

Cheers,
Giuseppe

1) http://www.gnu.org/prep/standards/
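To give a rough sense of what the core of WARC writing involves without the library, here is a minimal sketch that formats a single uncompressed WARC/1.0 record. It is not taken from Wget or warc-tools: the function name `warc_format_record` and its parameters are hypothetical, the record type is fixed to `resource`, and the payload is assumed to be a NUL-terminated text string. A real writer would also need binary-safe payloads, digests and generated record IDs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Wget or warc-tools): format one
   WARC/1.0 "resource" record into BUF.  PAYLOAD is assumed to be
   NUL-terminated text, which is a simplification.
   Returns the number of bytes written, or -1 if BUF is too small.  */
static int
warc_format_record (char *buf, size_t bufsize,
                    const char *record_id, const char *date,
                    const char *target_uri, const char *payload)
{
  size_t len = strlen (payload);
  int n = snprintf (buf, bufsize,
                    "WARC/1.0\r\n"
                    "WARC-Type: resource\r\n"
                    "WARC-Record-ID: <%s>\r\n"
                    "WARC-Date: %s\r\n"
                    "WARC-Target-URI: %s\r\n"
                    "Content-Type: text/plain\r\n"
                    "Content-Length: %lu\r\n"
                    "\r\n"        /* blank line ends the header block */
                    "%s"
                    "\r\n\r\n",   /* two CRLFs terminate the record */
                    record_id, date, target_uri,
                    (unsigned long) len, payload);

  if (n < 0 || (size_t) n >= bufsize)
    return -1;
  return n;
}
```

A caller would pass a record ID such as a `urn:uuid:` value and a UTC timestamp like `2011-08-09T12:00:00Z`, then append the returned bytes to the archive file.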
Re: [Bug-wget] WARC output
Giuseppe Scrivano writes:

>> The implementation makes use of the open source WARC Tools library (Apache License 2.0): http://code.google.com/p/warc-tools/
>
> how much code is really needed from that library? I wonder if we can avoid this dependency at all.

The library comes with some utilities, an HTTrack plugin, a Java module, etc. These extras are not needed for Wget. But of the C library itself, I used pretty much everything. The library handles all the WARC writing. It can also read WARCs, but that's not needed here. Rough estimate: 12,000 lines of code (excluding comments).

It's probably important to note that I have changed a few small things in the warc-tools library. (I have records in Git.)

As for the other dependencies:
- I used an MIT-licensed base32 encoder (there seems to be no such module in Gnulib), but it is quite small, so it could be replaced;
- it links to the UUID library.

> Can you please track all contributors? Any contribution to GNU wget requires copyright assignments to the FSF.

Yes, it's all in the Git history, so it's easy to make a list. (There's only one other contributor of code; others helped with testing.)

> In the meanwhile, can you check if you are following the GNU Coding Standards for the new code?

I tried to do that. So except for the warc-tools library, which uses a different standard, all new code follows the GNU standards (I hope).

Thanks,
Gijs
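To illustrate how small a replacement for the base32 dependency could be, here is a minimal sketch of an RFC 4648 base32 encoder with `=` padding. It is not the MIT-licensed encoder mentioned above and not code from Wget; the function name `base32_encode` and its interface are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: encode LEN bytes from SRC as RFC 4648 base32
   with '=' padding.  DST must have room for 8 * ((LEN + 4) / 5) + 1
   bytes.  Returns the encoded length, excluding the trailing NUL.  */
static size_t
base32_encode (const unsigned char *src, size_t len, char *dst)
{
  static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  unsigned int buffer = 0;      /* bit accumulator; only low bits matter */
  int bits = 0;                 /* number of pending bits in BUFFER */
  size_t out = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      buffer = (buffer << 8) | src[i];
      bits += 8;
      /* Emit one character for every complete group of 5 bits.  */
      while (bits >= 5)
        {
          dst[out++] = alphabet[(buffer >> (bits - 5)) & 0x1f];
          bits -= 5;
        }
    }

  /* Left-align any leftover bits into a final 5-bit group.  */
  if (bits > 0)
    dst[out++] = alphabet[(buffer << (5 - bits)) & 0x1f];

  /* Pad to a multiple of 8 output characters.  */
  while (out % 8 != 0)
    dst[out++] = '=';

  dst[out] = '\0';
  return out;
}
```

A 20-byte SHA-1 digest, for instance, encodes to exactly 32 characters with no padding, so a 33-byte buffer is enough for that case.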
[Bug-wget] gnutls link failure, ssl
My initial build of wget failed due to gnutls version problems. configure said:

    ..
    checking for main in -lgnutls... yes
    configure: compiling in support for SSL via GnuTLS

But then the link failed with:

    gcc -O2 -Wall -o wget cmpt.o connect.o convert.o cookies.o ftp.o css.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o html-parse.o html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o res.o retr.o spider.o url.o utils.o exits.o build_info.o iri.o version.o ftp-opie.o gnutls.o ../lib/libgnu.a -lgnutls -lgcrypt -lgpg-error -lz -lidn -lrt
    gnutls.o: In function `ssl_connect_wget':
    gnutls.c:(.text+0x4b0): undefined reference to `gnutls_priority_set_direct'
    gnutls.c:(.text+0x528): undefined reference to `gnutls_priority_set_direct'
    collect2: ld returned 1 exit status

Evidently configure should also check for gnutls_priority_set_direct. And if that check fails, hopefully it will fall back to openssl. (This was on CentOS 5.6, but presumably that doesn't especially matter.)

Related: there used to be an option --with-libssl-prefix. I'm not sure when it was removed, but it was useful.

Also, configure --help does not mention the possibility of --with-ssl=openssl.

Finally, the NEWS file doesn't say anything about either of these: preferring gnutls to openssl, or the --with-ssl=openssl option.

I didn't look to see whether there were other configure options that didn't make it into the --help and/or NEWS.

Thanks,
Karl
Re: [Bug-wget] gnutls link failure, ssl
Hello Karl,

thanks for reporting it. It looks like a very ugly one; I think it stems from the last change:

    revno: 2517
    committer: Giuseppe Scrivano gscriv...@gnu.org
    branch nick: wget
    timestamp: Fri 2011-08-05 21:36:08 +0200
    message:
      gnutls: do not use a deprecated function.

I'll fall back to the deprecated function when `gnutls_priority_set_direct' is not available.

I will incorporate your comments into the NEWS file and configure --help.

I think it is too late now to replace the packages, so to avoid synchronization problems with mirrors I'll go for 1.13.1. I had the feeling that 1.13 wasn't going to be released :-)

Thanks,
Giuseppe

k...@freefriends.org (Karl Berry) writes:

> My initial build of wget failed due to gnutls version problems. configure said:
>
>     ..
>     checking for main in -lgnutls... yes
>     configure: compiling in support for SSL via GnuTLS
>
> But then the link failed with:
>
>     gcc -O2 -Wall -o wget cmpt.o connect.o convert.o cookies.o ftp.o css.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o html-parse.o html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o res.o retr.o spider.o url.o utils.o exits.o build_info.o iri.o version.o ftp-opie.o gnutls.o ../lib/libgnu.a -lgnutls -lgcrypt -lgpg-error -lz -lidn -lrt
>     gnutls.o: In function `ssl_connect_wget':
>     gnutls.c:(.text+0x4b0): undefined reference to `gnutls_priority_set_direct'
>     gnutls.c:(.text+0x528): undefined reference to `gnutls_priority_set_direct'
>     collect2: ld returned 1 exit status
>
> Evidently configure should also check for gnutls_priority_set_direct. And if that check fails, hopefully it will fall back to openssl. (This was on CentOS 5.6, but presumably that doesn't especially matter.)
>
> Related: there used to be an option --with-libssl-prefix. I'm not sure when it was removed, but it was useful.
>
> Also, configure --help does not mention the possibility of --with-ssl=openssl.
>
> Finally, the NEWS file doesn't say anything about either of these: preferring gnutls to openssl, or the --with-ssl=openssl option.
>
> I didn't look to see whether there were other configure options that didn't make it into the --help and/or NEWS.
>
> Thanks,
> Karl
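The fallback described in the reply (use gnutls_priority_set_direct when it exists, otherwise the deprecated call) is typically selected at compile time. Below is a minimal sketch of that pattern only, not the actual fix. To keep it compilable without GnuTLS installed, the two library calls are replaced by stand-ins that just record which branch was built; `struct demo_session` and `set_protocol_priorities` are hypothetical names, and the macro `HAVE_GNUTLS_PRIORITY_SET_DIRECT` is assumed to be defined by a configure check for the function.

```c
#include <stddef.h>

/* Hypothetical stand-in for a TLS session; it only records which
   code path was compiled in.  */
struct demo_session
{
  const char *configured_by;
};

#ifdef HAVE_GNUTLS_PRIORITY_SET_DIRECT
/* Assumed to be defined by configure when the function links.
   Real code would call gnutls_priority_set_direct here.  */
static int
set_protocol_priorities (struct demo_session *s)
{
  s->configured_by = "gnutls_priority_set_direct";
  return 0;
}
#else
/* Older libraries: real code would call the deprecated
   priority-setting function here instead.  */
static int
set_protocol_priorities (struct demo_session *s)
{
  s->configured_by = "deprecated priority call";
  return 0;
}
#endif
```

Because the choice is made by the preprocessor, a build against an older library never references the missing symbol, which is what avoids the undefined-reference link error reported above.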
Re: [Bug-wget] Bug in processing url query arguments that have '/'
Ángel González wrote:

> Maybe not. Consider a url like: http://www.example.net/download.php?file=releases/wget.exe
> In that case using wget.exe as the filename makes more sense than download.php@file=releases%2Fwget.exe
> Whereas there are other cases where the basename is preferable.

Probably all examples of a URL with a slash in the query string are a bit contrived. I don't think we should tailor wget's behavior to contrived examples that merely seem to make sense. If wget is following its own internal rules, the file should be saved as download.php@file=releases%2Fwget.exe. There is an option to allow the server to designate a more reasonable name through Content-Disposition.

Tony
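The naming rule discussed here can be sketched as follows: take the last component of the URL path, append a separator and the query string, and escape any `/` in the query as `%2F`. This is an illustration written to reproduce the example in the thread, not wget's actual code; `local_file_name` and `QUERY_SEP` are hypothetical names, and `@` is used as the separator only because it appears in the example above.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: '@' as seen in the thread's example file name.  */
#define QUERY_SEP '@'

/* Hypothetical helper: build a local file name from a URL PATH and
   an optional QUERY (may be NULL).  Slashes in the query are written
   as "%2F" so they cannot create directories.
   Returns 0 on success, -1 if OUT is too small.  */
static int
local_file_name (const char *path, const char *query,
                 char *out, size_t outsize)
{
  const char *base = strrchr (path, '/');
  const char *p;
  size_t n = 0;

  /* Keep only the part after the last slash of the path.  */
  base = base ? base + 1 : path;

  for (p = base; *p; p++)
    {
      if (n + 1 >= outsize)
        return -1;
      out[n++] = *p;
    }

  if (query)
    {
      if (n + 1 >= outsize)
        return -1;
      out[n++] = QUERY_SEP;

      for (p = query; *p; p++)
        {
          if (*p == '/')
            {
              if (n + 3 >= outsize)
                return -1;
              memcpy (out + n, "%2F", 3);
              n += 3;
            }
          else
            {
              if (n + 1 >= outsize)
                return -1;
              out[n++] = *p;
            }
        }
    }

  if (n >= outsize)
    return -1;
  out[n] = '\0';
  return 0;
}
```

Called with the path `/download.php` and the query `file=releases/wget.exe`, this yields the name Tony describes rather than the bare `wget.exe`.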