Re: [Bug-wget] WARC output

2011-08-10 Thread Giuseppe Scrivano
Gijs van Tulder gvtul...@gmail.com writes:

 It would be cool if Wget could become one of these tools. Already the
 Swiss army knife for mirroring websites, the one thing that Wget is
 missing is a good way to store these mirrors. The current output of
 --mirror is not sufficient for archival purposes:

Sure we do!



 With some help from others, I've added WARC functions to Wget. With
 the --warc-file option you can specify that the mirror should also be
 written to a WARC archive. Wget will then keep everything, including

Can you please track all contributors?  Any contribution to GNU wget
requires copyright assignments to the FSF.



 Do you think this is something that could be included in the main Wget
 version? If that's the case, what should be the next step?

Sure, I will take a look at the code in the next few days.  In the
meanwhile, can you check if you are following the GNU Coding Standards
for the new code[1]?



 The implementation makes use of the open source WARC Tools library
 (Apache License 2.0):
  http://code.google.com/p/warc-tools/

How much code is really needed from that library?  I wonder if we can
avoid this dependency altogether.

Cheers,
Giuseppe



[1] http://www.gnu.org/prep/standards/



Re: [Bug-wget] WARC output

2011-08-10 Thread Gijs van Tulder

Giuseppe Scrivano writes:

 The implementation makes use of the open source WARC Tools library
 (Apache License 2.0):
   http://code.google.com/p/warc-tools/

 How much code is really needed from that library?  I wonder if we can
 avoid this dependency altogether.

The library comes with some utilities, an HTTrack plugin, a Java module 
etc. These extra things are not needed for Wget. But of the C library, I 
used pretty much everything. The library handles all the WARC writing 
stuff. It can also read WARCs, but that's not needed here.


Rough estimate: 12,000 lines of code (excluding comments).

It's probably important to note that I have changed a few small things 
in the warc-tools library. (I have records in Git.)



As for the other dependencies:
- I used an MIT-licenced base32 encoder (there seems to be no such
  module in Gnulib), but that's quite small so could be replaced;
- it links to the UUID library.
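
On the size of the base32 dependency: an encoder for the RFC 4648
alphabet fits in a few dozen lines of C.  What follows is only a minimal
sketch, not the MIT-licensed code mentioned above; the function name
base32_encode and its buffer contract are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the encoder referred to in this thread.
   Encode LEN bytes from SRC as RFC 4648 base32 into DST and
   NUL-terminate the result.  DST must have room for
   8 * ((LEN + 4) / 5) + 1 bytes.  Return the number of characters
   written, excluding the terminating NUL.  */
size_t
base32_encode (const unsigned char *src, size_t len, char *dst)
{
  static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  uint32_t buffer = 0;  /* Bit accumulator.  */
  int bits = 0;         /* Number of valid low bits in BUFFER.  */
  size_t out = 0;
  size_t i;

  assert (src != NULL || len == 0);
  assert (dst != NULL);

  for (i = 0; i < len; i++)
    {
      buffer = (buffer << 8) | src[i];
      bits += 8;
      /* Emit one character per complete 5-bit group.  */
      while (bits >= 5)
        {
          dst[out++] = alphabet[(buffer >> (bits - 5)) & 0x1f];
          bits -= 5;
        }
    }

  /* Flush the remaining 1-4 bits, left-aligned in a 5-bit group.  */
  if (bits > 0)
    dst[out++] = alphabet[(buffer << (5 - bits)) & 0x1f];

  /* Pad with '=' to a multiple of 8 characters.  */
  while (out % 8 != 0)
    dst[out++] = '=';

  dst[out] = '\0';
  return out;
}
```

Encoding the single byte "f" this way gives "MY======", which matches
the RFC 4648 test vector.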


 Can you please track all contributors?  Any contribution to GNU wget
 requires copyright assignments to the FSF.

Yes, it's all in the Git history, so it's easy to make a list. (There's 
only one other contributor of code, others helped with testing.)


 In the meanwhile, can you check if you are following the GNU Coding
 Standards for the new code?

I tried to do that. So except for the warc-tools library, which uses a 
different standard, all new code follows the GNU standards (I hope).


Thanks,

Gijs



[Bug-wget] gnutls link failure, ssl

2011-08-10 Thread Karl Berry
My initial build of wget failed due to gnutls version problems.
configure said:
..
checking for main in -lgnutls... yes
configure: compiling in support for SSL via GnuTLS

But then the link failed with:
gcc  -O2 -Wall   -o wget cmpt.o connect.o convert.o cookies.o ftp.o css.o 
css-url.o ftp-basic.o ftp-ls.o hash.o host.o html-parse.o html-url.o http.o 
init.o log.o main.o netrc.o progress.o ptimer.o recur.o res.o retr.o spider.o 
url.o utils.o exits.o build_info.o iri.o version.o ftp-opie.o gnutls.o 
../lib/libgnu.a -lgnutls -lgcrypt -lgpg-error -lz  -lidn -lrt
gnutls.o: In function `ssl_connect_wget':
gnutls.c:(.text+0x4b0): undefined reference to `gnutls_priority_set_direct'
gnutls.c:(.text+0x528): undefined reference to `gnutls_priority_set_direct'
collect2: ld returned 1 exit status

Evidently configure should check for gnutls_priority_set_direct also.
And if it fails, hopefully it will fall back to openssl.
(This was on CentOS 5.6, but presumably that doesn't especially matter.)

Related, there used to be an option --with-libssl-prefix.  I'm not sure
when it was removed, but it was useful.

Also, configure --help does not mention the possibility of
--with-ssl=openssl.

Finally, the NEWS file doesn't say anything about either of these:
preferring tls to openssl or the --with-ssl=openssl option.  I didn't
look to see if there were other configure options that didn't make it
into the --help and/or NEWS.

Thanks,
Karl




Re: [Bug-wget] gnutls link failure, ssl

2011-08-10 Thread Giuseppe Scrivano
Hello Karl,

thanks for reporting it.  It looks like a very ugly one; I think it
is caused by the last change:

revno: 2517
committer: Giuseppe Scrivano gscriv...@gnu.org
branch nick: wget
timestamp: Fri 2011-08-05 21:36:08 +0200
message:
  gnutls: do not use a deprecated function.

I'll fall back to the deprecated function when
`gnutls_priority_set_direct' is not available.

I will incorporate your comments into the NEWS file and configure --help.

I think it is too late now to replace the packages, so to avoid
synchronization problems with mirrors I'll release 1.13.1 instead.  I
had the feeling that 1.13 wasn't going to be released :-)

Thanks,
Giuseppe






Re: [Bug-wget] Bug in processing url query arguments that have '/'

2011-08-10 Thread Tony Lewis
Ángel González wrote:

 Maybe not. Consider a url like:
 http://www.example.net/download.php?file=releases/wget.exe

 In that case using as filename wget.exe makes more sense than
 download.php@file=releases%2Fwget.exe
 Whereas there are other cases where the basename is preferable.
 Probably all examples of a url with a slash in the query string are a
 bit contrived.

I don't think we should tailor wget's behavior to contrived examples that
seem to make sense. If wget is following its own internal rules, the file
should be saved as download.php@file=releases%2Fwget.exe.
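
To make the rule concrete, here is a minimal sketch of a mapping that
produces the name above: last path component, a separator, then the
query string with each '/' written as "%2F".  It is not wget's actual
code; the function name url_to_local_name, the caller-supplied separator
and the lack of any fragment handling are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the wget sources.  Build a local
   file name from URL: the last path component, then SEP and the query
   string with every '/' replaced by "%2F".  Return 0 on success, or -1
   if DST (DSTSIZE bytes) is too small.  */
int
url_to_local_name (const char *url, char sep, char *dst, size_t dstsize)
{
  const char *query, *path_end, *base, *p;
  size_t out = 0;

  assert (url != NULL && dst != NULL);

  query = strchr (url, '?');
  path_end = query ? query : url + strlen (url);

  /* The base name starts after the last '/' before the query.  */
  base = url;
  for (p = url; p < path_end; p++)
    if (*p == '/')
      base = p + 1;

  for (p = base; p < path_end; p++)
    {
      if (out + 1 >= dstsize)
        return -1;
      dst[out++] = *p;
    }

  if (query)
    {
      if (out + 1 >= dstsize)
        return -1;
      dst[out++] = sep;

      for (p = query + 1; *p; p++)
        {
          if (*p == '/')
            {
              /* Escape the slash so it cannot create a directory.  */
              if (out + 3 >= dstsize)
                return -1;
              memcpy (dst + out, "%2F", 3);
              out += 3;
            }
          else
            {
              if (out + 1 >= dstsize)
                return -1;
              dst[out++] = *p;
            }
        }
    }

  if (out >= dstsize)
    return -1;
  dst[out] = '\0';
  return 0;
}
```

Called with the example URL and '@' as the separator, this yields
download.php@file=releases%2Fwget.exe.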

There is an option (--content-disposition) that lets the server designate
a more reasonable name through the Content-Disposition header.

Tony