Re: wget and ipv6 (1.6 beta5) serious bugs
Thanks for the report. I agree that the current code does not work for many uses -- that's why IPv6 is still "experimental". Mauro Tortonesi is working on contributing IPv6 support that works better. For the impending release, I think the workaround you posted makes sense. Mauro, what do you think?
Re: POST followed by GET
Hrvoje Niksic wrote:

> I like these suggestions. How about the following: for 1.9, document
> that `--post-data' expects one URL and that its behavior for multiple
> specified URLs might change in a future version.
>
> Then, for 1.10 we can implement one of the alternative behaviors.

That works for me... I can hardly wait for 1.9 to get wrapped up so we can start working on 1.10.

Hrvoje, has anyone mentioned how glad we are that you've come back?

Tony
Re: bug in 1.8.2 with
You're right -- that code was broken. Thanks for the patch; I've now applied it to CVS with the following ChangeLog entry:

2003-10-15  Philip Stadermann  <[EMAIL PROTECTED]>

	* ftp.c (ftp_retrieve_glob): Correctly loop through the list
	whose elements might have been deleted.
Re: POST followed by GET
I like these suggestions. How about the following: for 1.9, document that `--post-data' expects one URL and that its behavior for multiple specified URLs might change in a future version. Then, for 1.10 we can implement one of the alternative behaviors.
bug in 1.8.2 with
Hello,

with this download you will get a segfault:

    wget --passive-ftp --limit-rate 32k -r -nc -l 50 \
      -X */binary-alpha,*/binary-powerpc,*/source,*/incoming \
      -R alpha.deb,powerpc.deb,diff.gz,.dsc,.orig.tar.gz \
      ftp://ftp.gwdg.de/pub/x11/kde/stable/3.1.4/Debian

Philip Stadermann <[EMAIL PROTECTED]> discovered this problem and submitted the attached patch. It's a problem with the linked list.

--
Noèl Köthe
Debian GNU/Linux, www.debian.org

--- ftp.c.orig	2003-10-14 15:37:15.0 +0200
+++ ftp.c	2003-10-14 15:39:28.0 +0200
@@ -1670,22 +1670,21 @@
 static uerr_t
 ftp_retrieve_glob (struct url *u, ccon *con, int action)
 {
-  struct fileinfo *orig, *start;
+  struct fileinfo *start;
   uerr_t res;
   struct fileinfo *f;
 
   con->cmd |= LEAVE_PENDING;
 
-  res = ftp_get_listing (u, con, &orig);
+  res = ftp_get_listing (u, con, &start);
   if (res != RETROK)
     return res;
-  start = orig;
 
   /* First: weed out that do not conform the global rules given in
      opt.accepts and opt.rejects.  */
   if (opt.accepts || opt.rejects)
     {
-      f = orig;
+      f = start;
       while (f)
         {
           if (f->type != FT_DIRECTORY && !acceptable (f->name))
@@ -1698,7 +1697,7 @@
         }
     }
   /* Remove all files with possible harmful names */
-  f = orig;
+  f = start;
   while (f)
     {
       if (has_invalid_name(f->name))
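The bug class the patch fixes is iterating a singly linked list while nodes (possibly including the head) are being deleted, leaving a stale head pointer behind. A minimal illustrative sketch in C of the safe pattern, updating the caller's head through a pointer-to-pointer (hypothetical names, not wget's actual code):

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Remove every node failing pred(), updating *head in place so the
   caller never keeps a stale pointer to a freed former head -- the
   same class of bug the patch above fixes. */
static void filter_list(struct node **head, int (*pred)(int))
{
    struct node **link = head;
    while (*link) {
        if (!pred((*link)->value)) {
            struct node *dead = *link;
            *link = dead->next;        /* unlink before freeing */
            free(dead);
        } else {
            link = &(*link)->next;     /* keep: advance the link */
        }
    }
}

/* Helper to build a test list by prepending. */
static struct node *push(struct node *head, int v)
{
    struct node *n = malloc(sizeof *n);
    n->value = v;
    n->next = head;
    return n;
}
```

Because `link` always points at the *field* that references the current node, deleting the head is no different from deleting any interior node.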
wget and ipv6 (1.6 beta5) serious bugs
Hi,

Right now the wget code looks like this:

    #ifdef ENABLE_IPV6
    int ip_default_family = AF_INET6;
    #else
    int ip_default_family = AF_INET;
    #endif

and then in ./connect.c:

    sock = socket (ip_default_family, SOCK_STREAM, 0);

This assumes that a binary compiled with IPv6 support is always run on an IPv6-capable host, which is not true in many, many cases. Such a binary on an IPv4-only host will cause:

    [EMAIL PROTECTED] src]$ LC_ALL=C ./wget wp.pl
    --21:48:37--  http://wp.pl/
               => `index.html'
    Resolving wp.pl... 212.77.100.101
    Connecting to wp.pl[212.77.100.101]:80... failed: Address family not supported by protocol.
    Retrying.

    --21:48:38--  http://wp.pl/  (try: 2)
               => `index.html'
    Connecting to wp.pl[212.77.100.101]:80... failed: Address family not supported by protocol.
    Retrying.

    --21:48:40--  http://wp.pl/  (try: 3)
               => `index.html'
    Connecting to wp.pl[212.77.100.101]:80... failed: Address family not supported by protocol.
    Retrying.

Applications that use getaddrinfo() shouldn't even need to know which family they use. They should just do:

    getaddrinfo("host", ..., &res0);
    for (res = res0; res; res = res->ai_next) {
        s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {
            close(s);
            continue;
        }
        break;
    }

This pseudo-code should show the idea. The best thing IMO is to use getaddrinfo() for resolving, plus struct addrinfo (a linked list) for storing data about host.x.y.com. For systems without getaddrinfo(), IPv4-only replacements should be provided -- see OpenSSH portable for how it's done there. The whole idea of getaddrinfo/getnameinfo is to get family-independent functions. They even work for AF_UNIX on some systems (like Linux+glibc).
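Filled out into a compilable sketch, the loop above looks like this (`connect_to_host` is a hypothetical name, not a proposed Wget API; error handling is deliberately minimal):

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Family-independent connect, as the message above suggests: try each
   address returned by getaddrinfo() until one of them connects.
   Returns a connected socket fd, or -1 on failure. */
static int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints, *res0, *res;
    int s = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6, whichever works */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res0) != 0)
        return -1;

    for (res = res0; res; res = res->ai_next) {
        s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (s < 0)
            continue;                  /* family unsupported: try next */
        if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {
            close(s);
            s = -1;
            continue;                  /* unreachable: try next address */
        }
        break;                         /* connected */
    }
    freeaddrinfo(res0);
    return s;
}
```

An IPv4-only kernel simply makes `socket(AF_INET6, ...)` fail with EAFNOSUPPORT, and the loop falls through to the next (IPv4) address instead of aborting the whole retrieval.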
Anyway, for now a workaround is something like this in main():

    #ifdef ENABLE_IPV6
    s = socket(AF_INET6, SOCK_STREAM, 0);
    if (s < 0 && errno == EAFNOSUPPORT)
        ip_default_family = AF_INET;
    else
        close(s);
    #endif

--
Arkadiusz Miśkiewicz    CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org     AM2-6BONE, 1024/3DB19BBD, arekm(at)ircnet, PLD/Linux
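As a self-contained sketch, the probe could be wrapped in a function (hypothetical name `probe_default_family`; note that close() is only called when socket() actually succeeded, unlike the raw snippet above):

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Probe at startup whether the running kernel actually supports
   AF_INET6, instead of assuming it from the compile-time ENABLE_IPV6
   flag.  Returns the address family to use as the default. */
static int probe_default_family(void)
{
#ifdef ENABLE_IPV6
    int s = socket(AF_INET6, SOCK_STREAM, 0);
    if (s < 0) {
        if (errno == EAFNOSUPPORT)
            return AF_INET;   /* IPv6 binary on IPv4-only kernel */
        return AF_INET6;      /* failed for some unrelated reason */
    }
    close(s);
    return AF_INET6;
#else
    return AF_INET;
#endif
}
```

This keeps the compile-time default but downgrades it at run time, which is exactly the failure mode the bug report describes.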
Re: POST followed by GET
On Tue, 14 Oct 2003, Tony Lewis wrote:

> It would be logically equivalent to the following three commands:
>
> wget --user-agent='my robot' --post-data 'data=foo' POST URL1
> wget --user-agent='my robot' --post-data 'data=bar' POST URL2
> wget --user-agent='my robot' --referer=URL3 GET URL4

Just as a comparison, this approach is basically what we went with in curl (curl has supported this kind of operation for years, including support for multipart formposts, which I guess is next up for adding to wget! ;-P).

There are just too many options and specifics that you can set; making them all changeable between several URLs specified on the command line makes the command-line parser complicated and the command lines even more complex.

The main thing this described approach requires (that I can think of) is that wget would need to store session cookies as well in the cookie file (I believe I read that it doesn't at the moment).

--
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
 ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Re: POST followed by GET
Hrvoje Niksic wrote:

> Maybe the right thing would be for `--post-data' to only apply to the
> URL it precedes, as in:
>
>     wget --post-data=foo URL1 --post-data=bar URL2 URL3
>
> But I'm not at all sure that it's even possible to do this and keep
> using getopt!

I'll start by saying that I don't know enough about getopt to comment on whether Hrvoje's suggestion will work.

It's hard to imagine a situation where wget's current behavior makes sense over multiple URLs. I'm sure someone can come up with an example, but it's likely to be an unusual case. I see the ability to POST a form as being most useful when a site requires some kind of form-based authentication to proceed with looking at other pages within the site.

Some alternatives that occur to me follow.

Alternative #1. Only apply --post-data to the first URL on the command line. (A simple solution that probably covers the majority of cases.)

Alternative #2. Allow POST and GET as keywords in the URL list so that:

    wget POST http://www.somesite.com/post.cgi --post-data 'a=1&b=2' GET http://www.somesite.com/getme.html

would explicitly specify which URL uses POST and which uses GET. If more than one POST is specified, all use the same --post-data.

Alternative #3. Look for form tags and have --post-file specify the data to be supplied to the various forms:

    --form-action=URL1 'a=1&b=2' --form-action=URL2 'foo=bar'

Alternative #4. Allow complex sessions to be defined using a "session" file such as:

    wget --session=somefile --user-agent='my robot'

Options specified on the command line apply to every URL. If somefile contained:

    --post-data 'data=foo' POST URL1
    --post-data 'data=bar' POST URL2
    --referer=URL3 GET URL4

it would be logically equivalent to the following three commands:

    wget --user-agent='my robot' --post-data 'data=foo' POST URL1
    wget --user-agent='my robot' --post-data 'data=bar' POST URL2
    wget --user-agent='my robot' --referer=URL3 GET URL4

with wget's state maintained across the session.

Tony
Re: Question about url convert
"Sergey Vasilevsky" <[EMAIL PROTECTED]> writes:

> Have wget any rules to convert retrive url to store url? Or may be
> in future?
>
> For example:
> Get     -> site.com/index.php?PHPSESSID=123124324
> Filter  -> /PHPSESSID=[a-z0-9]+//i
> Save as -> site.com/index.php

The problem with this is that it would require the use of a regexp library, which I'm trying to avoid for Wget. There are many different regexp libraries, many with incompatible syntaxes and interfaces, and a full-blown regexp library is just too large to carry around with a program like Wget.

If you can think of a way to get that kind of functionality with something that is not based on regexps, it will have a much better chance of getting in.
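For the specific PHPSESSID case, the filtering can in fact be done without any regexp library: scan the query string for a `name=` parameter with plain strchr/strncmp and splice it out. A hedged sketch of that idea (`strip_query_param` is a hypothetical name, not an existing or proposed Wget function):

```c
#include <string.h>

/* Remove the parameter "name=..." from the query part of url, in
   place.  No regexps -- just linear scanning with strchr/strncmp. */
static void strip_query_param(char *url, const char *name)
{
    size_t nlen = strlen(name);
    char *q = strchr(url, '?');
    if (!q)
        return;                        /* no query string at all */

    char *p = q + 1;
    while (*p) {
        char *amp = strchr(p, '&');
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            if (amp) {
                /* drop "name=value&", shifting the rest left */
                memmove(p, amp + 1, strlen(amp + 1) + 1);
                /* stay at p: it now holds the next parameter */
            } else {
                /* last parameter: also drop the preceding '?' or '&' */
                p[-1] = '\0';
                return;
            }
        } else {
            if (!amp)
                return;
            p = amp + 1;               /* advance to the next parameter */
        }
    }
}
```

For example, `strip_query_param(url, "PHPSESSID")` turns `site.com/index.php?PHPSESSID=123124324` into `site.com/index.php`, which is exactly the Filter/Save-as behavior the quoted message asks for, without carrying a regexp engine around.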
Re: Wget 1.8.2 bug
"Sergey Vasilevsky" <[EMAIL PROTECTED]> writes:

> I use wget 1.8.2. When I try recursive download site site.com where
> site.com/ first page redirect to site.com/xxx.html that have first
> link in the page to site.com/ then Wget download only xxx.html and
> stop. Other links from xxx.html not followed!

I've seen pages that do that kind of redirection, but Wget seems to follow them for me. Do you have an example I could try?
Re: POST followed by GET
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> I'm trying to figure out how to do a POST followed by a GET.
>
> If I do something like:
>
> wget http://www.somesite.com/post.cgi --post-data 'a=1&b=2' http://www.somesite.com/getme.html -d

Well... `--post-data' currently affects all the URLs in the Wget run. I'm not sure if that makes sense... perhaps it should only apply to the first one. But I'm not sure that makes sense either -- what if I *want* to POST the same data to two URLs, much like you want to POST to one and GET to the other?

Maybe the right thing would be for `--post-data' to only apply to the URL it precedes, as in:

    wget --post-data=foo URL1 --post-data=bar URL2 URL3

In that case, URL1 would be POSTed with foo, URL2 with bar, and URL3 would be fetched with GET. But I'm not at all sure that it's even possible to do this and keep using getopt!

What do the others think?
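The position-dependent semantics described above could be had with a single left-to-right pass over argv instead of getopt: a `--post-data=...` is remembered and attached only to the next URL encountered. A sketch of those semantics under stated assumptions (`url_job` and `parse_cmdline` are hypothetical names, not Wget's actual parser, and real Wget options are ignored):

```c
#include <string.h>

#define MAX_URLS 32

struct url_job {
    const char *url;
    const char *post_data;    /* NULL means a plain GET */
};

/* One left-to-right pass over argv: each --post-data=PAYLOAD binds to
   the next URL only; URLs with no pending payload become GETs.
   Returns the number of jobs filled in. */
static int parse_cmdline(int argc, char **argv, struct url_job *jobs)
{
    const char *pending = NULL;
    int n = 0;
    for (int i = 1; i < argc && n < MAX_URLS; i++) {
        if (strncmp(argv[i], "--post-data=", 12) == 0) {
            pending = argv[i] + 12;   /* remember until the next URL */
        } else {
            jobs[n].url = argv[i];
            jobs[n].post_data = pending;  /* consume pending payload */
            pending = NULL;
            n++;
        }
    }
    return n;
}
```

With `wget --post-data=foo URL1 --post-data=bar URL2 URL3`, this yields POST(foo)→URL1, POST(bar)→URL2, GET→URL3. The catch Hrvoje alludes to is that GNU getopt by default permutes argv so that options migrate ahead of non-option arguments, which destroys exactly the ordering this scheme relies on.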
Question about url convert
Does wget have any rules to convert a retrieved URL to a stored URL? Or maybe in the future?

For example:

    Get     -> site.com/index.php?PHPSESSID=123124324
    Filter  -> /PHPSESSID=[a-z0-9]+//i
    Save as -> site.com/index.php
Wget 1.8.2 bug
I use wget 1.8.2. When I try to recursively download the site site.com, where the first page site.com/ redirects to site.com/xxx.html, and that page's first link points back to site.com/, Wget downloads only xxx.html and stops. The other links from xxx.html are not followed!