Re: Relevance of MACHINES
Hi Hrvoje :)

* Hrvoje Niksic [EMAIL PROTECTED] dixit:
> I'm not sure how to manage the MACHINES file in the distribution. As
> very few people keep testing the old operating systems documented in
> MACHINES, it's impossible to guarantee that new versions of Wget will
> compile or work on them.

As a developer and as a user, I prefer no documentation to documentation that is obsolete and maybe wrong.

> One way to fix this would be to accompany the OS entries in MACHINES
> with the version of Wget that they apply to. But the problem is that,
> as each version is released, you will only see which machines the
> *previous* versions worked on.

But you have that anyway: since you are releasing a new version, you cannot know whether it will compile or install on any particular arch, unless you can see the future ;)

> Maybe the current form of MACHINES is simply not relevant any more?

Not for me, certainly. IMHO, you should get rid of MACHINES. If someone wants to install wget, it is not a great effort to give it a try. In the vast majority of cases it will build and install smoothly, and if it has a problem, the mailing list address is more useful than a file saying that you should have an i686-pc-gnu-linux instead of your risc-acorn or whatever.

I find it a very sensible idea to eliminate MACHINES. I don't think it has any real use now if you cannot track down versions, and anyway it won't give you information about the current version...

Raúl Núñez de Arenas Coronado
-- Linux Registered User 88736 http://www.pleyades.net http://raul.pleyades.net/
RE: windows devel binary
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
>> This is a binary compiled and run on windows nt 4, which doesn't
>> support IPV6, so the -4 should probably be a no-op ?
> Or not work at all.

I was thinking (rather late, I see you have changed other IPV6 stuff in the meantime): why cut the -4 switch if no IPV6 is present? The principle of least surprise would say to leave the switch there in order to avoid an unknown-switch error.

Suppose you have a bunch of machines, some with and some without IPV6 support, and you always want to enforce IPV4 usage. With a -4 switch always supported, a simple wget -4 would do the trick in any script used on all those machines. Without that, you'd need some means to detect the IPV6 support and change the wget switches used accordingly.

Same thing for -6, in fact: leave the switch even if no IPV6 is present and supported, and die with a meaningful error message (much better than an unknown-switch failure).

Heiko

-- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax
follow_ftp not work
Wget 1.9.1

.wgetrc:
  reject = *.[Ee][xX][Ee]*
  follow_ftp = off

Command line:
  wget -np -nv -r -N -nH --referer=http://www.orion.by -P /tmp/www.orion.by -D orion.by http://www.orion.by

Output:
  Last-modified header missing -- time-stamps turned off.
  13:15:08 URL:http://www.orion.by/index.php?mode=main [24703] - /tmp/www.orion.by/index.php?mode=main [1]
  http://www.orion.by/robots.txt:
  13:15:09 ERROR 404: Not Found.
  20 redirections exceeded.
  20 redirections exceeded.
  13:15:18 URL: ftp://62.118.248.95/cyberfight/q3/utils/Seismovision222light.exe [882] - /tmp/www.orion.by/cyberfight/q3/utils/.listing [1]
  ^C

Questions:
1. How can I see what parameters wget uses at run time? You might add an option to print them.
2. The reject rules need more documentation, with examples!
Re: Relevance of MACHINES
DervishD [EMAIL PROTECTED] writes:

>> One way to fix this would be to accompany the OS entries in MACHINES
>> with the version of Wget that they apply to. But the problem is
>> that, as each version is released, you will only see which machines
>> the *previous* versions worked on.
> But you have that anyway: since you are releasing a new version, you
> cannot know if it will compile or install in any particular arch,
> unless you can see the future ;)

I always test the new release on the machines I have access to, currently Linux and Solaris. The release candidate process makes sure that the release compiles on the architectures that people on this list care about (e.g. Windows, but possibly others). You don't need a time machine for that.

If no one objects, I'll remove MACHINES from the distribution.
Re: follow_ftp not work
Sergey Vasilevsky [EMAIL PROTECTED] writes:

> Wget 1.9.1
> .wgetrc:
>   reject = *.[Ee][xX][Ee]*
>   follow_ftp = off

Following ftp is off by default, so you shouldn't need to set it explicitly. What might have happened in your case is that an http URL *redirected* to ftp, which was followed as a redirection, not as part of the recursive download.
RE: feature request: --second-guess-the-dns
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> Dan Jacobson [EMAIL PROTECTED] writes:
>> But I want a --second-guess-the-dns=ADDRESS
> Aside from `--second-guess-the-dns' being an awful name (sorry), what
> is the usage scenario for this kind of option? I.e. why would anyone
> want to use it?

Just yesterday I did something similar (by changing the local /etc/hosts) in order to directly test different web servers behind a farming device. Multiple servers behind a round-robin DNS or similar setups could be another possible scenario where this would be useful. Not your daily usage, though.

Heiko
Re: Relevance of MACHINES
Hi Hrvoje :)

* Hrvoje Niksic [EMAIL PROTECTED] dixit:
>>> One way to fix this would be to accompany the OS entries in
>>> MACHINES with the version of Wget that they apply to. But the
>>> problem is that, as each version is released, you will only see
>>> which machines the *previous* versions worked on.
>> But you have that anyway: since you are releasing a new version, you
>> cannot know if it will compile or install in any particular arch,
>> unless you can see the future ;)
> I always test the new release on the machines I have access to,
> currently Linux and Solaris. The release candidate process makes sure
> that the release compiles on the architectures that people on this
> list care about (e.g. Windows, but possibly others). You don't need a
> time machine for that.

Of course, but that is a small subset of the arches mentioned in MACHINES, and anyway all Windows, Linux and Solaris users assume that Wget will keep on running on those systems. That is, MACHINES will keep on having outdated or useless information.

Raúl Núñez de Arenas Coronado
-- Linux Registered User 88736 http://www.pleyades.net http://raul.pleyades.net/
Re: profiling with wget
Robert Parks [EMAIL PROTECTED] writes:

> On Unix, you can use `-O /dev/null' to avoid writes to disk. (The
> application is still writing to an output stream, but the data is
> lost in a black hole.) I'm not sure if there's an equivalent under
> Windows.

The equivalent that I found (at least for NT and 2000) is NUL:

  command > NUL

At least it works for redirection. Unlike /dev/null, it doesn't really exist as a file in a directory. I don't know if it would work for filename options like `-O', or if that would really create a file called NUL.

Adam Stein
-- Adam Stein @ Xerox Corporation Email: [EMAIL PROTECTED] Disclaimer: All views expressed here have been proved to be my own. [http://www.csh.rit.edu/~adam/]
Re: feature request: --second-guess-the-dns
Come to think of it, I've had need for this before; the switch makes at least as much sense as `--bind-address', which I've never needed myself. Maybe `--connect-address' would be a good name for the option? It would nicely parallel `--bind-address'. Are there any takers to implement it?
Re: windows devel binary
Herold Heiko [EMAIL PROTECTED] writes:

>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
>>> This is a binary compiled and run on windows nt 4, which doesn't
>>> support IPV6, so the -4 should probably be a no-op ?
>> Or not work at all.
> I was thinking (rather late, I see you have changed other IPV6 stuff
> in the meantime),

Not late at all; this stuff is still very much in flux, and Mauro might still come and change it again. (He was to implement -4 and -6, but he was very busy with other things, so I went ahead and provided a reference implementation which he's free to ignore.)

> why cut the -4 switch if no IPV6 is present ? The principle of least
> surprise would say leave the switch there in order to avoid an
> unknown-switch error. Suppose you have a bunch of machines, some
> with, some without IPV6 support. You always want to enforce IPV4
> usage. With a -4 switch always supported a simple wget -4 would do
> the trick in any script used on all those machines. Without that
> you'd need to have some means to detect the IPV6 support and change
> the wget switches used accordingly. Same thing for -6 in fact: leave
> the switch even if no IPV6 is present and supported, die with a
> meaningful error message (much better than an unknown-switch
> failure).

In principle, I agree with this. `-4' should be a no-op on an IPv6-challenged Wget. Note that, in the current code, it's not so easy to just disable a switch or a `.wgetrc' command, but it's doable. If someone wants to work on this, please do.
Re: --inet6-only option
Hrvoje Niksic [EMAIL PROTECTED] writes:

> * If the machine doesn't support AI_ADDRCONFIG and Wget sets -4
>   behind your back, then you shouldn't be allowed to specify -6
>   because it clearly contradicts the automagically set -4. (But even
>   then you can still use `--no-inet4-only -6', which will make Wget
>   resolve IPv6 addresses only, and fail to connect to any host.)

After thinking about this some more, I decided to redo the last part. The feature of being able to undo the implicit `-4' is not that useful, because it's unavailable on machines with AI_ADDRCONFIG, which we expect to become ubiquitous. The whole thing degrades into an ugly and unnecessary special case.

I decided to implement (in part) your suggestion to move the socket check to lookup_host. The code no longer automatically sets --inet4; instead, it does the following:

* If -4 is specified, request the AF_INET family from getaddrinfo.
* If -6 is specified, request the AF_INET6 family from getaddrinfo.
* Otherwise, request the AF_UNSPEC family with the AI_ADDRCONFIG flag.
  If AI_ADDRCONFIG is not available, simulate it with an explicit
  check whether an AF_INET6 socket can be created. If not, simply
  request AF_INET instead of AF_UNSPEC; if yes, keep using AF_UNSPEC.

This should cause systems without AI_ADDRCONFIG to behave exactly the same as systems that support it. (The only exception would be hypothetical IPv6-only systems that actually cannot create AF_INET sockets. If such systems were to exist, I assume they would support AI_ADDRCONFIG.)
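The AI_ADDRCONFIG simulation described in the last bullet can be sketched in a few lines of C. This is only an illustration of the probe, not Wget's actual code; the function name is made up:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the fallback when AI_ADDRCONFIG is missing: probe whether
   an AF_INET6 socket can be created at all.  If it can't, ask
   getaddrinfo for AF_INET only; if it can, keep AF_UNSPEC and let
   getaddrinfo return addresses of both families. */
static int family_to_request(void)
{
    int sock = socket(AF_INET6, SOCK_STREAM, 0);
    if (sock < 0)
        return AF_INET;    /* IPv6 sockets unavailable: IPv4 only */
    close(sock);
    return AF_UNSPEC;      /* IPv6 works: request both families */
}
```

The probe is cheap (one socket() call per lookup at most) and matches the semantics of AI_ADDRCONFIG closely enough for the common cases.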
Re: feature request: --second-guess-the-dns
Herold Heiko [EMAIL PROTECTED] writes:

>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
>> Maybe `--connect-address' would be a good name for the option? It
>> would nicely parallel `--bind-address'.
> I was wondering if it should be possible to pass more than one name
> for the address change (for recursive behaviour with absolute links).
> This would mean we need to associate names and addresses, and either
> a syntax which can specify multiple name/address pairs or multiple
> invocations - e.g.
> wget --connect-address=site.dot.com:1.2.3.4,site2.dot.com:5.6.7.8,www.internal.site:10.1.2.3

In my opinion, that's going too far. If you need that kind of stuff, edit /etc/hosts or the equivalent. I really don't think that such a level of elaboration is appropriate for a fairly rarely needed feature. Besides, imagine every single network application implementing its own alternative resolver.

If I implement --connect-address, it will be in the form of `--connect-address=HOST[:PORT]', meaning that all network connections performed by Wget connect to HOST[:PORT] instead of to the hosts and ports specified by URLs. I believe that handles Dan's case and those encountered by me, and it nicely parallels `--bind-address', which will also be extended to allow a port.
Re: feature request: --second-guess-the-dns
On Mon, 17 Nov 2003, Hrvoje Niksic wrote:

> Come to think of it, I've had need for this before; the switch makes
> at least as much sense as `--bind-address', which I've never needed
> myself. Maybe `--connect-address' would be a good name for the
> option? It would nicely parallel `--bind-address'. Are there any
> takers to implement it?

In curl land we offer this functionality in a more implicit way, by allowing the user to override any tool-generated header from the command line. It might not be as easily accessible as this proposed option, but it offers even more power. In the case where you want to connect to 1.2.3.4, asking for the host abc.com, you would use 'curl -H "Host: abc.com" http://1.2.3.4'. This of course also lets you fool around with port numbers, like 'curl -H "Host: abc.com:8080" http://1.2.3.4:2003'.

I don't claim this is a better way; I'm only providing food for thought here.

-- -=- Daniel Stenberg -=- http://daniel.haxx.se -=- ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Re: feature request: --second-guess-the-dns
On Sun, 16 Nov 2003, Hrvoje Niksic wrote:

>> You can do this now: wget http://216.46.192.85/
>> Using DNS is just a convenience after all, not a requirement.
> Unfortunately, widespread use of name-based virtual hosting made it a
> requirement in practice. ISPs typically host a bunch of web sites on
> the same interface, and http://DOTTED-DECIMAL-ADDR will get you a
> default page, if even that.

Hmm, couldn't `--header "Host: hostname"' work? I think it could, but currently wget appends it instead of replacing its own generated one...

-- Maciej W. Rozycki, Technical University of Gdansk, Poland -- e-mail: [EMAIL PROTECTED], PGP key available
if anything bad happens, return non-zero
$ wget --spider BAD_URL GOOD_URL; echo $?
0
$ wget --spider GOOD_URL BAD_URL; echo $?
1

I say they both should be 1. If anything bad happens, return 1 or some other non-zero value. By BAD, I mean a producer of, e.g., ERROR 503: Service Unavailable. --spider or not, too.

And stop making me have to confirm each and every mail to this list.
convert links problem solved...new problem
Hi,

Upgrading wget to 1.8.1 (my server admin won't put a newer version on) solved the problem of relative links being converted into an incorrect mish-mash of absolute/relative links. Part of this solution was the upgrade from 1.7 to 1.8.1. The other part was in my perl system call, changed from:

  system("wget ...");

to

  system("cd targetdir; wget ...");

However, a new problem has arisen. Files that have spaces in them are being saved as what%20ever.html. I read in the mailing list archives that this is a feature that will probably be removed. In the meantime, is there a way to turn this feature off?

The problem for me is that the files retrieved with wget are to be served by apache and viewed with a web browser, so they are referenced with a URL from a web page. Not a problem, except that web browsers treat %20 as a space, so the browser looks for what ever.html. Is it possible to write a URL that includes %20 as a literal?

Thanks, Doug
Re: feature request: --second-guess-the-dns
Maciej W. Rozycki [EMAIL PROTECTED] writes:

> Hmm, couldn't `--header "Host: hostname"' work? I think it could, but
> currently wget appends it instead of replacing its own generated
> one...

It's not very hard to fix `--header' to replace Wget-generated values. Is there consensus that this is a good replacement for `--connect-address'?
Re: if anything bad happens, return non-zero
Dan Jacobson [EMAIL PROTECTED] writes:

> $ wget --spider BAD_URL GOOD_URL; echo $?
> 0
> $ wget --spider GOOD_URL BAD_URL; echo $?
> 1
>
> I say they both should be 1. If anything bad happens, return 1 or
> some other non-zero value. By BAD, I mean a producer of, e.g., ERROR
> 503: Service Unavailable. --spider or not, too.

I agree that this would be better. I'll take a look at how hard it would be to change this.

> And stop making me have to confirm each and every mail to this list.

Currently the only way to avoid confirmations is to subscribe to the list. I'll try to contact the list owners to see if the mechanism can be improved.
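The requested behaviour amounts to making the exit status "sticky" across all URLs instead of reflecting only the last one. A minimal sketch of the intended rule (illustrative only, not Wget's code):

```c
/* Combine per-URL download statuses into one process exit code:
   zero only if every URL succeeded.  A single failure anywhere in
   the list makes the whole run exit non-zero. */
static int combined_exit_status(const int *statuses, int n)
{
    for (int i = 0; i < n; i++)
        if (statuses[i] != 0)
            return 1;   /* at least one retrieval failed */
    return 0;
}
```

With this rule, both `wget --spider BAD_URL GOOD_URL' and `wget --spider GOOD_URL BAD_URL' would exit 1, as Dan asks.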
Re: convert links problem solved...new problem
[EMAIL PROTECTED] writes:

> Upgrading wget to 1.8.1 (my server admin won't put a newer version
> on)

??? Why 1.8.1? 1.8.2 fixed many *bugs* that were present in 1.8.1. This sounds like the sticking-to-Debian-stable brain damage. If I were you, I would ask the admin for permission to compile the latest version in your own home directory, if he is unwilling to upgrade the one on the system. I'm aware that compiling your own programs can be seen as malicious, but it can't hurt to ask for permission.

> However, a new problem has arisen. Files that have spaces in them are
> being saved as what%20ever.html. I read in the mailing list archives
> that this is a feature that will probably be removed. In the
> meantime, is there a way to turn this feature off?

The problem has been fixed in Wget 1.9.x. Before reporting bugs, please upgrade to the latest version, one way or the other. This way you are wasting a lot of people's time on problems that have already been solved. :-(
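Conceptually, the 1.9.x fix unescapes %XX sequences when turning a URL path into a local file name. A rough illustration of that idea (this is not Wget's actual code):

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode %XX escapes in place, e.g. "what%20ever.html" becomes
   "what ever.html".  Malformed escapes are copied through verbatim. */
static void unescape_filename(char *s)
{
    char *w = s;
    for (; *s; s++) {
        if (*s == '%' && isxdigit((unsigned char) s[1])
                      && isxdigit((unsigned char) s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *w++ = (char) strtol(hex, NULL, 16);
            s += 2;
        } else {
            *w++ = *s;
        }
    }
    *w = '\0';
}
```

As for the last question in the report: a URL that refers to a file whose name literally contains "%20" must escape the percent sign itself, i.e. write it as %2520.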
Translations for 1.9.1
Hi,

I just installed wget 1.9.1; it works fine. But on my machine translations are broken somehow: all special characters are scrambled. With wget 1.9 this didn't happen. Example from de.po:

  #: src/convert.c:439
  #, c-format
  msgid "Cannot back up %s as %s: %s\n"
  msgstr "Anlegen eines Backups von »%s« als »%s« nicht möglich: %s\n"

(ftp://ftp.gnu.org/pub/gnu/wget/wget-1.9.1.tar.gz, unpacked on linux with tar xvfz)

Am I the only one with this problem?

Manfred

PS: I use a different email address now, as since my last posting the amount of spam reached new dimensions. Probably because this list is mirrored as a newsgroup with cleartext email addresses :-(

-- GMX Weihnachts-Special: Seychellen-Traumreise zu gewinnen! Rentier entlaufen. Finden Sie Rudolph! Als Belohnung winken tolle Preise. http://www.gmx.net/de/cgi/special/ +++ GMX - die erste Adresse für Mail, Message, More! +++
lrand48 and friends obsolete by SVID 3?
I'm considering the use of lrand48/drand48 (where available) to generate random integers and floats. The code Wget uses now is portable, but very primitive, especially for generating floats. But the Linux man page says this about *rand48:

  NOTES
    These functions are declared obsolete by SVID 3, which states
    that rand(3) should be used instead.

Is this true? Was the committee on mind-altering substances? How otherwise could they possibly have obsoleted *rand48 in favor of -- rand? Is SVID 3 even relevant nowadays?
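For context on why drand48 is attractive for floats: portable code typically scales rand(), and on systems where RAND_MAX is small (it may be as low as 32767) that yields only a coarse grid of distinct values, while drand48() returns a uniform double in [0,1) built from 48 bits of state. A sketch of the two approaches (illustrative; the function names are made up, and srand48/drand48 are the SVID/POSIX interfaces, not something Wget currently uses):

```c
#include <stdlib.h>

/* Primitive portable approach: scale rand() into [0, 1).  With a
   small RAND_MAX this produces few distinct values. */
static double random_float_portable(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* SVID approach: drand48() directly returns a uniform double in
   [0, 1) with 48 bits of generator state, where available. */
static double random_float_svid(void)
{
    return drand48();
}
```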
Re: Translations for 1.9.1
Manfred Schwarb [EMAIL PROTECTED] said:

> But on my machine translations are broken somehow: all special
> characters are scrambled. With wget 1.9 this didn't happen. Example
> from de.po:
>   #: src/convert.c:439
>   #, c-format
>   msgid "Cannot back up %s as %s: %s\n"
>   msgstr "Anlegen eines Backups von »%s« als »%s« nicht möglich: %s\n"

It's normal: de.po is written in UTF-8. Use e.g.

  cat de.po | iconv -f UTF-8 -t CP850

to display it correctly; but gettext should handle this fine.

--gv
Re: if anything bad happens, return non-zero
> $ wget --spider BAD_URL GOOD_URL; echo $?
> 0
> $ wget --spider GOOD_URL BAD_URL; echo $?
> 1
>
> I say they both should be 1. If anything bad happens, return 1 or
> some other non-zero value.

I'm glad I'm not the only one to complain about this issue. I wasted a lot of time taming my downloads just because wget has no proper exit status handling (see my posting about "BUG in --timeout (exit status)").

Manfred
Re: Translations for 1.9.1
Manfred Schwarb [EMAIL PROTECTED] writes:

> I just installed wget 1.9.1; it works fine. But on my machine
> translations are broken somehow: all special characters are
> scrambled. With wget 1.9 this didn't happen.

There was no change in translation handling from 1.9 to 1.9.1, except perhaps for a new `de.po' from the Translation Project. Indeed, the blunder is theirs: they switched to UTF-8 for `de.po', but forgot to mark it as UTF-8. They fixed this problem later, so future versions of Wget will not have it. If you want, you can download the latest German PO file from the Translation Project and correct the problem yourself.
Re: Translations for 1.9.1
> It's normal: de.po is written in UTF-8. Use e.g.
>   cat de.po | iconv -f UTF-8 -t CP850
> to display it correctly; but gettext should handle this fine.

I see. Thanks a lot for your hint. Probably my gettext is just too old. A remark about this issue in the README or INSTALL file would be great.

Manfred
Re: Translations for 1.9.1
Manfred Schwarb [EMAIL PROTECTED] writes:

>> I just installed wget 1.9.1; it works fine. But on my machine
>> translations are broken somehow: all special characters are
>> scrambled. With wget 1.9 this didn't happen.
> There was no change in translation handling from 1.9 to 1.9.1, except
> perhaps for a new `de.po' from the Translation Project. Indeed, the
> blunder is theirs: they switched to UTF-8 for `de.po', but forgot to
> mark it as UTF-8. They fixed this problem later, so future versions
> of Wget will not have this problem. If you want, you can download the
> latest German PO file from the Translation Project and correct the
> problem yourself.

I followed Gisle's hint and did

  cat de.po | iconv -f UTF-8 -t ISO-8859-1 > de.po.new; mv de.po.new de.po

and this worked great. Thanks.

Manfred

-- NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien... Fotoalbum, File Sharing, MMS, Multimedia-Gruß, GMX FotoService Jetzt kostenlos anmelden unter http://www.gmx.net +++ GMX - die erste Adresse für Mail, Message, More! +++
Re: feature request: --second-guess-the-dns
By the way, I did edit /etc/hosts to do one experiment (http://groups.google.com/groups?threadm=vrf7007pbg2136%40corp.supernews.com, i.e. [EMAIL PROTECTED]), to test an IP/name combination without waiting for DNSes to update. Good thing I was root, so I could do it.

I sure hope that when one sees

  Connecting to jidanni.org[216.46.192.85]:80... connected.

there is no interference along the way: that that IP is really where we are going, to wget's best ability.

By the way, /etc/hosts affects other users on the system and other jobs than the current one; and one might be using various caching DNSes, etc. Just one more justification for this wishlist item. --connect-address sounds OK... whatever.
Re: feature request: --second-guess-the-dns
P == Post, Mark K [EMAIL PROTECTED] writes:

P You can do this now:
P   wget http://216.46.192.85/
P Using DNS is just a convenience after all, not a requirement.

But then one doesn't get the HTTP Host field set to what he wants.
Re: feature request: --second-guess-the-dns
Dan Jacobson [EMAIL PROTECTED] writes:

> I sure hope that when one sees
>   Connecting to jidanni.org[216.46.192.85]:80... connected.
> there is no interference along the way: that that IP is really where
> we are going, to wget's best ability.

I can guarantee that much -- the entire point of printing the IP address is for knowledgeable people to be able to tell where the hell they're *really* connecting.

> By the way, /etc/hosts affects other users on the system and other
> jobs than the current one; and one might be using various caching
> DNSes, etc. Just one more justification for this wishlist item.
> --connect-address sounds OK... whatever.

Have you seen the rest of the discussion? Would it do for you if Wget correctly handled something like:

  wget --header='Host: jidanni.org' http://216.46.192.85/

(I'm trying to avoid new command-line options except where absolutely necessary; Wget has a *lot* of them already.)
Re: non-subscribers have to confirm each message to bug-wget
>> And stop making me have to confirm each and every mail to this list.
Hrvoje> Currently the only way to avoid confirmations is to subscribe
Hrvoje> to the list. I'll try to contact the list owners to see if the
Hrvoje> mechanism can be improved.

Subscribe me with the nomail option, if it can't be fixed. Often I come back from a long vacation, only to find my last reply waiting for a confirmation that has probably expired.
Re: feature request: --second-guess-the-dns
H> It's not very hard to fix `--header' to replace Wget-generated
H> values. Is there consensus that this is a good replacement for
H> `--connect-address'?

I don't want to tamper with headers. I want to be able to do experiments leaving all variables alone except for the IP address. Thus --connect-address is still needed.
Fw: Re[2]: follow_ftp not work
> Following ftp is off by default, so you shouldn't need to set it
> explicitly. What might have happened in your case is that an http URL
> *redirected* to ftp, which was followed as a redirection, not as part
> of the recursive download.

Hrvoje, it looks like it would take much less time to implement something like --disregard-external-redirects than to explain to everyone that the feature is not yet available. If you're up to implementing this, I'd suggest a supplemental --store-external-redirects option, which would create a .htaccess file with the external redirects; this would be EXTREMELY useful for mirroring HUGE sites (like www.gnu.org).

Luck, Peter.

PS: This letter was sent personally to Hrvoje by mistake, sorry Hrvoje.
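To make the --store-external-redirects idea concrete: the file it would write could be a plain mod_alias fragment, so that requests to pages which redirected off-site during the mirror run are re-issued as real redirects by the serving Apache. A hypothetical example (the option does not exist; paths and targets are invented for illustration):

```
# Hypothetical .htaccess produced by the proposed
# --store-external-redirects: one line per URL that redirected
# outside the mirrored site during the recursive download.
Redirect temp /software/wget/ http://ftp.example.org/pub/wget/
Redirect temp /people/devs.html http://www.example.com/~devs/
```

The mirror then stays self-contained while still sending visitors to the external resources the original site pointed at.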