Re: Only follow paths with /res/ in them
Brian wrote:
> I would like to follow all the urls on a site that contain /res/ in the path. I've tried using -I and -A, with values such as res, *res*, */res/*, etc. Here is an example that downloads pretty much the entire site, rather than what I appear (to me) to have specified:
>
>   wget -O- -q http://img.site.org/b/imgboard.html | wget -q -r -l1 -O- -I '*res*' -A '*res*' --force-html -B http://img.site.org/b/ -i-
>
> The urls I would like to follow and output to the command line are of the form: http://img.site.org/b/res/97867797.html

-A isn't useful here: it's applied only against the filename portion of the URL.

-I is what you want; the trouble is that the * wildcard doesn't match slashes (there are plans to introduce a ** wildcard, probably in 1.13). So unfortunately you gotta do -I 'res,*/res,*/*/res' etc. as needed.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
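Putting that advice back into Brian's pipeline gives something like the following sketch (drop the unhelpful -A rule, and add deeper */…/res entries to cover however many directory levels the site actually uses):

  wget -O- -q http://img.site.org/b/imgboard.html | \
    wget -q -r -l1 -O- -I 'res,*/res,*/*/res' --force-html -B http://img.site.org/b/ -i-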
Re: Only follow paths with /res/ in them
Oh! Please don't use this list (wget@sunsite.dk) any more; I'm trying to get the dotsrc folks to make it go away/forward to bug-wget (I need to ping 'em on this again). The official list for Wget is now [EMAIL PROTECTED]

[Micah's reply to Brian was quoted here in full; see the previous message in this thread.]

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah,

Many thanks for all your very timely help. I have had no issues since following your instructions to upgrade to 1.11.4 and installing it in the /opt directory. I used:

  $ ./configure --prefix=/opt/wget

and point to it specifically:

  /opt/wget/bin/wget --tries=10 -r -N -l inf --wait=1 \
    -nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
    -o /home1/software/modis/atteph/mirror_a.log \
    --directory-prefix=/home1/software/modis/atteph

Thanks again.

Brock

On Monday 27 October 2008 3:06 pm, Micah Cowan wrote:
> Brock Murch wrote:
>> Sorry, 1 quick question? Do you know of anyone providing rpm's of 1.11.4 for CentOS?
>
> Not offhand. It may not yet be available; it was only packaged for Fedora Core a couple months ago, I think. RPMfind.net just lists 1.11.4 sources for fc9 and fc10.
>
>> If not, would you recommend uninstalling the current one before installing from your src? Many thanks.
>
> I'd advise against that: I believe various important components of Red Hat/CentOS rely on wget to fetch things. Sometimes minor changes in the output/interface of wget cause problems for automated scripts that form an integral part of an operating system. Though really, I think most of the changes that would pose such a danger are actually already in the Red Hat modified 1.10.2 sources (taken from the development sources for what was later released as 1.11).
>
> What I tend to do on my systems is to configure the sources like:
>
>   $ ./configure --prefix=$HOME/opt/wget
>
> and then either add $HOME/opt/wget/bin to my $PATH, or invoke it directly as $HOME/opt/wget/bin/wget. Note that if you want to build wget with support for HTTPS, you'll need to have the development package for openssl installed.
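For anyone following the same route, the complete sequence is roughly this (paths as in Micah's example; the make steps are assumed, not quoted from the thread):

  $ ./configure --prefix=$HOME/opt/wget
  $ make && make install
  $ export PATH=$HOME/opt/wget/bin:$PATH   # or call $HOME/opt/wget/bin/wget directly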
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
On Sat, 1 Nov 2008, Micah Cowan wrote:

>> I am puzzled. You mean you declare wget@sunsite.dk retired and [EMAIL PROTECTED] is to be used from now on for the purpose of the former list instead? And [EMAIL PROTECTED] will most likely be retired as well soon, with the replacement to be [EMAIL PROTECTED] as well?
>
> Yup, that's what I mean.

Thanks a lot -- good to know my brain has not completely rotted yet.

Maciej
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
On Fri, 31 Oct 2008, Micah Cowan wrote:

> I will ask the dotsrc.org folks to set up this mailing list as a forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At that time, no further mails will be sent to subscribers of this list. Please subscribe to [EMAIL PROTECTED] instead.
>
> At this time, I'm thinking of merging wget@sunsite.dk and [EMAIL PROTECTED]; there isn't really enough traffic to justify separate lists, IMO; and often discussions come up on submitted patches that are of interest to everyone.

I am puzzled. You mean you declare wget@sunsite.dk retired and [EMAIL PROTECTED] is to be used from now on for the purpose of the former list instead? And [EMAIL PROTECTED] will most likely be retired as well soon, with the replacement to be [EMAIL PROTECTED] as well?

Maciej
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
Maciej W. Rozycki wrote:
> I am puzzled. You mean you declare wget@sunsite.dk retired and [EMAIL PROTECTED] is to be used from now on for the purpose of the former list instead? And [EMAIL PROTECTED] will most likely be retired as well soon, with the replacement to be [EMAIL PROTECTED] as well?

Yup, that's what I mean.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
MAILING LIST IS MOVING: [EMAIL PROTECTED]
[EMAIL PROTECTED] is now back in business as a full-fledged mailing list, and not just a forwarding alias to here. Please subscribe using the interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at your earliest convenience.

I had hoped to leave forwarding still enabled during the transition; I subscribed wget@sunsite.dk but that did not seem to do the trick. So mails at [EMAIL PROTECTED] will not show up here at the present time.

I will ask the dotsrc.org folks to set up this mailing list as a forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At that time, no further mails will be sent to subscribers of this list. Please subscribe to [EMAIL PROTECTED] instead.

At this time, I'm thinking of merging wget@sunsite.dk and [EMAIL PROTECTED]; there isn't really enough traffic to justify separate lists, IMO; and often discussions come up on submitted patches that are of interest to everyone.

Please avoid continued use of this list if possible. The gmane and mail-archive.com sites will be asked to use the new list for archiving purposes (and of course, bug-wget will also be archived via GNU's pipermail setup). Some of the reasons for this migration may be found at http://article.gmane.org/gmane.comp.web.wget.general/8200/

In addition, people have recently been having difficulties with spam blocking preventing their unsubscription(!), subscription, or even contacting dotsrc.org staff about resolving subscription problems.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]
Micah Cowan wrote:
> [EMAIL PROTECTED] is now back in business as a full-fledged mailing list, and not just a forwarding alias to here. Please subscribe using the interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at your earliest convenience.

Email interface: send an email to [EMAIL PROTECTED]

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: -m alias
Michelle Konzack wrote:
> Am 2008-10-14 01:20:16, schrieb Hraban Luyat:
>> Considering the -m switch (--mirror): the man page says it is currently equivalent to -r -N -l inf --no-remove-listing. I was wondering, though: why does this not also include -k? When mirroring a website it seems useful to convert the links for appropriate viewing in a browser.
>
> When mirroring a Website, I WANT AN IDENTICAL MIRROR. But IF I want to have a mirror for off-line reading I can choose the additional -k option.

So your interpretation of the word "mirror" means a byte-by-byte copy (also called a backup or an archive). Another common interpretation, however, is an alternative location, suitable for off-site (which I assume you mean here, too, rather than "off-line") viewing, as in "If that website is unavailable, try one of the following mirrors."

>> That is, if mirroring here means what it usually means: provide an alternative location to view the same content. If it's more like a backup, then of course -k is not a good option. But in that case, maybe it's worth mentioning...?
>
> No! ;-)

My point was that the meaning of "mirror" is very ambiguous, /especially/ in the context of fetching a live website in this fashion (as one could expect a backup to occur on the server side instead). I am not arguing that the -k switch should be added so much as saying it might very well be worth mentioning.

>> PS: I would like to be CC'ed (not subscribed).
>
> ??? -- How can you post without being subscribed? My posts were all definitively rejected when I tried to post to this list.

http://wget.addictivecode.org/MailingLists

Greetings,
Hraban Luyat
Re: -m alias
Am 2008-10-14 01:20:16, schrieb Hraban Luyat:
> Hi, Considering the -m switch (--mirror): the man page says it is currently equivalent to -r -N -l inf --no-remove-listing. I was wondering, though: why does this not also include -k? When mirroring a website it seems useful to convert the links for appropriate viewing in a browser.

When mirroring a Website, I WANT AN IDENTICAL MIRROR. But IF I want to have a mirror for off-line reading I can choose the additional -k option.

> That is, if mirroring here means what it usually means: provide an alternative location to view the same content. If it's more like a backup, then of course -k is not a good option. But in that case, maybe it's worth mentioning...?

No! ;-)

> PS: I would like to be CC'ed (not subscribed).

??? -- How can you post without being subscribed? My posts were all definitively rejected when I tried to post to this list.

Thanks, Greetings and nice Day/Evening

Michelle Konzack
Systemadministrator, 24V Electronic Engineer
Tamay Dogan Network, Debian GNU/Linux Consultant
Re: -m alias
Michelle Konzack wrote:
> ??? -- How can you post without being subscribed? My posts were all definitively rejected when I tried to post to this list.

Strange. People are definitely posting to the list without having to be subscribed. However, folks have been known to be rejected as spam, even for unsubscription requests. :\

I've been considering a move to GNU servers; but I'm not sure their spam filters are better (though at least they wouldn't reject unsubscriptions, I think). But mostly, I'm not motivated enough to get off my lazy butt yet. If we start having more serious problems, perhaps the motivation will increase sufficiently...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> I'm trying to download the same file from the same server; the command line I use:
>
>   wget --debug -o log -c -t 0 --load-cookies=cookie_file http://rapidshare.com/files/153131390/Blind-Test.rar
>
> Below are attached 2 files: the log with 1.9.1 and the log with 1.10.2. Both logs are made when Blind-Test.rar was already on my HDD. Sorry for some mess in the logs, but the Russian language is used on my console.

Thanks very much for providing these, Maksim; they were very helpful. (Sorry for getting back to you so late: it's been busy lately.)

I've confirmed this behavioral difference (though I compared the current development sources against 1.8.2, rather than 1.10.2 to 1.9.1). Your logs involve a 302 redirection before arriving at the real file, but that's just a red herring.

The difference is that when 1.9.1 encountered a server that would respond to a byte-range request with 200 (meaning it doesn't know how to send partial contents), but with a Content-Length value matching the size of the local file, then wget would close the connection and not proceed to redownload. 1.10.2, on the other hand, would just re-download it. Actually, I'll have to confirm this, but I think that current Wget will re-download it, but not overwrite the current content, until it arrives at some content corresponding to bytes beyond the current content.

I need to investigate further to see if this change was somehow intentional (though I can't imagine what the reasoning would be); if I don't find a good reason not to, I'll revert this behavior. Probably for the 1.12 release, but I might possibly punt it to 1.13 on the grounds that it's not a recent regression (however, it should really be a quick fix, so most likely it'll be in for 1.12).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
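A quick way to see which of the two server behaviors you are dealing with is to watch the headers in a debug run (a sketch; URL from Maksim's report, and the grep pattern is only a rough filter on wget's debug output):

  $ wget -d -c http://rapidshare.com/files/153131390/Blind-Test.rar 2>&1 | grep -E 'Range|HTTP/1'

A "206 Partial Content" reply means the server honored the Range request; a plain "200 OK" despite a Range header means it is resending the whole file, which is the case this thread is about.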
Re: --mirror and --cut-dirs=2 bug?
Brock Murch wrote:
> I try to keep a mirror of NASA atteph ancillary data for modis processing. I know that means little, but I have a cron script that runs 2 times a day. Sometimes it works, and others, not so much. The sh script is listed at the end of this email below, as are the contents of the remote ftp server's root and portions of the log. I don't need all the data on the remote server, only some, thus I use --cut-dirs. To make matters stranger, the software (also from NASA) that uses these files looks for them in a single place on the client machine where the software runs, but needs data from 2 different directories on the remote ftp server. If the data is not on the client machine, the software kindly ftp's the files to the local directory. However, I don't allow write access to that directory as many people use the software and when it is d/l'ed it has the wrong perms for others to use it; thus I mirror the data I need from the ftp site locally. In the script below, there are 2 wget commands, but they are to slightly different directories (MODISA and MODIST).

I wouldn't recommend that. Using the same output directory for two different source directories seems likely to lead to problems. You'd most likely be better off by pulling to two locations, and then combining them afterwards; see the sketch after this message. I don't know for sure that it _will_ cause problems (except if they happen to have same-named files), as long as .listing files are being properly removed (there were some recently-fixed bugs related to that, I think? ...just appending new listings on top of existing files).

> It appears to me that the problem occurs if there is an ftp server error, and wget starts a retry. wget goes to the server root, gets the .listing from there for some reason (as opposed to the directory it should go to on the server), and then goes to the dir it needs to mirror and can't find the files (that are listed in the root dir) and creates dirs, and then I get "No such file" errors and recursive directories created. Any advice would be appreciated.

This snippet seems to be the source of the problem:

  Error in server response, closing control connection. Retrying.
  --14:53:53-- ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/ (try: 2)
    => `/home1/software/modis/atteph/2002/110/.listing'
  Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.  ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD not required.
  ==> PASV ... done.  ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this issue recently (though I'm not 100% sure that it went to release: I believe so). I believe we made some related fixes more recently.

You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could you please try to verify whether Wget continues to exhibit this problem in the latest release version? I'll also try to look into this as I have time (but it might be awhile before I can give it some serious attention; it'd be very helpful if you could do a little more legwork).

Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
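The two-destination approach suggested above might look like this, reusing the command from Brock's setup (the -A/-T staging directories and the rsync merge step are illustrative, not from the original script):

  wget -r -N -l inf -nH --cut-dirs=2 \
      -P /home1/software/modis/atteph-A ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/
  wget -r -N -l inf -nH --cut-dirs=2 \
      -P /home1/software/modis/atteph-T ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/
  # then merge the two trees into the directory the NASA software reads:
  rsync -a /home1/software/modis/atteph-A/ /home1/software/modis/atteph/
  rsync -a /home1/software/modis/atteph-T/ /home1/software/modis/atteph/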
Re: --mirror and --cut-dirs=2 bug?
Micah Cowan wrote:
> I believe we made some related fixes more recently. You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could you please try to verify whether Wget continues to exhibit this problem in the latest release version?

This problem looks like the one that Mike Grant fixed in October of 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it should definitely be fixed in 1.11.4. Please let me know if it isn't.

Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah,

Thanks for your quick attention to this. Yes, I probably forgot to include the version #:

  [EMAIL PROTECTED] atteph]# wget --version
  GNU Wget 1.10.2 (Red Hat modified)
  Copyright (C) 2005 Free Software Foundation, Inc.
  This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
  Originally written by Hrvoje Niksic [EMAIL PROTECTED].

I will see if I can get the newest version for:

  [EMAIL PROTECTED] atteph]# cat /etc/redhat-release
  CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:
> This problem looks like the one that Mike Grant fixed in October of 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it should definitely be fixed in 1.11.4. Please let me know if it isn't.
More on query matching [Re: Need Design Documents]
kalpana ravi wrote:
> Hi Everybody, My name is kalpana Ravi. I am planning to contribute to add one of the features listed in https://savannah.gnu.org/bugs/?22089. For that I need to know the design diagrams to understand better. Does anybody know where the UML diagrams are?

Hi kalpana,

You sent this message to me and [EMAIL PROTECTED]; you wanted [EMAIL PROTECTED]

We don't have UML diagrams for wget: you'll just have to read the sources (which, unfortunately, are messy). I have some rough-draft diagrams of how I _want_ wget to look eventually, but I'm not done with those, and anyway they wouldn't help you with wget now. Even if you had the UML diagrams for the current state, you'd still need to understand the sources; I really don't think they'd help you much.

More important than understanding the design is understanding what needs to be done; we're still getting a grip on that. My current thought is that there should be a --query-reject (and probably --query-accept, though the former seems far more useful) that should be matched against key/value pairs; thus, --query-reject 'foo=bar&action=edit' would reject anything that has foo=bar and action=edit as key/value pairs in the query string, even if they're not actually next to each other; an example rejected URL might be http://example.com/index.php?a=b&action=edit&token=blah&foo=bar&hergle. Not all query strings are in the key=value format, so --query-reject 'abc1254' would be allowed, and match against the entire query string.

For an idea of how URL filename matching is currently done, you might check out acceptable() in src/util.c and the functions it calls, to get an idea of how query matching might be implemented. However, I'll probably tackle this bug myself pretty soon if no one else has managed it yet, as I'm very interested in getting Wget 1.12 finished before long into the new year (ideally, _before_ the new year, but that probably ain't gonna happen).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
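Under that proposal, usage would look something like this (--query-reject is the design sketch from the message above, not a shipped wget flag):

  # reject any URL whose query string contains both pairs, in any order:
  $ wget -r --query-reject 'foo=bar&action=edit' http://example.com/
  # e.g. http://example.com/index.php?a=b&action=edit&token=blah&foo=bar would be skipped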
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> I'm trying to download the same file from the same server [...] Both logs are made when Blind-Test.rar was already on my HDD.

This is currently being tracked at https://savannah.gnu.org/bugs/?24662

A similar and related bug report is at https://savannah.gnu.org/bugs/?24642, in which the logs show that rapidshare.com also issues erroneous Content-Range information when it responds with a 206 Partial Content, which exercised a different regression* introduced in 1.11.x.

* It's not really a regression, since it's desirable behavior: we now determine the size of the content from the Content-Range header, since Content-Length is often missing or erroneous for partial content. However, in this instance of server error, it resulted in less-desirable behavior than the previous version of Wget. Anyway...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
RE: wget re-download fully downloaded files
Micah Cowan wrote:
> Actually, I'll have to confirm this, but I think that current Wget will re-download it, but not overwrite the current content, until it arrives at some content corresponding to bytes beyond the current content. I need to investigate further to see if this change was somehow intentional (though I can't imagine what the reasoning would be); if I don't find a good reason not to, I'll revert this behavior.

One reason to keep the current behavior is to retain all of the existing content in the event of another partial download that is shorter than the previous one. However, I think that only makes sense if wget is comparing the new content with what is already on disk.

Tony
[bug] wrong speed calculation in (--output-file) logfile
Hello.

During a download with wget I redirected output into a file with the following command:

  $ LC_ALL=C wget -o output 'ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz'

I set LC_ALL and LANG explicitly to be sure that this is not a locale-related problem. The output I saw in the file was:

  --2008-10-25 14:51:17-- ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz
    => `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13'
  Resolving mirror.yandex.ru... 77.88.19.68
  Connecting to mirror.yandex.ru|77.88.19.68|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.  ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD /gentoo-distfiles/distfiles ... done.
  ==> SIZE OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... 13633213
  ==> PASV ... done.  ==> RETR OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... done.
  Length: 13633213 (13M)

      0K .......... .......... .......... .......... ..........  0%  131K 1m41s
     50K .......... .......... .......... .......... ..........  0%  132K 1m40s
    100K .......... .......... .......... .......... ..........  1%  135K 99s
    150K .......... .......... .......... .......... ..........  1%  132K 99s
    200K .......... .......... .......... .......... ..........  1%  130K 99s
    250K .......... .......... .......... .......... ..........  2% 45.9K 2m9s
    300K .......... .......... .......... .......... ..........  2% 64.3M 1m50s
  [snip]
  13250K .......... .......... .......... .......... .......... 99%  131K 0s
  13300K .......... ...                                        100%  134K=1m41s

  2008-10-25 14:52:58 (132 KB/s) - `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13' saved [13633213]

Note the line just above [snip], which claims a rate of 64.3M. It is impossible to have downloaded that many megabytes in one interval, as the whole file is much smaller. I don't know why this number sometimes jumps, but in some cases it causes the following output at the end of the download:

  13300K .......... ... 100% 26101G=1m45s

Obviously I have no possibility of downloading at such a high speed (26101G). This is reproducible with wget 1.11.4.

-- Peter.
Re: re-mirror + no-clobber
Jonathan Elsas wrote:
> ... I've issued the command
>
>   wget -nc -r -l inf -H -D www.example.com,www2.example.com http://www.example.com
>
> but I get the message:
>
>   file 'www.example.com/index.html' already there; not retrieving.
>
> and the process exits. According to the man page, files with a .html suffix will be loaded off disk and parsed, but this does not appear to be happening. Am I missing something?

Yes. It has to download the files before they can be loaded from the disk and parsed. When it encounters a file at a given location, it doesn't have any way to know that that file corresponds to the one it's trying to download.

Timestamping with -N may be more what you want, rather than -nc?

I'm open to suggestions on clarifying the documentation.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
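The -N variant of Jonathan's command would be (hosts taken from his original invocation); the intent is that files whose timestamps match the server's copy are not re-fetched, while the local copies can still be consulted for links:

  $ wget -N -r -l inf -H -D www.example.com,www2.example.com http://www.example.com/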
re-mirror + no-clobber
Hi --

I'm using wget 1.10.2. I'm trying to mirror a web site with the following command:

  wget -m http://www.example.com

After this process finished, I realized that I also needed pages from a subdomain (e.g. www2). To re-start the mirror process without downloading the same pages again, I've issued the command

  wget -nc -r -l inf -H -D www.example.com,www2.example.com http://www.example.com

but I get the message:

  file 'www.example.com/index.html' already there; not retrieving.

and the process exits. According to the man page, files with a .html suffix will be loaded off disk and parsed, but this does not appear to be happening. Am I missing something?

Thanks in advance for your help.
--mirror and --cut-dirs=2 bug?
I try to keep a mirror of NASA atteph ancillary data for modis processing. I know that means little, but I have a cron script that runs 2 times a day. Sometimes it works, and others, not so much. The sh script is listed at the end of this email below, as are the contents of the remote ftp server's root and portions of the log. I don't need all the data on the remote server, only some, thus I use --cut-dirs.

To make matters stranger, the software (also from NASA) that uses these files looks for them in a single place on the client machine where the software runs, but needs data from 2 different directories on the remote ftp server. If the data is not on the client machine, the software kindly ftp's the files to the local directory. However, I don't allow write access to that directory as many people use the software, and when it is d/l'ed it has the wrong perms for others to use it; thus I mirror the data I need from the ftp site locally. In the script below, there are 2 wget commands, but they are to slightly different directories (MODISA and MODIST).

It appears to me that the problem occurs if there is an ftp server error, and wget starts a retry. wget goes to the server root, gets the .listing from there for some reason (as opposed to the directory it should go to on the server), and then goes to the dir it needs to mirror and can't find the files (that are listed in the root dir) and creates dirs, and then I get "No such file" errors and recursive directories created. Any advice would be appreciated.

Brock Murch

Here is an example of the bad type of dir structure I end up with (there should be no EO1 and below):

  [EMAIL PROTECTED] atteph]# find . -type d -name '*' | grep EO1
  ./2002/110/EO1
  ./2002/110/EO1/CZCS
  ./2002/110/EO1/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS

Or:

  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/
  COMMON
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/
  CZCS README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/

And:

  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README
  ls: /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README: No such file or directory
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README

All the README files are the same, and the same as the one on the ftp server.
RE: [PATCH] Enable wget to download from given offset and just a given amount of bytes
Juan Manuel wrote:
> OK, you are right, I'll try to make it better in my free time. I supposed that it would have been more polite with one option, but thought it was easier with two (and since this is my first approach to C I took the easy way), because one option would have to deal with two parameters.

It's clearly easier to deal with options that wget is already programmed to support. For a primer on wget options, take a look at this page on the wiki: http://wget.addictivecode.org/OptionsHowto

I suspect you will need to add support for a new action (perhaps cmd_range).

Tony
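A single-option form of the patch's feature might then be invoked along these lines (the option name --range and its start-end syntax are purely hypothetical here; a cmd_range handler would be what splits the one argument into the two values):

  $ wget --range=1000-2048 http://example.com/big.iso
  # fetch only the bytes from offset 1000 through offset 2048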
RE: A/R matching against query strings
Micah Cowan wrote:
> Would hash really be useful, ever?

Probably not, as long as we strip off the hash before we do the comparison.

Tony
accept/reject rules based on querystring
Any ideas about when this option (or an acceptable workaround) will be implemented? I need to include/exclude based on the querystring (with regular expressions, of course). The file name is not enough. Thanks.
Re: accept/reject rules based on querystring
Gustavo Ayala wrote:
> Any ideas about when this option (or an acceptable workaround) will be implemented? I need to include/exclude based on the querystring (with regular expressions, of course). The file name is not enough.

I consider it an important feature, and currently expect to implement it for 1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: A/R matching against query strings
I sent the following last month but didn't get any feedback. I'm trying one more time. :)

-M

Micah Cowan wrote:
> On expanding current URI acc/rej matches to allow matching against query strings, I've been considering how we might enable/disable this functionality, with an eye toward backwards compatibility.
>
> It seems to me that one usable approach would be to require the "?" query-string delimiter to be an explicit part of the rule, if it's expected to be matched against query strings. So -A .htm,.gif,*Action=edit* would all result in matches against the filename portion only, but -A '\?*Action=edit*' would look for Action=edit within the query-string portion. (The '\?' is necessary because otherwise '?' is a wildcard character; [?] would also work.)
>
> The disadvantage of that technique is that it's harder to specify that a given string should be checked _anywhere_, regardless of whether it falls in the filename or query-string portion; but I can't think offhand of any realistic cases where that's actually useful. We could also supply a --match-queries option to turn on matching of wildcard rules for anywhere (non-wildcard suffix rules should still match only at the end of the filename portion).
>
> Another option is to use a separate -A-like option that does what -A does for filenames, but matches against query strings. I like this idea somewhat less.
>
> Thoughts?
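Side by side, the two kinds of rule under this proposal would look like the following (proposal syntax only; no released wget behaves this way, and the URL is illustrative):

  $ wget -r -A '*Action=edit*' http://example.com/wiki/     # filename portion only, as today
  $ wget -r -A '\?*Action=edit*' http://example.com/wiki/   # explicit \? targets the query string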
RE: A/R matching against query strings
Micah Cowan wrote:
> On expanding current URI acc/rej matches to allow matching against query strings, I've been considering how we might enable/disable this functionality, with an eye toward backwards compatibility.

What about something like --match-type=TYPE (with accepted values of all, hash, path, search)? For the URL http://www.domain.com/path/to/name.html?a=true#content

  all    would match against the entire string
  hash   would match against "content"
  path   would match against "path/to/name.html"
  search would match against "a=true"

For backward compatibility the default should be --match-type=path. I thought about having "host" as an option, but that duplicates another option.

Tony
Re: A/R matching against query strings
Tony Lewis wrote:
> Micah Cowan wrote:
>> On expanding current URI acc/rej matches to allow matching against query strings, I've been considering how we might enable/disable this functionality, with an eye toward backwards compatibility.
>
> What about something like --match-type=TYPE (with accepted values of all, hash, path, search)? [...] For backward compatibility the default should be --match-type=path. I thought about having "host" as an option, but that duplicates another option.

As does "path" (up to the final /).

Would "hash" really be useful, ever? It's never part of the request to the server, so it's really more context to the URL than a real part of the URL, as far as requests go. Perhaps that sort of thing could best wait for when we allow custom URL-parsers/filters.

Also, I don't like the name "search" overly much, as that's a very limited description of the much more general use of query strings.

But differentiating between three or more different match types tilts me much more strongly toward some sort of shorthand, like the explicit need for \?; with three types, perhaps we'd just use some special prefix for patterns to indicate which sort of match we want (:q: query strings, :a: for all, or whatever), to save on prefixing each different type of match with --match-type (or just using "all" for everything).

OTOH, regex support is easy enough to add to Wget, now that we're using gnulib; we could just leave wildcards the way they are, and introduce regexes that match everything. Then query strings are '\?.*foo=bar' (or, for the really pedantic, '\?([^?]*)?foo=bar([^?]*)?$').

That last one, though, highlights how cumbersome it is to do proper matching against typical HTML form-generated query strings (it's not really even possible with wildcards). Perhaps a more appropriate pattern-matcher specifically for query strings would be a good idea. It's probably enough to do something like --query-='action=Edit', where there's an implied '\?([^?]*)?' before, and '([^?]*)?$' after.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
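Spelling out the implied anchoring in that last idea (the option name below is hypothetical, since the original spelling was lost; the expansion is the one quoted above):

  # user writes (hypothetical option name):
  #   --query-accept 'action=Edit'
  # wget would internally anchor the pattern as the regex:
  #   \?([^?]*)?action=Edit([^?]*)?$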
Can't wget anidb.net
Hi,

I have tried to wget http://anidb.net/perl-bin/animedb.pl?show=main but all I seem to get is a file with unreadable characters (and not the HTML file I'm after). Is it because of some perl script on the site?

Thanks!
ZZ
Re: Can't wget anidb.net
Hi,

* zanzi ([EMAIL PROTECTED]) wrote:
> I have tried to wget http://anidb.net/perl-bin/animedb.pl?show=main but all I seem to get is a file with unreadable characters (and not the HTML file I'm after). Is it because of some perl-script on the site?

This perl script assumes HTTP/1.1 and gzip support for any request :(

  HTTP/1.1 200 OK
  Date: Mon, 20 Oct 2008 15:25:09 GMT
  Server: Apache/1.3.41 (Unix) mod_perl/1.30
  Set-Cookie: adbuin=1224516309-mqMQ; path=/; expires=Thu, 18-Oct-2018 15:25:09 GMT
  Cache-control: no-cache
  Pragma: no-cache
  Content-Type: text/html; charset=UTF-8
  Expires: Mon, 20 Oct 2008 15:25:09 GMT
  X-Cache: MISS from anidb.net
  Connection: close
  Content-Encoding: gzip
  ^^^^^^^^^^^^^^^^^^^^^^
  Content-Length: 5489

You can manually decompress the data:

  $ wget 'http://anidb.net/perl-bin/animedb.pl?show=main' -O page.gz
  $ gzip -dc page.gz > page.html

Sincerely,
Saint Xavier.
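The two steps can also be combined into a single pipeline (same URL; gzip just reads the compressed stream from wget's stdout):

  $ wget -q -O - 'http://anidb.net/perl-bin/animedb.pl?show=main' | gzip -dc > page.html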
-c option
Hi,

I've just come across the following remark in the wget manual page (1.10.2), about the -c option:

  Wget has no way of verifying that the local file is really a valid prefix of the remote file.

This is not quite true. It could at least check the remote and local file time stamps for this purpose, and I think it should do this. It could also, as an option, load a couple of random bytes as a heuristic quick check. (I wouldn't do this, though.) In any case, the wrong claim "no way" should be removed from the man page.

Best regards,
Thomas Wolff
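The timestamp check Thomas suggests can already be approximated by hand before resuming (a sketch; the URL and file name are illustrative, and "date -r" is the GNU coreutils form):

  $ wget -S --spider http://example.com/file.bin 2>&1 | grep -i 'Last-Modified'
  $ date -r file.bin    # local mtime; resume with -c only if the server's copy isn't newer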
Re: wget re-download fully downloaded files
I'm trying to download the same file from the same server; the command line I use:

  wget --debug -o log -c -t 0 --load-cookies=cookie_file http://rapidshare.com/files/153131390/Blind-Test.rar

Below are attached 2 files: the log with 1.9.1 and the log with 1.10.2. Both logs are made when Blind-Test.rar was already on my HDD. Sorry for some mess in the logs, but the Russian language is used on my console.

Yours faithfully, Maksim Ivanov

2008/10/13 Micah Cowan [EMAIL PROTECTED]:
> Maksim Ivanov wrote:
>> Hello! Starting with version 1.10, wget has a very annoying bug: if you try to download an already fully downloaded file, wget downloads it all over again, whereas 1.9.1 says "Nothing to do", as it should.
>
> It all depends on what options you specify. That's as true for 1.9 as it is for 1.10 (or the current release 1.11.4). It can also depend on the server; not all of them support timestamping or partial fetches. Please post the minimal log that exhibits the problem you're experiencing.

[Attachments: log.1.9.1, log.1.10.2]
-m alias
Hi,

Considering the -m switch (--mirror): the man page says it is currently equivalent to -r -N -l inf --no-remove-listing. I was wondering, though: why does this not also include -k? When mirroring a website it seems useful to convert the links for appropriate viewing in a browser. That is, if "mirroring" here means what it usually means: provide an alternative location to view the same content. If it's more like a backup, then of course -k is not a good option. But in that case, maybe it's worth mentioning...?

Thanks,
Hraban

PS: I would like to be CC'ed (not subscribed).
wget re-download fully downloaded files
Hello!

Starting with version 1.10, wget has a very annoying bug: if you try to download an already fully downloaded file, wget downloads it all over again, whereas 1.9.1 says "Nothing to do", as it should.

Yours faithfully, Maksim Ivanov
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:
> Hello! Starting with version 1.10, wget has a very annoying bug: if you try to download an already fully downloaded file, wget downloads it all over again, whereas 1.9.1 says "Nothing to do", as it should.

It all depends on what options you specify. That's as true for 1.9 as it is for 1.10 (or the current release, 1.11.4). It can also depend on the server; not all of them support timestamping or partial fetches.

Please post the minimal log that exhibits the problem you're experiencing.

Thanks,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Incorrect transformation of newline symbols
Hello!

I've noticed a possible mistake in ftp-basic.c. When I try to download a file from ftp://www.delorie.com/pub/djgpp/current/ (in my case it was ftp://www.delorie.com/pub/djgpp/current/FILES), the server responds with error no. 550, but this file actually exists. I used the cygwin command

  wget --verbose --debug --output-file=wget_djgpp_log --directory-prefix=djgpp "ftp://www.delorie.com/pub/djgpp/current/FILES"

to get this file. In the function ftp_request (ftp-basic.c), newline characters are substituted with ' ', but the ftp server doesn't understand such commands. The SIZE and RETR commands do not pass. I've inserted the debug log at the end of this message. The --restrict-file-names=[windows,unix] option has no effect.

Yours faithfully, Alexander Vilnin ([EMAIL PROTECTED])

  +++ wget_djgpp_log +++
  DEBUG output created by Wget 1.11.3 on cygwin.
  --2008-10-06 17:06:43-- ftp://www.delorie.com/pub/djgpp/current/FILES%0D
    => `djgpp/FILES%0D'
  Resolving www.delorie.com... 207.22.48.162
  Caching www.delorie.com => 207.22.48.162
  Connecting to www.delorie.com|207.22.48.162|:21... connected.
  Created socket 4.
  Releasing 0x006a0c88 (new refcount 1).
  Logging in as anonymous ...
  220 delorie.com FTP server (Version wu-2.8.0-prerelease(2) Fri Sep 5 11:24:18 EDT 2003) ready.
  --> USER anonymous
  331 Guest login ok, send your complete e-mail address as password.
  --> PASS -wget@
  230 Guest login ok, access restrictions apply.
  Logged in!
  ==> SYST ...
  --> SYST
  215 UNIX Type: L8
  done.  ==> PWD ...
  --> PWD
  257 / is current directory.
  done.  ==> TYPE I ...
  --> TYPE I
  200 Type set to I.
  done.  changing working directory
  Prepended initial PWD to relative path:
    pwd: '/'
    old: 'pub/djgpp/current'
    new: '/pub/djgpp/current'
  ==> CWD /pub/djgpp/current ...
  --> CWD /pub/djgpp/current
  250 CWD command successful.
  done.  ==> SIZE FILES\015 ...
  Detected newlines in SIZE FILES\015; changing to SIZE FILES
  --> SIZE FILES
  550 FILES : not a plain file.
  done.  ==> PASV ...
  --> PASV
  227 Entering Passive Mode (207,22,48,162,102,137)
  trying to connect to 207.22.48.162 port 26249
  Created socket 5.
  done.  ==> RETR FILES\015 ...
  Detected newlines in RETR FILES\015; changing to RETR FILES
  --> RETR FILES
  550 FILES : No such file or directory.
  No such file `FILES\015'.
  Closed fd 5
  Closed fd 4
  +++ wget_djgpp_log +++
Re: Incorrect transformation of newline symbols
Alexander Vilnin wrote:
> I've noticed a possible mistake in ftp-basic.c. When I try to download a file from ftp://www.delorie.com/pub/djgpp/current/ (in my case it was ftp://www.delorie.com/pub/djgpp/current/FILES), the server responds with error no. 550, but this file actually exists. [...] In the function ftp_request (ftp-basic.c), newline characters are substituted with ' ', but the ftp server doesn't understand such commands. The SIZE and RETR commands do not pass.

The problem isn't that newlines are substituted. Newlines and carriage returns are simply not safe within FTP file names. However, how did the newline get there in the first place? The real file name itself doesn't have a newline in it.

The logs clearly show that Wget was passed a URL with a carriage return (not a newline) in it. This strongly indicates that the shell you were using passed it that way to Wget. Probably, the shell was given "\r\n" when you hit Enter to end your command, and stripped away the "\n" but left the "\r", which it passed to Wget. The bug you are encountering is in your Cygwin+shell environment; you'll have to look there.

The only deficiency I'm seeing on Wget's part from these logs is that it's calling \015 a "newline" character, when in fact the newline character is \012; it should say "line-ending character" or some such.

HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
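If the stray carriage return really is coming in from the command line (for instance from a script saved with DOS line endings), stripping it before the URL reaches wget works around the problem (a sketch):

  $ url='ftp://www.delorie.com/pub/djgpp/current/FILES'
  $ wget "$(printf '%s' "$url" | tr -d '\r')"

or simply convert the offending script with dos2unix.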
Can't fetch error messages
Hi,

I was trying to test the error messages of my server (apache: ErrorDocument 404), but unfortunately I could not download those error messages with wget: since the server sends a 404 code when a page is missing (which is exactly what I wanted to test), wget does not save the page.

So wget should have an option to download and save the response body in any case, even when an error code was sent.

Regards,
Hadmut
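For the record, later wget releases grew exactly this switch; if memory serves it is --content-on-error (added around wget 1.14, well after this thread), used like:

  $ wget --content-on-error http://example.com/no-such-page.html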
Failure to build from Mercurial
While working on https://savannah.gnu.org/bugs/?24346 I found that the current code in Mercurial fails to build. This is what I am getting:

  $ hg clone http://hg.addictivecode.org/wget/mainline wget
  $ ./autogen.sh
  $ ./configure --prefix=$HOME
  $ make
  [...]
  /bin/sh ../ylwrap css.l lex.yy.c css.c -- flex
  /u/debray/devel/wget/hg/wget-hacking/src/css.l:112: undefined definition {X}
  /u/debray/devel/wget/hg/wget-hacking/src/css.l:113: undefined definition {X}
  /u/debray/devel/wget/hg/wget-hacking/src/css.l:120: undefined definition {R}
  /u/debray/devel/wget/hg/wget-hacking/src/css.l:121: undefined definition {R}
  make[2]: *** [css.c] Error 1
  [...]

Happy hacking,
Debarshi
Re: Support for file://
Michelle Konzack wrote: On 2008-09-20 22:05:35, Micah Cowan wrote: I'm confused. If you can successfully download the files from HOSTINGPROVIDER in the first place, then why would a difference exist? And if you can't, then this wouldn't be an effective way to find out. I mean, IF you have a local (master) mirror and your website @ISP and you want to know whether the two websites are identical and have no cruft in them, you can I didn't follow this thread; however, just FYI, there exists an excellent client (not only for FTP) called lftp that has a built-in mirror command. The command has a similar effect to the rsync tool, i.e. it synchronizes remote and local directories recursively. -- Petr
Re: Support for file://
On 2008-09-20 22:05:35, Micah Cowan wrote: I'm confused. If you can successfully download the files from HOSTINGPROVIDER in the first place, then why would a difference exist? And if you can't, then this wouldn't be an effective way to find out. I mean, IF you have a local (master) mirror and your website @ISP, and you want to know whether the two websites are identical and have no cruft in them, you can 1) fetch the website from your ISP recursively with wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/ 2) fetch the local mirror with wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/ where the full path in 2) is the same site as in 1), and then 3) compare the results against /path/to/local/mirror/. If you have edited the files locally and remotely, you can get surprising results. Fetching /index.html recursively means that ALL files are downloaded which are mentioned in ANY HTML file. So if 1) differs from ftp://website.isp.tld/ then there is something wrong with the site... Thanks, Greetings and nice Day/Evening Michelle Konzack Systemadministrator 24V Electronic Engineer Tamay Dogan Network Debian GNU/Linux Consultant -- Linux-User #280138 with the Linux Counter, http://counter.li.org/ # Debian GNU/Linux Consultant # Michelle Konzack Apt. 917 ICQ #328449886 +49/177/935194750, rue de Soultz MSN LinuxMichi +33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)
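Spelled out, the comparison workflow being proposed would look like this, assuming a Wget that actually supported file:// (today it does not; the host and paths are placeholders):

$ wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/
$ wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/
$ diff -r /tmp/tmp_ISP /tmp/tmp_LOC    # recursive fs-diff of the two fetched trees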
Re: Suggested feature
On Wed, 24 Sep 2008, Oliver Hahn wrote: I think it would be a nice feature if wget could print, in --spider mode, all downloadable file URLs into a text file, so that you can import these URLs into another download manager. You can retrieve this information from the log file -- use the usual text-processing tools like `grep', `sed', etc. to filter out what you need. No need for a new feature, as all you need is already in place. Maciej
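A minimal sketch of that log-processing approach (the host, recursion depth, and grep pattern are placeholders and one rough choice among many, not the only way to do it):

$ wget --spider -r -l 2 -o spider.log http://example.com/
$ grep -oE 'https?://[^ ]+' spider.log | sort -u > urls.txt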
Re: Big files
On 2008-09-16 15:22:22, Cristián Serpell wrote: It is the latest Ubuntu distribution, which still comes with the old version. Ehm, even Debian Etch comes with:

[EMAIL PROTECTED]:~] apt-cache policy wget
wget:
  Installiert: 1.10.2-2
  Mögliche Pakete: 1.10.2-2
  Versions-Tabelle:
 *** 1.10.2-2 0
        500 file: etch/main Packages
        100 /var/lib/dpkg/status

So Ubuntu, AFAIK, uses the latest version, which is 1.11... Thanks, Greetings and nice Day/Evening Michelle Konzack Systemadministrator 24V Electronic Engineer Tamay Dogan Network Debian GNU/Linux Consultant -- Linux-User #280138 with the Linux Counter, http://counter.li.org/ # Debian GNU/Linux Consultant # Michelle Konzack Apt. 917 ICQ #328449886 +49/177/935194750, rue de Soultz MSN LinuxMichi +33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)
Re: Big files
There must be another bug, since I can download "small" (:-) 18 GByte archive files... Debian Etch:

[EMAIL PROTECTED]:~] apt-cache policy wget
wget:
  Installiert: 1.10.2-2
  Mögliche Pakete: 1.10.2-2
  Versions-Tabelle:
 *** 1.10.2-2 0
        500 file: etch/main Packages
        100 /var/lib/dpkg/status

Thanks, Greetings and nice Day/Evening Michelle Konzack Systemadministrator 24V Electronic Engineer Tamay Dogan Network Debian GNU/Linux Consultant -- Linux-User #280138 with the Linux Counter, http://counter.li.org/ # Debian GNU/Linux Consultant # Michelle Konzack Apt. 917 ICQ #328449886 +49/177/935194750, rue de Soultz MSN LinuxMichi +33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)
Re: Big files
On 2008-09-16 12:52:16, Tony Lewis wrote: Cristián Serpell wrote: Maybe I should have started by this (I had to change the name of the file shown): [snip]

---response begin---
HTTP/1.1 200 OK
Date: Tue, 16 Sep 2008 19:37:46 GMT
Server: Apache
Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
ETag: "7f710a-8a8e1bf7-47fbd2ef"
Accept-Ranges: bytes
Content-Length: -1970398217

Interesting headers, since here I get:

HTTP/1.1 200 OK
Date: Mon, 22 Sep 2008 21:58:11 GMT
Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch10
X-Powered-By: PHP/5.2.0-8+etch10

which means he is running the old, crappy Apache 1.3. The problem is not with wget. It's with the Apache server, which told wget that the file had a negative length. Because it is the old one. Thanks, Greetings and nice Day/Evening Michelle Konzack Systemadministrator 24V Electronic Engineer Tamay Dogan Network Debian GNU/Linux Consultant -- Linux-User #280138 with the Linux Counter, http://counter.li.org/ # Debian GNU/Linux Consultant # Michelle Konzack Apt. 917 ICQ #328449886 +49/177/935194750, rue de Soultz MSN LinuxMichi +33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)
Re: Support for file://
Hi Micah, You're right - this was raised before, and in fact it was a feature Mauro Tortonesi intended to be implemented for the 1.12 release, but it seems to have been forgotten somewhere along the line. I wrote to the list in 2006 describing what I consider a compelling reason to support file://. Here is what I wrote then: At 03:45 PM 26/06/2006, David wrote: In replies to the post requesting support of the file:// scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason. I have a CD with HTML content (it is a CD of abstracts from a scientific conference); however, for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create an archive of the complete content locally by having wget retrieve everything and convert the links to point to the retrieved material. Thus the wget functionality when retrieving the local files should work the same as if the files were retrieved from a web server (i.e. the input local file needs to be processed, both local and remote content retrieved, and the copies made of the local and remote files all need to be adjusted to now refer to the local copy rather than the remote content). A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim. Regarding where the local files should be copied, I suggest a default scheme similar to the current http functionality. For example, if the local source was /source/index.htm, and I ran something like: wget.exe -m -np -k file:///source/index.htm this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user change into a new subdirectory and retry the command). One additional problem this scheme needs to deal with is when one or more /../ in the path specification result in the destination being above the current parent directory; then the destination would have to be adjusted to ensure the file remained within the parent directory structure. For example, if I am in /dir/dest/ and ran wget.exe -m -np -k file://../../source/index.htm this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm) -David. At 08:49 AM 3/09/2008, you wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Petri Koistinen wrote: Hi, It would be nice if wget would also support file://. Feel free to file an issue for this (I'll mark it Needs Discussion and set at low priority). I'd thought there was already an issue for this, but can't find it (either open or closed). I know this has come up before, at least. I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which groks URLs, Wget W(eb)-gets, and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system cp command is likely to handle things much better than Wget.
In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system cp command, but I might conceivably not mind file:// support if it means simple _content_ transfer, and not actual file duplication. Also in need of addressing is what recursion should mean for file://. Between ftp:// and http://, recursion currently means different things. In FTP, it means traverse the file hierarchy recursively, whereas in HTTP it means traverse links recursively. I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvcLq7M8hyUobTrERAl6YAJ9xeTINVkuvl8HkElYlQt7dAsUfHACfXRT3 lNR++Q0XMkcY4c6dZu0+gi4= =mKqj -END PGP SIGNATURE-
Re: Support for file://
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 David wrote: Hi Micah, You're right - this was raised before, and in fact it was a feature Mauro Tortonesi intended to be implemented for the 1.12 release, but it seems to have been forgotten somewhere along the line. I wrote to the list in 2006 describing what I consider a compelling reason to support file://. Here is what I wrote then: At 03:45 PM 26/06/2006, David wrote: In replies to the post requesting support of the file:// scheme, requests were made for someone to provide a compelling reason to want to do this. Perhaps the following is such a reason. I have a CD with HTML content (it is a CD of abstracts from a scientific conference); however, for space reasons not all the content was included on the CD - there remain links to figures and diagrams on a remote web site. I'd like to create an archive of the complete content locally by having wget retrieve everything and convert the links to point to the retrieved material. Thus the wget functionality when retrieving the local files should work the same as if the files were retrieved from a web server (i.e. the input local file needs to be processed, both local and remote content retrieved, and the copies made of the local and remote files all need to be adjusted to now refer to the local copy rather than the remote content). A simple shell script that runs cp or rsync on local files without any further processing would not achieve this aim. Fair enough. This example at least makes sense to me. I suppose it can't hurt to provide this, so long as we document clearly that it is not a replacement for cp or rsync, and is never intended to be (it won't handle attributes and special file properties). However, support for file:// will introduce security issues, so care is needed. For instance, file:// should never be respected when it comes from the web. Even on the local machine, it could be problematic to use it on files writable by other users (as they can then craft links to download privileged files with upgraded permissions). Perhaps files that are only readable by root should always be skipped, or wget should require a --force sort of option if the current mode can result in more permissive settings on the downloaded file. Perhaps it would be wise to make this a configurable option. It might also be prudent to enable an option for file:// to be disallowed for root. https://savannah.gnu.org/bugs/?24347 If any of you can think of additional security issues that will need consideration, please add them as comments to the report. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFI19aE7M8hyUobTrERAt49AJ4irLGMd6OVRWeooKPqZxmX0+K2agCfaq2d Mx9IgSo5oUDQgBPD01mcGcY= =sdAZ -END PGP SIGNATURE-
Post size limit?
Hi! I've been trying to send post variables with the --post-file option of wget. (I have two variables in the file, both urlencoded; one of them is quite large.) It worked fine until it came across a file that was 4.7M in size: the post variables just won't get through to the server... I tried the same post with Mozilla Firefox and it worked fine, but I had the same results with curl :-( Any ideas what could be the problem? Please cc me, I'm not subscribed! Thanks! Bye DeVill
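For context, --post-file expects the named file to already contain the complete urlencoded request body. A minimal sketch of the setup being described (the endpoint, variable names, and payload generator are placeholders):

$ printf 'small=hello&big=%s' "$(head -c 5000000 /dev/zero | tr '\0' 'a')" > post.txt
$ wget --post-file=post.txt 'http://server.example/handler' -O response.html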
Re: Post size limit?
Hi, what does the server log say? I guess it's a boundary problem -- your headers are wrong, that's all. I'm pretty sure that if you look at the server error logs you will get your answer. Post files are not really post data... you have to set up your HTTP body correctly. Cheers! On Sun, Sep 21, 2008 at 1:10 PM, DeVill [EMAIL PROTECTED] wrote: Hi! I've been trying to send post variables with the --post-file option of wget. (I have two variables in the file, both urlencoded; one of them is quite large.) It worked fine until it came across a file that was 4.7M in size: the post variables just won't get through to the server... I tried the same post with Mozilla Firefox and it worked fine, but I had the same results with curl :-( Any ideas what could be the problem? Please cc me, I'm not subscribed! Thanks! Bye DeVill -- -mmw
Re: Support for file://
Hello Micah, On 2008-09-02 15:49:15, Micah Cowan wrote: I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which groks URLs, Wget W(eb)-gets, and file:// can't really be argued to be part of the web. Right, but... That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system cp command is likely to handle things much better than Wget. In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system cp command, but I might conceivably not mind file:// support if it means simple _content_ transfer, and not actual file duplication. Also in need of addressing is what recursion should mean for file://. Between ftp:// and http://, recursion currently means different things. In FTP, it means traverse the file hierarchy recursively, whereas in HTTP it means traverse links recursively. I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering. Imagine you have a local mirror of your website and you want to know why the site @HOSTINGPROVIDER has some extra files, or such. You can spider the website @HOSTINGPROVIDER recursively into a local tmp1 directory and then, with the same command line, do the same with the local mirror, downloading the files recursively into tmp2; now you can make a recursive fs-diff and know which files are used on both the local mirror and @HOSTINGPROVIDER. I have searched for such a feature several times, and currently the only way is to install a webserver locally, which is not always possible. Maybe this is worth a discussion? Greetings Michelle -- Linux-User #280138 with the Linux Counter, http://counter.li.org/
Re: Support for file://
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Michelle Konzack wrote: Imagine you have a local mirror of your website and you want to know why the site @HOSTINGPROVIDER has some extra files, or such. You can spider the website @HOSTINGPROVIDER recursively into a local tmp1 directory and then, with the same command line, do the same with the local mirror, downloading the files recursively into tmp2; now you can make a recursive fs-diff and know which files are used on both the local mirror and @HOSTINGPROVIDER. I'm confused. If you can successfully download the files from HOSTINGPROVIDER in the first place, then why would a difference exist? And if you can't, then this wouldn't be an effective way to find out. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFI1dYe7M8hyUobTrERAuuyAJ9m3ArCqxG4orhAQuEM010yWv6ScwCfaE9h jXIjJ+XUjBYwyBdi8NB/rEY= =NDnR -END PGP SIGNATURE-
Re: Problem with libeay32.dll, ordinal 2253
On Wed, Sep 17, 2008 at 11:02 PM, Tobias Opialla [EMAIL PROTECTED] wrote: Hey all, I hope this is the right address and you can help me. I'm currently trying to run a Perl script that includes some wget commands, but when I try to run it, it says: The ordinal 2253 could not be located in the dynamic link library LIBEAY32.dll. Probably because of a DLL conflict between the version used by wget and the version supplied by Perl. You could try renaming the libeay32.dll found in the perl/bin directory.
Problem with libeay32.dll, ordinal 2253
Hey all, I hope this is the right address and you can help me. I'm currently trying to run a Perl script that includes some wget commands, but when I try to run it, it says: The ordinal 2253 could not be located in the dynamic link library LIBEAY32.dll. Any ideas on that one? I couldn't find anything on the web. Regards, Tobias Opialla
Big files
Hi, I would like to know if there is a reason for using a signed int for the length of the files to download. The thing is that I was trying to download a 2.3 GB file using wget, but the length was printed as a negative number and wget said Aborted. Is it a bug or a design decision? Is there an option for downloading big files? In this case I used curl instead. Please CC replies, I'm not a subscriber. Thanks! C S
Re: Big files
Tue, 16 Sep 2008 11:19:50 -0400, Cristián Serpell [EMAIL PROTECTED]: I would like to know if there is a reason for using a signed int for the length of the files to download. The thing is that I was trying to download a 2.3 GB file using wget, but the length was printed as a negative number and wget said Aborted. Is it a bug or a design decision? Which version of wget are you using? This was a bug in older wget versions. You can check with the output of the wget --version command (the latest version is 1.11.4). I'm not having any trouble downloading files bigger than 2G. Doruk -- FISEK INSTITUTE - http://www.fisek.org.tr
RE: Big files
Cristián Serpell wrote: I would like to know if there is a reason for using a signed int for the length of the files to download. I would like to know why people still complain about bugs that were fixed three years ago. (More accurately, it was a design flaw that originated from a time when no computer OS supported files that big, but regardless of what you call it, the change to wget was made to version 1.10 in 2005.) Tony
Re: Big files
It is the latest Ubuntu distribution, which still comes with the old version. Thanks anyway, that was the problem. On 16-09-2008, at 15:08, Tony Lewis wrote: Cristián Serpell wrote: I would like to know if there is a reason for using a signed int for the length of the files to download. I would like to know why people still complain about bugs that were fixed three years ago. (More accurately, it was a design flaw that originated from a time when no computer OS supported files that big, but regardless of what you call it, the change to wget was made to version 1.10 in 2005.) Tony
Re: Big files
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Cristián Serpell wrote: It is the latest Ubuntu's distribution, that still comes with the old version. Thanks anyway, that was the problem. I know that's untrue. Ubuntu comes with 1.10.2 at least, and has for quite some time. If you're using that, then it's probably a different bug than Doruk and Tony were thinking of (perhaps one of the cases of content-length mishandling that were recently fixed in the 1.11.x series). IIRC Intrepid Ibex (Ubuntu 8.10) will have 1.11.4. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFI0AnI7M8hyUobTrERAqptAJoCj0VC46dBOhrr/A3HsHyicciKWQCffyFQ bHhmuYHmf52Yz1M5lu7Yk5Y= =Z+fN -END PGP SIGNATURE-
Re: Big files
Maybe I should have started by this (I had to change the name of the file shown):

[EMAIL PROTECTED]:/tmp# wget --version
GNU Wget 1.10.2
Copyright (C) 2005 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Originally written by Hrvoje Niksic [EMAIL PROTECTED].

[EMAIL PROTECTED]:/tmp# wget --debug http://program-linux64.tar.bz2
DEBUG output created by Wget 1.10.2 on linux-gnu.
--15:37:42-- http://program-linux64.tar.bz2
  => `program.tar.bz2'
Resolving www.ai.sri.com... 130.107.65.215
Caching www.ai.sri.com => 130.107.65.215
Connecting to www.ai.sri.com|130.107.65.215|:80... connected.
Created socket 3.
Releasing 0x0064a100 (new refcount 1).
---request begin---
GET /program-linux64.tar.bz2 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.ai.sri.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Tue, 16 Sep 2008 19:37:46 GMT
Server: Apache
Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
ETag: "7f710a-8a8e1bf7-47fbd2ef"
Accept-Ranges: bytes
Content-Length: -1970398217
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: application/x-tar
---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: -1,970,398,217 [application/x-tar]
[ <=> ] 0 --.--K/s
Aborted

On 16-09-2008, at 15:32, Micah Cowan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Cristián Serpell wrote: It is the latest Ubuntu distribution, which still comes with the old version. Thanks anyway, that was the problem. I know that's untrue. Ubuntu comes with 1.10.2 at least, and has for quite some time. If you're using that, then it's probably a different bug than Doruk and Tony were thinking of (perhaps one of the cases of content-length mishandling that were recently fixed in the 1.11.x series). IIRC Intrepid Ibex (Ubuntu 8.10) will have 1.11.4. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFI0AnI7M8hyUobTrERAqptAJoCj0VC46dBOhrr/A3HsHyicciKWQCffyFQ bHhmuYHmf52Yz1M5lu7Yk5Y= =Z+fN -END PGP SIGNATURE-
RE: Big files
Cristián Serpell wrote: Maybe I should have started by this (I had to change the name of the file shown): [snip]

---response begin---
HTTP/1.1 200 OK
Date: Tue, 16 Sep 2008 19:37:46 GMT
Server: Apache
Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
ETag: "7f710a-8a8e1bf7-47fbd2ef"
Accept-Ranges: bytes
Content-Length: -1970398217

The problem is not with wget. It's with the Apache server, which told wget that the file had a negative length. Tony
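The advertised value is consistent with a 32-bit signed wraparound on the server side: assuming the true file size was 2,324,569,079 bytes (about 2.3 GB -- a figure inferred from the header above, not stated anywhere in the thread), subtracting 2^32 reproduces the bogus Content-Length exactly:

$ echo $(( 2324569079 - 4294967296 ))    # hypothetical true size minus 2^32
-1970398217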
Re: Hiding passwords found in redirect URLs
Micah Cowan wrote: Note: Saint Xavier has already written a fix for this, so it's not actually a question of whether it's worth the bother, just whether it's actually desired behavior. Since it's desired in some situations but maybe not in others, the best solution would be to provide a switch for it that can be used in a user's .wgetrc and on the command line. Now we only need to find out what's the desired default behaviour if the switch is missing. ;-) Thomas Corthals
Re: Hiding passwords found in redirect URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Thomas Corthals wrote: Micah Cowan wrote: Note: Saint Xavier has already written a fix for this, so it's not actually a question of whether it's worth the bother, just whether it's actually desired behavior. Since it's desired in some situations but maybe not in others, the best solution would be to provide a switch for it that can be used in a user's .wgetrc and on the command line. Well, yes, except I can't really imagine anyone ever _using_ such a switch. Though I could envision people using the .wgetrc option. Still seems like a lot of trouble to make a new option for such a little thing. One could always use -nv in a pinch. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIzBiU7M8hyUobTrERAkchAJ9vajvughHFXR8yAJPPGt4YkaGY8ACfYXCR vPCAZaYsRN6VcisBjDkmdzI= =wMVt -END PGP SIGNATURE-
A/R matching against query strings
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On expanding the current URI acc/rej matches to allow matching against query strings, I've been considering how we might enable/disable this functionality, with an eye toward backwards compatibility. It seems to me that one usable approach would be to require the '?' that introduces the query string to be an explicit part of the rule, if the rule is expected to be matched against query strings. So -A .htm,.gif,*Action=edit* would all result in matches against the filename portion only, but -A '\?*Action=edit*' would look for Action=edit within the query-string portion. (The '\?' is necessary because otherwise '?' is a wildcard character; [?] would also work.) The disadvantage of that technique is that it's harder to specify that a given string should be checked _anywhere_, regardless of whether it falls in the filename or query-string portion; but I can't think offhand of any realistic cases where that's actually useful. We could also supply a --match-queries option to turn on matching of wildcard rules anywhere (non-wildcard suffix rules should still match only at the end of the filename portion). Another option is to use a separate -A-like option that does what -A does for filenames, but matches against query strings. I like this idea somewhat less. Thoughts? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIyrXz7M8hyUobTrERAk+5AJ0ckiE4+bEMEFe9aD8bBNY3HH+IZACdERCs wab0TyBLCbW/6DYm+8gAExM= =pwb/ -END PGP SIGNATURE-
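To make the proposal concrete, a sketch of how the two spellings would behave under it (the URL is a placeholder, and this is the proposed behavior being discussed, not what any released wget does):

$ wget -r -A '*Action=edit*' http://wiki.example.org/     # matches against the filename portion only
$ wget -r -A '\?*Action=edit*' http://wiki.example.org/   # would match within the ?query-string portion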
Hiding passwords found in redirect URLs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 https://savannah.gnu.org/bugs/index.php?21089 The report originator is copied in the recipients list for this message. The situation is as follows: the user types wget http://foo.com/file-i-want. Wget asks the HTTP server for the appropriate file, and gets a 302 redirection to the URL ftp://spag:[EMAIL PROTECTED]. Wget will then issue to the log output the line: Location: ftp://spag:[EMAIL PROTECTED]/mickie/file-you-want with the password in plain view. I'm uncertain that this is actually a problem. In this specific case, it's a publicly-accessible URL redirecting to a password-protected file. What's to hide, really? Of course, the case gets more interesting when it's _not_ a publicly-accessible URL. What about when the password is generated from one the user supplied? That is, the original request was http://spag:[EMAIL PROTECTED]/file-i-want, which resulted in a redirect using the same username/password? Especially if it was an HTTPS request rather than plain HTTP. A case could be made that it should be hidden in that case. On the other hand, in cases like the _original_ example given above, I'd argue that hiding it could be the wrong thing: the user then has no idea how to directly access the file, avoiding the redirect the next time around. Redirecting to a password-protected file on a different host or using a different scheme seems broken to me in the first place, and I'm sorta leaning towards not bothering about it. What are your thoughts, list? Note: Saint Xavier has already written a fix for this, so it's not actually a question of whether it's worth the bother, just whether it's actually desired behavior. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIytyT7M8hyUobTrERAnC1AJ4pRpWx7z6wRt3Vg4LHyQalEfL3XQCdGTqg LdK8lQ8tuPTlmCfURcjXPw4= =ZPrY -END PGP SIGNATURE-
small doc typo in 9.1 Robot Exclusion
9.1 Robot Exclusion ... Although Wget is not a web robot in the strictest sense of the word, it can downloads large parts of the site without the user's... possibly meant: ...it can download large... cheers michael
Re: Wget and Yahoo login?
And you'll probably have to do this again- I bet yahoo expires the session cookies! On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote: After surprisingly little struggle, I got Plan B working -- logged into yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem. /Don On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote: On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget. Yes; and that's entirely my fault, as I didn't explicitly say that. No problem. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content. That didn't faze me because the pages I'm after will be processed by a python program, so having to gunzip would not require a manual step. This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised. Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase You're not using Internet Explorer. :p And taking it one step further, I'm greatly enjoying watching Microsoft thrash around, trying to save themselves, which I don't think they will. Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not going to produce milk too much longer. I've just installed the Chrome beta on the Windows side of one of my machines (I grudgingly give it 10 Gb on each machine; Linux gets the rest), and it looks very, very nice. 
They've still got work to do, but they appear to be heading in a very good direction. These are smart people at Google. All signs seem to be pointing towards more and more computing happening on the server side in the coming years. /Don - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik 3HbbATyqnrm0hAJXqNTqpl4= =3XD/ -END PGP SIGNATURE- -- Best Regards. Please keep in touch. This is unedited. P-)
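For the archives, the working "Plan B" flow described above looks roughly like this. A minimal sketch: the login URL and form field names here are guesses for illustration, not Yahoo's actual ones, and would need to be taken from the real login form:

$ wget --keep-session-cookies --save-cookies=cookies.txt \
       --post-data='login=USERNAME&passwd=PASSWORD' \
       'https://login.yahoo.example/config/login' -O /dev/null
$ wget --load-cookies=cookies.txt --header='Accept-Encoding: gzip' \
       'http://my.yahoo.example/' -O page.html.gz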
Re: Wget and Yahoo login?
On Mon, 8 Sep 2008, Donald Allen wrote: The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. First, LiveHTTPHeaders is the Firefox plugin that everyone who tries these stunts needs. Then you read the capture and replay it as closely as possible using your tool. As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies, just to make sure you have a javascript- and cookie-enabled browser. So you need to work hard(er) when trying this with non-browsers. It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort. -- / daniel.haxx.se
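A rough sketch of the replay step Daniel describes, pasting what LiveHTTPHeaders captured into explicit --header options (every header value and the URL here are placeholders, not a real capture):

$ wget --no-cookies \
       --header='Cookie: NAME1=value1; NAME2=value2' \
       --header='Accept-Encoding: gzip' \
       --header='User-Agent: Mozilla/5.0 (X11; Linux) Firefox/3.0' \
       'http://my.yahoo.example/' -O page.html.gz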
Missing asprintf()
Why the need for asprintf() in url.c:903? This function is missing on DOS/Win32 and nowhere to be found in ./lib. I suggest we replace with this:

--- hg-latest/src/url.c Tue Sep 09 12:37:23 2008
+++ url.c Tue Sep 09 13:01:33 2008
@@ -893,16 +893,18 @@
   if (error_code == PE_UNSUPPORTED_SCHEME)
     {
-      char *error, *p;
+      char *p;
       char *scheme = xstrdup (url);
+      static char error[100];
+
       assert (url_has_scheme (url));

       if ((p = strchr (scheme, ':')))
         *p = '\0';
       if (!strcasecmp (scheme, "https"))
-        asprintf (&error, _("HTTPS support not compiled in"));
+        sprintf (error, _("HTTPS support not compiled in"));
       else
-        asprintf (&error, _(parse_errors[error_code]), quote (scheme));
+        sprintf (error, _(parse_errors[error_code]), quote (scheme));
       xfree (scheme);

       return error;

Here 'error' is guaranteed to be big enough. --gv
Where is program_name?
'program_name' is used in lib/error.c, but it is not defined anywhere. Should it be added to main.c and initialised to exec_name? --gv
Re: Missing asprintf()
Gisle Vanem [EMAIL PROTECTED] writes: Why the need for asprintf() in url.c:903? This function is missing on DOS/Win32 and nowhere to be found in ./lib. Wget is supposed to use aprintf, which is defined in utils.c and is not specific to Unix. It's preferable to use an asprintf-like function rather than a static buffer because it supports reentrancy (unlike a static buffer) and imposes no arbitrary limit on error output.
Re: Missing asprintf()
Hrvoje Niksic [EMAIL PROTECTED] wrote: Wget is supposed to use aprintf, which is defined in utils.c and is not specific to Unix. It's preferable to use an asprintf-like function rather than a static buffer because it supports reentrancy (unlike a static buffer) and imposes no arbitrary limit on error output. Fine by me. Here is an adjusted patch:

--- hg-latest/src/url.c Tue Sep 09 12:37:23 2008
+++ url.c Tue Sep 09 14:37:39 2008
@@ -900,9 +900,9 @@
       if ((p = strchr (scheme, ':')))
         *p = '\0';
       if (!strcasecmp (scheme, "https"))
-        asprintf (&error, _("HTTPS support not compiled in"));
+        error = aprintf (_("HTTPS support not compiled in"));
       else
-        asprintf (&error, _(parse_errors[error_code]), quote (scheme));
+        error = aprintf (_(parse_errors[error_code]), quote (scheme));
       xfree (scheme);

       return error;

--gv
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote: On Mon, 8 Sep 2008, Donald Allen wrote: The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts need. Then you read the capure and replay them as closely as possible using your tool. As you will find out, sites like this use all sorts of funny tricks to figure out you and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies just to make sure you have a javascript and cookie enabled browser. So you need to work hard(er) when trying this with non-browsers. It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort. I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies and --header cookie info from livehttpheaders and that did work. Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess. Is there a way to ask wget to print the headers it sends (ala livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it. -- / daniel.haxx.se
Re: Where is program_name?
Hi,

* Gisle Vanem ([EMAIL PROTECTED]) wrote: 'program_name' is used in lib/error.c, but it is not defined anywhere. Should it be added to main.c and initialised to exec_name?

$ cd wget-mainline
$ find . -name '*.[ch]' -exec fgrep -H -n 'program_name' '{}' \;
./lib/error.c:63:# define program_name program_invocation_name   <---
./lib/error.c:95:/* The calling program should define program_name and set it to the
./lib/error.c:97:extern char *program_name;
./lib/error.c:248: __fxprintf (NULL, "%s: ", program_name);
./lib/error.c:250: fprintf (stderr, "%s: ", program_name);
./lib/error.c:307: __fxprintf (NULL, "%s:", program_name);
./lib/error.c:309: fprintf (stderr, "%s:", program_name);
./src/netrc.c:463: char *program_name, *file, *target;
./src/netrc.c:472: program_name = argv[0];

Google for that and you will find the corresponding man page, like the one here: http://www.tin.org/bin/man.cgi?section=3&topic=PROGRAM_INVOCATION_NAME These variables are automatically initialised by the glibc run-time startup code. I've also opened Wget with GDB: the variable exists but seems to point to a bad memory area... Sincerely, Saint Xavier.
Re: Where is program_name?
Google for that and you will find the corresponding man page. Like it's written here http://www.tin.org/bin/man.cgi?section=3topic=PROGRAM_INVOCATION_NAME These variables are automatically initialised by the glibc run-time startup code. I'm on Windows. So glibc is of no help here. --gv
Re: Where is program_name?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Saint Xavier wrote: Hi, * Gisle Vanem ([EMAIL PROTECTED]) wrote: 'program_name' is used in lib/error.c, but it is not defined anywhere. Should it be added to main.c and initialised to exec_name? $ cd wget-mainline $ find . -name '*.[ch]' -exec fgrep -H -n 'program_name' '{}' \; ./lib/error.c:63:# define program_name program_invocation_name <--- ./lib/error.c:95:/* The calling program should define program_name and set it to the <--- Looks to me like we're expected to supply it. Line 63 is only evaluated when we're using glibc; otherwise, we need to provide it. The differing name is probably so we can define it unconditionally. It appears that lib/error.c isn't even _built_ on my system, perhaps because glibc supplies what it would fill in. This makes testing a little difficult. Anyway, see if this fixes your trouble:

diff -r 0c2e02c4f4f3 src/ChangeLog
--- a/src/ChangeLog Tue Sep 09 09:29:50 2008 -0700
+++ b/src/ChangeLog Tue Sep 09 09:40:00 2008 -0700
@@ -1,3 +1,7 @@
+2008-09-09 Micah Cowan [EMAIL PROTECTED]
+
+ * main.c: Define program_name for lib/error.c.
+
 2008-09-02 Gisle Vanem [EMAIL PROTECTED]

  * mswindows.h: Must ensure stdio.h is included before

diff -r 0c2e02c4f4f3 src/main.c
--- a/src/main.c Tue Sep 09 09:29:50 2008 -0700
+++ b/src/main.c Tue Sep 09 09:40:00 2008 -0700
@@ -826,6 +826,8 @@
   exit (0);
 }

+char *program_name; /* Needed by lib/error.c. */
+
 int
 main (int argc, char **argv)
 {
@@ -833,6 +835,8 @@
   int i, ret, longindex;
   int nurl, status;
   bool append_to_log = false;
+
+  program_name = argv[0];

   i18n_initialize ();

- -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxqf67M8hyUobTrERAq0+AJ9KIOFDn9FiDXIIlU6M7DsupDmPYQCcDuoo 9bgAQnuKpgYMvnwc18svfYg= =DXYi -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote: On Mon, 8 Sep 2008, Donald Allen wrote: The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. First, LiveHTTPHeaders is the Firefox plugin that everyone who tries these stunts needs. Then you read the capture and replay it as closely as possible using your tool. As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies, just to make sure you have a javascript- and cookie-enabled browser. So you need to work hard(er) when trying this with non-browsers. It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort. I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies and --header cookie info from livehttpheaders, and that did work. That's how I did it as well (except I got the headers from tcpdump); I'm using Firefox 3, so don't have access to FF's new SQLite-based cookies file (apart from the patch at http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch ). Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess. Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. - --keep-session-cookies and --save-cookies=foo.txt make a good combination. Is there a way to ask wget to print the headers it sends (a la livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it. - --debug - -- HTH, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7 1vOmTDimFg8E7Cn+Q+HGZn8= =JKXH -END PGP SIGNATURE-
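In practice that --debug suggestion looks something like this (the URL and cookie file are placeholders; --debug prints each request to stderr, including the Cookie header wget actually sent):

$ wget --debug --load-cookies=cookies.txt 'http://my.yahoo.example/' -O page.html 2>&1 | grep -A12 'request begin'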
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote: On Mon, 8 Sep 2008, Donald Allen wrote: The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. First, LiveHTTPHeaders is the Firefox plugin that everyone who tries these stunts needs. Then you read the capture and replay it as closely as possible using your tool. As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies, just to make sure you have a javascript- and cookie-enabled browser. So you need to work hard(er) when trying this with non-browsers. It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort. I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies and --header cookie info from livehttpheaders, and that did work. That's how I did it as well (except I got the headers from tcpdump); I'm using Firefox 3, so don't have access to FF's new SQLite-based cookies file (apart from the patch at http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch ). Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess. Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. - --keep-session-cookies and --save-cookies=foo.txt make a good combination. Is there a way to ask wget to print the headers it sends (a la livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it. - --debug Well, I rebuilt my wget with the 'debug' use flag and ran it on the yahoo test page (after having logged in to yahoo with firefox, of course) with --load-cookies and the accept-encoding header item, with --debug. Very useful. wget is sending every cookie item in firefox's cookies.txt. But firefox sends three additional cookie items in the header that wget does not send. Those items are *not* in firefox's cookies.txt, so wget has no way of knowing about them. Is it possible that firefox is not writing session cookies to the file? The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference, because when I included them (from the livehttpheaders output) and sent the cookies manually with --header, I got back a page that indicated yahoo knew I was logged in, formatted with my preferences. /Don - -- HTH, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. 
GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7 1vOmTDimFg8E7Cn+Q+HGZn8= =JKXH -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference, because I included them (from the livehttpheaders output) when I tried sending the cookies manually with --header, I got the same page back with wget that indicated that yahoo knew I was logged in and formatted with page with my preferences. Perhaps you missed this in my last message: Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. --keep-session-cookies and --save-cookies=foo.txt make a good combination. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq nCqAmXJfU3kTncMQkKk0JZo= =17Yr -END PGP SIGNATURE-
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference, because I included them (from the livehttpheaders output) when I tried sending the cookies manually with --header, I got the same page back with wget that indicated that yahoo knew I was logged in and formatted with page with my preferences. Perhaps you missed this in my last message: Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit. --keep-session-cookies and --save-cookies=foo.txt make a good combination. I think we're mis-communicating, easily my fault, since I know just enough about this stuff to be dangerous. I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). /Don - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq nCqAmXJfU3kTncMQkKk0JZo= =17Yr -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh M57W3Reqj+/pO8GuDwb9Nok= =ajp/ -END PGP SIGNATURE-
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. /Don - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh M57W3Reqj+/pO8GuDwb9Nok= =ajp/ -END PGP SIGNATURE-
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget. Yes; and that's entirely my fault, as I didn't explicitly say that. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content. This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised. Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase You're not using Internet Explorer. :p - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik 3HbbATyqnrm0hAJXqNTqpl4= =3XD/ -END PGP SIGNATURE-
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote: Donald Allen wrote: I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'. Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget. Yes; and that's entirely my fault, as I didn't explicitly say that. No problem. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content. That didn't faze me because the pages I'm after will be processed by a Python program, so having to gunzip would not require a manual step. This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised. Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase "You're not using Internet Explorer." :p And taking it one step further, I'm greatly enjoying watching Microsoft thrash around, trying to save themselves, which I don't think they will. Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not going to produce milk too much longer. I've just installed the Chrome beta on the Windows side of one of my machines (I grudgingly give it 10 GB on each machine; Linux gets the rest), and it looks very, very nice. They've still got work to do, but they appear to be heading in a very good direction. These are smart people at Google. All signs seem to be pointing towards more and more computing happening on the server side in the coming years. /Don - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik 3HbbATyqnrm0hAJXqNTqpl4= =3XD/ -END PGP SIGNATURE-
Re: Wget and Yahoo login?
After surprisingly little struggle, I got Plan B working -- logged into yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem. /Don
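For readers wanting to reproduce this, a minimal sketch of the two-step "Plan B" sequence follows, using only options discussed in this thread. The login URL and form-field names are placeholders, not Yahoo!'s actual ones -- they have to be read out of the login form's HTML:

    # Step 1: log in with wget itself, keeping session cookies in the jar.
    # (URL and field names below are hypothetical placeholders.)
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'login=USERNAME&passwd=PASSWORD' \
         -O /dev/null 'https://login.example.com/config/login'

    # Step 2: reuse the saved cookies (session cookies included) on fetches.
    wget --load-cookies cookies.txt -O page.html 'http://my.example.com/'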
Hello, All and bug #21793
Hello everyone, I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793. -David A Coon
Re: Hello, All and bug #21793
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 David Coon wrote: Hello everyone, I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793. Hi David, and welcome! If you haven't already, please see http://wget.addictivecode.org/HelpingWithWget I'd encourage you to get a Savannah account, so I can assign that bug to you. Also, I tend to hang out quite a bit on IRC (#wget @ irc.freenode.net), so you might want to sign on there. Since you mentioned an interest in Unicode and UTF-8, you might want to check out Saint Xavier's recent work on IRI and iDNS support in Wget, which is available at http://hg.addictivecode.org/wget/sxav/. Among other things, sxav's additions make Wget more aware of the user's locale, so it might be useful for providing a feature to automatically transcode filenames to the user's locale, rather than just supporting UTF-8 only (which should still probably remain an explicit option). If that sounds like the direction you'd like to take it, you should probably base your work on sxav's repository, rather than mainline. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxViR7M8hyUobTrERAv/jAJ9/DxAaPaYpdLJojX9gorHn2hqwSACeK7oD veVZAIH2NjbYI8dG6DimjRg= =9Qau -END PGP SIGNATURE-
Wget and Yahoo login?
There was a recent discussion concerning using wget to obtain pages from yahoo while logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki. I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url' The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. wget -V: GNU Wget 1.11.1 I am running a reasonably up-to-date Gentoo system (updated within the last month) on a Thinkpad X61. Have I missed something here? Any help will be appreciated. Please include my personal address in your replies as I am not (yet) a subscriber to this list. Thanks -- /Don Allen
Re: Wget and Yahoo login?
2008/9/8 Tony Godshall [EMAIL PROTECTED]: I haven't done this but I can speculate that you need to have wget identify itself as firefox. When I read this, I thought it looked promising, but it doesn't work. I tried sending exactly the user-agent string firefox is sending and still got a page from yahoo that clearly indicates yahoo thinks I'm not logged in. /Don Quote from man wget... -U agent-string --user-agent=agent-string Identify as agent-string to the HTTP server. The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget. However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing. On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen [EMAIL PROTECTED] wrote: There was a recent discussion concerning using wget to obtain pages from yahoo while logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki. I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url' The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. wget -V: GNU Wget 1.11.1 I am running a reasonably up-to-date Gentoo system (updated within the last month) on a Thinkpad X61. Have I missed something here? Any help will be appreciated. Please include my personal address in your replies as I am not (yet) a subscriber to this list. Thanks -- /Don Allen -- Best Regards. Please keep in touch. This is unedited. P-)
Re: Wget and Yahoo login?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Donald Allen wrote: There was a recent discussion concerning using wget to obtain pages from yahoo while logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki. I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url' The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations. Are you signing into the main Yahoo! site? When I try to do so, whether I use the cookies or no, I get a message about "update your browser to something more modern" or the like. The difference appears to be a combination of _both_ User-Agent (as you've done), _and_ --header 'Accept-Encoding: gzip,deflate'. This plus appropriate cookies gets me a decent logged-in page, but of course it's gzip-compressed. Since Wget doesn't currently support gzip-decoding and the like, that makes the use of Wget in this situation cumbersome. Support for something like this probably won't be seen until 1.13 or 1.14, I'm afraid. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIxdw77M8hyUobTrERAi/QAJ0atPMeUQ/0YCNwAP+XiH4nDyvclwCcDxYo obud0CjpATBYDvA0eS3ZHGY= =vv4R -END PGP SIGNATURE-
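Concretely, the combination described above would look something like the sketch below. The profile directory, User-Agent string, and URL are placeholders to be replaced with your own values, and the session-cookie caveat discussed elsewhere in the thread still applies:

    # Browser cookies + a browser-like User-Agent + gzip accepted.
    wget --load-cookies ~/.mozilla/firefox/PROFILE.default/cookies.txt \
         -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16' \
         --header='Accept-Encoding: gzip,deflate' \
         -O yahoo.htm.gz 'http://yahoo url'

    # The body comes back gzip-compressed; decompress it by hand,
    # since Wget itself cannot.
    gunzip yahoo.htm.gz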
Internet Draft for Metalink XML Download Description Format (draft-bryan-metalink-02)
Greetings, The Internet Draft for Metalink is available at http://tools.ietf.org/html/draft-bryan-metalink-02 with interim revisions at http://metalinks.svn.sourceforge.net/viewvc/metalinks/internetdraft/ . We're looking for review and public comments. Metalink is currently supported by some 35 applications and used by projects such as OpenOffice.org, openSUSE, Ubuntu, cURL, and others. Metalink is an XML-based document format that describes a file or lists of files to be added to a download queue. Lists are composed of a number of files, each with an extensible set of attached metadata. For example, each file can have a description, checksum, and list of URIs that it is available from. The primary use case that Metalink addresses is the description of downloadable content in a format so download agents can act intelligently and recover from common errors with little or no user interaction necessary. These errors can include multiple servers going down and data corrupted in transmission. Example .metalink file:

<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="http://www.metalinker.org">
  <published>2008-05-15T12:23:23Z</published>
  <files>
    <file name="example.ext">
      <identity>Example</identity>
      <version>1.0</version>
      <description>A description of the example file for download.</description>
      <verification>
        <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
        <hash type="sha-1">80bc95fd391772fa61c91ed68567f0980bb45fd9</hash>
      </verification>
      <resources>
        <url>ftp://ftp.example.com/example.ext</url>
        <url>http://example.com/example.ext</url>
      </resources>
    </file>
  </files>
</metalink>

Thank you, -- (( Anthony Bryan ... Metalink [ http://www.metalinker.org ] )) Easier, More Reliable, Self Healing Downloads
Re: [wget-notify] add a new option
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 houda hocine wrote: Hi, Hi houda. This message was sent to the wget-notify list, which is not the proper forum. Wget-notify is reserved for bug-change and (previously) commit notifications, and is not intended for discussion (though I obviously haven't blocked discussions; the original intent was to be able to discuss commits, but I'm not sure I need to allow discussions any more, so it may be disallowed soon). The appropriate list would be wget@sunsite.dk, to which this discussion has been redirected. We have created a new format for archiving (.warc), and we want to ensure that wget generates this format directly from the input URL. Can you help me with some ideas for implementing this new option? The format is (warc -wget url). I am in the process of trying to understand the source code to add this new option. Which .c file allows me to do this? Doing this is not likely to be a trivial undertaking: the current file-output interface isn't really abstracted enough to allow this, so basically you'll need to modify most of the existing .c files. We are hoping at some future point to allow for a more generic output format, for direct output to (for instance) tarballs and .mhtml archives. At that point, it'd probably be fairly easy to write extensions to do what you want. In the meantime, though, it'll be a pain in the butt. I can't really offer much help; the best way to understand the source is to read and explore it. However, on the general topic of adding new options to Wget, Tony Lewis has written the excellent guide at http://wget.addictivecode.org/OptionsHowto. Hope that helps! Please note that I won't likely be entertaining patches to Wget to make it output to non-mainstream archive formats, and even once generic output mechanisms are supported, the mainstream archive formats will most likely be supported as extension plugins or similar, and not as built-in support within Wget. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvbyf7M8hyUobTrERApl8AJwNvWOdDd0Z//wbNzN/jyZFqKI5iQCfQOx4 3zlxPGaVqjsPhwa7ZwB4wrs= =Zy+N -END PGP SIGNATURE-
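As a rough sketch of what the OptionsHowto covers -- these fragments are illustrative assumptions about the shape of the code, with hypothetical names, not Wget's literal source -- a new boolean option generally touches three places:

    /* 1. options.h: a member in struct options to hold the setting. */
    bool warc_output;                    /* hypothetical new member */

    /* 2. init.c: an entry in the command table, mapping the .wgetrc
       command name to that member and a parser routine. */
    { "warcoutput", &opt.warc_output, cmd_boolean },

    /* 3. main.c: an entry in the command-line option table, exposing
       it as --warc-output and tying it back to the .wgetrc name. */
    { "warc-output", 0, OPT_BOOLEAN, "warcoutput", -1 },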
Re: Checking out Wget
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 vinothkumar raman wrote: Hi all, I need to check out the complete source onto my local hard disk. I am using WinCVS; when I searched for the module, it said that there is no module information out there. Could anyone help me out? I am a complete novice in this regard. WinCVS won't work, because there _is_ in fact no CVS module for Wget. Wget uses Mercurial as the source repository (and was using Subversion prior to that). For more information about the Wget source repository and its use, see http://wget.addictivecode.org/RepositoryAccess That page focuses on using the hg command-line tool; you may prefer to use TortoiseHg instead, http://tortoisehg.sourceforge.net/. The page does offer additional information about the repository and what is required to build from those sources. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb4n7M8hyUobTrERAnquAJ9ItMQH1QYgXvyYTI6/IZDScIFGoACfVlqd p+LMC9AK5/SwYPyuGVfd5Ns= =RmLO -END PGP SIGNATURE-
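For the hg command-line route, a checkout is a one-liner. The repository path below is an assumption from memory of the period's setup -- confirm it against the RepositoryAccess page before relying on it:

    # Clone the mainline Wget repository into a local directory.
    hg clone http://hg.addictivecode.org/wget/mainline wget-mainline
    cd wget-mainline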
Re: [BUG:#20329] If-Modified-Since support
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 vinothkumar raman wrote: We need to send the timestamp of the local file in the request header; for that, we need to pass the local file's timestamp from http_loop() to get_http(). The only way to pass this on without altering the signature of the function is to add a field to struct url in url.h. Could we go for it? That is acceptable. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb5B7M8hyUobTrERAv2YAJ0ajYx+pynFLtV2YmEw7fA+vwf8ugCfSaU1 AFkIYSyyyS4egbyXjzBLXBo= =fIT5 -END PGP SIGNATURE-
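A minimal sketch of the proposal, with a hypothetical field name and all existing members elided -- an illustration of the idea, not a patch:

    /* url.h: carry the local file's timestamp along with the URL so
       the HTTP request code can emit If-Modified-Since without a new
       function parameter. Field name is an illustrative assumption. */
    struct url {
      /* ... existing members (scheme, host, path, ...) ... */
      time_t local_mtime;   /* mtime of the local copy; 0 if none */
    };

    /* http_loop() would stat() the local file and fill this in
       before the request is issued. */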
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Yes, that's what it means. I'm not yet committed to doing this. I'd like to see first how many mainstream servers will respect If-Modified-Since when given as part of an HTTP/1.0 request (in comparison to how they respond when it's part of an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not in HTTP/1.1, that'd be an excellent case for holding off until we're doing HTTP/1.1 requests. Also, I don't think removing the previous HEAD request code is entirely accurate: we probably would want to detect when a server is feeding us non-new content in response to If-Modified-Since, and adjust to use the current HEAD method instead as a fallback. - -Micah vinothkumar raman wrote: This means we should remove the previous HEAD request code, use If-Modified-Since by default, have it handle all the requests, and store pages if the response is not a 304. Is it so? On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote: Follow-up Comment #4, bug #20329 (project wget): verbatim-mode's not all that readable. The gist is, we should go ahead and use If-Modified-Since, perhaps even now before there's true HTTP/1.1 support (provided it works in a reasonable percentage of cases); and just ensure that any Last-Modified header is sane. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvb7t7M8hyUobTrERAsvQAJ4k7fKrsFtfC4MQtuvE3Ouwz6LseACePqt2 8JiRBKtEhmcK3schVVO347A= =yCJV -END PGP SIGNATURE-
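For reference, the mechanism under discussion is standard HTTP conditional fetching. A sketch of the exchange, with host, path, and timestamp invented for illustration:

    GET /index.html HTTP/1.0
    If-Modified-Since: Sat, 06 Sep 2008 17:04:00 GMT

    # If the file is unchanged, the server answers with no body and the
    # local copy is kept:
    HTTP/1.0 304 Not Modified

    # Otherwise it answers with the full body, which replaces the local copy:
    HTTP/1.0 200 OK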
Re: Support for file://
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Petri Koistinen wrote: Hi, It would be nice if wget would also support file://. Feel free to file an issue for this (I'll mark it Needs Discussion and set at low priority). I'd thought there was already an issue for this, but can't find it (either open or closed). I know this has come up before, at least. I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which groks urls, Wget W(eb)-gets, and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system cp command is likely to handle things much better than Wget. In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system cp command, but I might conceivably not mind file:// support if it means simple _content_ transfer, and not actual file duplication. Also in need of addressing is what recursion should mean for file://. Between ftp:// and http://, recursion currently means different things. In FTP, it means traverse the file hierarchy recursively, whereas in HTTP it means traverse links recursively. I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer. GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIvcLq7M8hyUobTrERAl6YAJ9xeTINVkuvl8HkElYlQt7dAsUfHACfXRT3 lNR++Q0XMkcY4c6dZu0+gi4= =mKqj -END PGP SIGNATURE-