using wget
Hi,

I'm new to wget. I use the following command to get files from the site:

wget ftp://ftp.iitm.ac.in/debian

and have set my .wgetrc file with the following:

tries=20
reclevel=5
passive_ftp = on
http_proxy = http://anselm:[EMAIL PROTECTED]:9000/
ftp_proxy= http://anselm:[EMAIL PROTECTED]:9000/
# above password changed !
use_proxy = on
dot_style = default
noclobber = on
glob=on
continue = on
robots = on
no_parent = on
wait = 60
dirstruct = on
add_hostdir = off
recursive = on
follow_ftp = on
simple_host_check=on
add_hostdir=off

1. I have a very slow link to the Internet through a proxy. When the connection breaks, wget goes on to get the next file instead of retrying the file it was fetching before the connection broke, e.g.:

Connecting to darya.nio.org:9000...
Connection to darya.nio.org:9000 refused.
--12:36:14-- ftp://ftp.iitm.ac.in:21/debian/pool/main/libr
           => `debian/pool/main/libr'
Connecting to darya.nio.org:9000...
Connection to darya.nio.org:9000 refused.
--12:37:14-- ftp://ftp.iitm.ac.in:21/debian/pool/main/libs
           => `debian/pool/main/libs'
Connecting to darya.nio.org:9000...
Connection to darya.nio.org:9000 refused.
--12:38:14-- ftp://ftp.iitm.ac.in:21/debian/pool/main/libt
           => `debian/pool/main/libt'
Connecting to darya.nio.org:9000...
Connection to darya.nio.org:9000 refused.
--12:39:14-- ftp://ftp.iitm.ac.in:21/debian/pool/main/libu
           => `debian/pool/main/libu'

2. Also, if the connection is bad, 0 bytes are downloaded and wget goes on to download the next file, so in the directory structure I get a lot of files with 0 bytes downloaded, whereas I would have expected wget to keep trying to get the file until it is completely downloaded! Please see enclosed file text3.

3. The third thing is that if I restart wget to continue the download after the connection is broken, it just skips the files which have been partially downloaded and downloads only the new files which have not been downloaded yet.
I suspect I am not using wget's options properly, and I will be grateful for any leads. Thanks in advance. I'm using wget 1.5.3.

Anselm

an/dists/woody/main/source/net/dhcpcd_1.3.17pl2.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

    0K ->

10:07:10 (0.00 B/s) - `debian/dists/woody/main/source/net/dhcpcd_1.3.17pl2.orig.tar.gz' saved [0]

--10:08:10-- ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz
           => `debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

    0K ->

10:14:00 (0.00 B/s) - `debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz' saved [0]

--10:15:00-- ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/epic_3.004.orig.tar.gz
           => `debian/dists/woody/main/source/net/epic_3.004.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 500 Internal Error
10:15:20 ERROR 500: Internal Error.

--10:16:20-- ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz
           => `debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

    0K ->

10:21:44 (0.00 B/s) - `debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz' saved [0]

--10:22:44-- ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/fmirror_0.8.4beta.orig.tar.gz
           => `debian/dists/woody/main/source/net/fmirror_0.8.4beta.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

    0K -> .. .. ..
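To see how much of the tree is affected by point 2, one option is to list the zero-length files left behind. This is a minimal Python sketch, assuming the download tree sits under a local `debian/` directory as in the log above. The function name `find_empty_files` is made up for illustration, and nothing here calls or changes wget itself.

```python
import os


def find_empty_files(root):
    """Return a sorted list of paths under `root` whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files; skip dangling symlinks and the like.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Calling `find_empty_files("debian")` would give a list of candidates to delete before restarting, on the assumption (suggested by the report, not verified here) that an existing empty file is what causes it to be skipped on the next run.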
Re: mirroring vs -m
On Tue, Jan 29, 2002 at 04:54:17PM +0100, Andre Majorel wrote:
> On 2002-01-29 09:56 -0500, Alan Eldridge wrote:
>
>> In particular, does wget parse and follow links in an HTML
>> document, when that document is retrieved (using -r) via the FTP
>> protocol? If not, why not?
>
> I'm inclined to think that recursive retrieval without parsing is
> a feature. HTML content is normally served over HTTP. If you want
> to retrieve HTML through FTP, it's likely because you do *not*
> want to follow the links.

I agree with you. I'm not making a case for doing it, but merely bringing up that there are two different recursive retrieval models, one of which (FTP) is (IIRC) a true mirror. One of the properties of mirroring is that links are followed all the way down the tree; that is, the tree is not pruned (depth-wise) just because a node at a given level has not been changed.

It seems to me that there are some abstractions that can be made about the retrieval process. In particular, there's:

(1) the continuation protocol: does the retrieval continue to further depth once an unmodified node is encountered?

(2) the child-list acquisition protocol: do we get the list of children of this node by (a) examining metadata or (b) parsing the contents of the node as a document?

IOW, recursive retrieval can be thought of as a generalized process, parameterized by protocols that determine the actions and state transitions at each node in the tree.

Ideally, the code that implements recursive retrieval would know nothing about either the communication protocol (ftp vs. http) or the data/metadata formats (having to do a special operation, an ftp listing, on a directory node for ftp, vs. having to do a special operation, an html parse, on a particular type of file node for an http traversal).

--
Alan E
Please rush me my portable walrus cleaning kit! Yes I am over 18, but my IQ isn't.
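The abstraction described above can be sketched in Python as a traversal that takes the two protocols as plain functions. Everything here is hypothetical: the names `retrieve`, `get_children`, `is_unchanged` and `descend_if_unchanged`, and the in-memory `TREE` standing in for a remote server. It is an illustration of the idea, not a description of how wget is structured.

```python
def retrieve(node, get_children, is_unchanged, descend_if_unchanged, fetch):
    """Walk a tree depth-first, parameterized by the two protocols.

    get_children(node)   -- child-list acquisition: a metadata listing
                            (FTP-like) or a parse of the node's contents
                            (HTML-over-HTTP-like).
    descend_if_unchanged -- continuation protocol: True gives a true mirror
                            that never prunes below an unmodified node.
    Returns the nodes visited, in visiting order.
    """
    visited = []
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against link cycles
        seen.add(current)
        unchanged = is_unchanged(current)
        if not unchanged:
            fetch(current)
        visited.append(current)
        if unchanged and not descend_if_unchanged:
            continue  # prune: do not look below an unmodified node
        # Reversed so that children are visited in listing order.
        stack.extend(reversed(get_children(current)))
    return visited


# Toy stand-in for a remote tree, with children known from "metadata".
TREE = {"/": ["/a", "/b"], "/a": ["/a/x"], "/b": [], "/a/x": []}
UNCHANGED = {"/a"}


def listing_children(node):
    """Child-list acquisition by listing (the FTP-like case)."""
    return TREE.get(node, [])
```

The traversal itself never mentions ftp or http; swapping `listing_children` for a function that parses links out of a fetched document would give the other model without touching `retrieve`.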
Can wget handle this scenario?
Hi,

I have been happily using wget to handle automatic ftp downloads, but now I have a situation which I am not sure wget can handle. This is the type of syntax that I have been using without any problems:

$ wget ftp://username:[EMAIL PROTECTED]/file.txt

Now I need to add the twist where the username account info resides on another machine (i.e. machine2, which by the way is on the same network as machine1). So I need to do something like the following:

$ wget ftp://username:[EMAIL PROTECTED]@machine1.com/file.txt

which is of course not the syntax wget understands. But perhaps there are some wget flags or other wget magic that I missed while looking at the 'wget --help' output and that allow for this?

I can do an interactive ftp session which solves the situation in the following way:

$ ftp machine2.com
ftp> username: username@machine1
ftp> password: passwd
ftp>

The above sequence gets me in without problem. But I can't figure out whether there is some way to automate this (without resorting to writing an expect script). I would really prefer to do this with either wget, ftp, pavuk, or curl, but I don't know whether any of those clients can do this.

Any info on this would be most helpful. Please e-mail me with any bright ideas as I am not subscribed to the wget lists.

Thanks,
Tomislav Goles
[EMAIL PROTECTED]
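For what it's worth, the interactive session above can be scripted without expect. Below is a minimal sketch using Python's standard `ftplib`, assuming the gateway on machine2 accepts `user@host` as the login name exactly as it does interactively. The host names and credentials are the placeholders from the message, the helper names `gateway_user` and `fetch_via_gateway` are invented for illustration, and none of this has been tried against a real gateway.

```python
from ftplib import FTP


def gateway_user(username, target_host):
    """Build the 'user@host' login name that the gateway expects."""
    return "%s@%s" % (username, target_host)


def fetch_via_gateway(gateway, target_host, username, password, remote, local):
    """Log in to `gateway` as username@target_host and download one file."""
    ftp = FTP(gateway)  # connects to the default FTP port (21)
    try:
        ftp.login(gateway_user(username, target_host), password)
        with open(local, "wb") as out:
            # RETR streams the remote file in binary mode to out.write.
            ftp.retrbinary("RETR " + remote, out.write)
    finally:
        ftp.close()
```

A call such as `fetch_via_gateway("machine2.com", "machine1", "username", "passwd", "file.txt", "file.txt")` would mirror the interactive login shown above.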
Re: mirroring vs -m
On 29/01/2002 15:54:17 Andre Majorel wrote:

[snip debate about following links in HTML retrieved by FTP]

> I'm inclined to think that recursive retrieval without parsing is
> a feature. HTML content is normally served over HTTP. If you want
> to retrieve HTML through FTP, it's likely because you do *not*
> want to follow the links.

I (client) don't get the choice. If the document at http://foo.bar/index.html has all its links like this:

<A HREF="ftp://foo.bar/welcome.html">welcome</A>

the client has no choice but to retrieve them via FTP. It would be nice if wget was able to follow all those links.

> If Wget always parsed HTML, even over FTP, it would be impossible
> to make a complete mirror of a tree that has broken href links or
> hidden files.

Perhaps if wget started with FTP, it should mirror FTP-like (.listing and all that). If it started via HTTP, it should follow links, regardless of future retrieval modes.

[snip]

--
Csaba Ráduly, Software Engineer
Sophos Anti-Virus
email: [EMAIL PROTECTED]
http://www.sophos.com
US Support: +1 888 SOPHOS 9
UK Support: +44 1235 559933
Re: Noise ratio getting a bit high?
> If you have a spam-fighting suggestion that does *not* include
> disallowing non-subscriber postings, I am more than willing to
> listen.

It's not spam fighting, but I would personally like to see a wget-announce moderator-only list where new releases and security announcements could be posted. That would prevent spam for me.

--
Marc Stephenson
IBM Server Group - Austin, TX
Internet: [EMAIL PROTECTED]
NOTES: [EMAIL PROTECTED]
Phone: 512-327-5670 T/L 678-3189
Re: Noise ratio getting a bit high?
In message [EMAIL PROTECTED], Hrvoje Niksic said:
> James C. McMaster (Jim) [EMAIL PROTECTED] writes:
>
>> In message [EMAIL PROTECTED], Thomas Reinke said:
>>> Is anyone else not finding the noise ratio (i.e. spam) a bit high
>>> here? I sympathize with the effort required to lightly moderate,
>>> but might I recommend that _something_ be done to rid us all of
>>> this spam? It's getting to be irritating enough that I'm tempted
>>> to drop off the list, which I'd just as soon not do - wget is a
>>> fantastic little tool that I'd just as soon stay involved with
>>> actively, if possible.
>>
>> The easiest solution would be for the list owners to require
>> people to subscribe before posting. So far, they seem unwilling to
>> do that. All the product-support lists to which I subscribe
>> (except this one) have that policy, and I never get spam from any
>> of them.
>
> I do not know what you call a product support mailing list, but
> this is a free software project development list, and certainly not
> the only one with the open posting policy. For example, XEmacs
> mailing lists are open to non-subscriber posting.

"Product" is a generic term. I subscribe to mailing lists on apache, tomcat, exmh, nmh and procmail. All these packages are open-source products. All of these lists require subscription before posting. I receive spam from none of them.

> But that was just an example. The actual reasoning for allowing
> non-subscriber posting boils down to three reasons:
>
> 1. I believe it is the right thing to do. I personally hate
>    allegedly supportive mailing lists that require me to subscribe
>    before asking a question. I don't want to subscribe, dammit, I
>    just want to ask something.

Your call. Subscription and unsubscription are easy enough to do, in my opinion. I personally think people who ask a question and then add, "Please reply privately since I am not on the list" are leeches if they want to use the list without giving anything back. If anyone just hits reply, the person will never see the answer.

> 2. It allows the discussion to extend to non-subscribers. You can
>    simply Cc a person to a discussion pertinent to him, and he will
>    be able to respond to the list.

Again, if they are interested enough to contribute to the discussion, they should be willing to subscribe.

> 3. It allows the mails from [EMAIL PROTECTED] to be rerouted to
>    this list.

Fine. Why bother with the bug list then? Also, the same problem applies as with this list. If a person just replies, the reporter will never see the response.

> I am aware that in this matter, as well as in the infamous
> `Reply-To' debate, this list lies in the minority. But that is not
> a sufficient reason to back down and let the spammers win.

I disagree with you on the Reply-To matter as well, but that is not the argument. The point is not that your list is in the minority, it is *why* you are in the minority. The quantity of spam on this list has been annoying for a while now. It is getting really tiresome now. Once the spammers get your address they sell it to other spammers, so the quantity will only increase from now on.

Don't the spammers also win if they annoy enough of the knowledgeable people on this list that they leave? Even if they don't, the people relying on that expertise surely lose.

> If you have a spam-fighting suggestion that does *not* include
> disallowing non-subscriber postings, I am more than willing to
> listen.

The only alternative I can imagine is moderation. I doubt you or anyone else has the time or inclination.

Personally I have rearranged my .procmailrc so this list gets processed after my spam filters. That leads to the risk that I will miss some valid postings, but so be it. If that does not catch the spam, I will unsubscribe from the list altogether.

--
Jim McMaster
mailto:[EMAIL PROTECTED]
Re: Noise ratio getting a bit high?
On 2002-01-29 22:02 +0100, Hrvoje Niksic wrote:

> But that was just an example. The actual reasoning for allowing
> non-subscriber posting boils down to three reasons:
>
> 1. I believe it is the right thing to do. I personally hate
>    allegedly supportive mailing lists that require me to subscribe
>    before asking a question. I don't want to subscribe, dammit, I
>    just want to ask something.

I respectfully disagree. If we can spend the time to read and answer the poster's question, the poster can spend five minutes to subscribe/unsubscribe.

For reference, see the netiquette item on posting to newsgroups and asking for replies by email.

> 2. It allows the discussion to extend to non-subscribers. You can
>    simply Cc a person to a discussion pertinent to him, and he will
>    be able to respond to the list.
>
> 3. It allows the mails from [EMAIL PROTECTED] to be rerouted to
>    this list.

Yup.

> I am aware that in this matter, as well as in the infamous
> `Reply-To' debate, this list lies in the minority. But that is not
> a sufficient reason to back down and let the spammers win.

Right now, [EMAIL PROTECTED] is providing free relaying for spammers to all its subscribers. <sarcasm>If this is not letting the spammers win, I wonder what is.</sarcasm>

> If you have a spam-fighting suggestion that does *not* include
> disallowing non-subscriber postings, I am more than willing to
> listen.

Mmm... What would you think of having the list software automatically add a special header (say X-Non-Subscriber) to every mail sent by a non-subscriber ?

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
std::disclaimer (Not speaking for my employer);
Re: Noise ratio getting a bit high?
Andre Majorel [EMAIL PROTECTED] writes:

> I respectfully disagree. If we can spend the time to read and
> answer the poster's question, the poster can spend five minutes to
> subscribe/unsubscribe.
>
> For reference, see the netiquette item on posting to newsgroups and
> asking for replies by email.

I am aware of newsgroup etiquette, but I consider a newsgroup to be different from a mailing list devoted to helping users. Besides, subscribing to and unsubscribing from an unknown mailing list are much more annoying processes than they are for newsgroups.

I suppose we can only agree to disagree on this one.

>> I am aware that in this matter, as well as in the infamous
>> `Reply-To' debate, this list lies in the minority. But that is not
>> a sufficient reason to back down and let the spammers win.
>
> Right now, [EMAIL PROTECTED] is providing free relaying for
> spammers to all its subscribers.

So does any mailing list with open subscription. I find your choice of wording strange, sort of like saying that `sendmail' provides free transmission of spam. That may be so, but that was not its intention, and the fact that it's misused is no reason to cripple its intended use.

>> If you have a spam-fighting suggestion that does *not* include
>> disallowing non-subscriber postings, I am more than willing to
>> listen.
>
> Mmm... What would you think of having the list software
> automatically add a special header (say X-Non-Subscriber) to every
> mail sent by a non-subscriber ?

I see where you're getting at, and I would have absolutely no objections to that.
Re: Noise ratio getting a bit high?
> Marc Stephenson [EMAIL PROTECTED] writes:
>
>>> If you have a spam-fighting suggestion that does *not* include
>>> disallowing non-subscriber postings, I am more than willing to
>>> listen.
>>
>> It's not spam fighting, but I would personally like to see a
>> wget-announce moderator-only list where new releases and security
>> announcements could be posted. That would prevent spam for me.
>
> That might make sense independent of the spam -- some people would
> choose that list simply to avoid the volume of this list.
>
> So far I haven't bothered to create an announcement list because
> there were no requests for one, and because I can't think of
> announcements one could make other than for releases, and you can
> use freshmeat et al. for that.

There are likely people interested in wget who aren't that interested in grepping the 80 or so freshmeat announcements per day, so I think that it would be generally useful myself.

--
Marc Stephenson
IBM Server Group - Austin, TX
Internet: [EMAIL PROTECTED]
NOTES: [EMAIL PROTECTED]
Phone: 512-327-5670 T/L 678-3189
Re: Noise ratio getting a bit high?
In message [EMAIL PROTECTED], Hrvoje Niksic said:
> Andre Majorel [EMAIL PROTECTED] writes:
>
>> Right now, [EMAIL PROTECTED] is providing free relaying for
>> spammers to all its subscribers.
>
> So does any mailing list with open subscription.

Any spammer *could* subscribe to an open-subscription list, but as a practical matter they do not. Spam-generating software generally just takes two files: a list of addresses and the message to be sent. It then just blindly blasts out the message. Error responses are ignored, even if the headers are not forged to prevent responses from getting back at all. Spammers are not interested in bounce messages of any type, including "You are not subscribed" messages. It simply is not worth their time to figure out why some of their 500,000+ emails did not go through.

>> Mmm... What would you think of having the list software
>> automatically add a special header (say X-Non-Subscriber) to every
>> mail sent by a non-subscriber ?
>
> I see where you're getting at, and I would have absolutely no
> objections to that.

This would give us something on which we could filter. It also would prevent legitimate non-subscribers' messages being seen by some people. Possibly a good compromise.

--
Jim McMaster
mailto:[EMAIL PROTECTED]
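The header idea could look roughly like this on the list-server side. This is a sketch in Python using the standard `email` package. The header name `X-Non-Subscriber` comes from the suggestion above, while the function name `tag_non_subscriber`, the `yes` value and the subscriber set are assumptions made for illustration, not a description of the list software actually in use.

```python
from email import message_from_string
from email.utils import parseaddr


def tag_non_subscriber(raw_message, subscribers):
    """Parse a raw message and add X-Non-Subscriber if the sender is unknown.

    `subscribers` is a set of lower-case e-mail addresses.
    """
    msg = message_from_string(raw_message)
    _name, address = parseaddr(msg.get("From", ""))
    if address.lower() not in subscribers:
        # Hypothetical value; readers can filter on the header's presence.
        msg["X-Non-Subscriber"] = "yes"
    return msg
```

On the reader's side, a procmail rule or similar could then match on that header, so that only non-subscriber mail is passed through the spam filters.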
Re: windows binary
Brent Morgan [EMAIL PROTECTED] writes:

> What's CVS and what is the significance of this version?

CVS stands for Concurrent Versions System, and is the version control system where the master sources for Wget are kept. I would not advise downloading the CVS version because it is likely to be incomplete or unstable.

It would be nice if the 1.8.1+cvs binary could be moved to a less visible location, or to a separate page dedicated to development. Or accompanied by an explanation, etc.