using wget

2002-01-29 Thread Anselm Almeida

Hi,
 I'm new to wget.  I use the following command
to get files from the site
wget ftp://ftp.iitm.ac.in/debian

and have set up my .wgetrc file with the following:

tries=20
reclevel=5
passive_ftp = on
http_proxy = http://anselm:[EMAIL PROTECTED]:9000/
ftp_proxy= http://anselm:[EMAIL PROTECTED]:9000/ 
# above password changed !
use_proxy = on
dot_style = default
noclobber = on
glob=on
continue = on
robots = on
no_parent = on
wait = 60
dirstruct = on
add_hostdir = off
recursive = on
follow_ftp = on
simple_host_check=on
add_hostdir=off

  1. I have a very slow link to the Internet
through a proxy.  When the connection breaks, wget
goes on to get the next file instead of retrying the
file it was fetching when the connection broke, e.g.:

Connecting to darya.nio.org:9000... Connection to darya.nio.org:9000 refused.
--12:36:14--  ftp://ftp.iitm.ac.in:21/debian/pool/main/libr
           => `debian/pool/main/libr'
Connecting to darya.nio.org:9000... Connection to darya.nio.org:9000 refused.
--12:37:14--  ftp://ftp.iitm.ac.in:21/debian/pool/main/libs
           => `debian/pool/main/libs'
Connecting to darya.nio.org:9000... Connection to darya.nio.org:9000 refused.
--12:38:14--  ftp://ftp.iitm.ac.in:21/debian/pool/main/libt
           => `debian/pool/main/libt'
Connecting to darya.nio.org:9000... Connection to darya.nio.org:9000 refused.
--12:39:14--  ftp://ftp.iitm.ac.in:21/debian/pool/main/libu
           => `debian/pool/main/libu'

 2. Also, if the connection is bad, 0 bytes are
downloaded and wget goes on to download the next file,
and in the directory structure I get a lot of files
with 0 bytes downloaded, whereas I would have expected
wget to keep on trying to get each file until it is
completely downloaded! Please see the enclosed file text3.

  3. The third thing is that if I restart
wget to continue the download after the
connection is broken, it just skips the files which
have been partially downloaded and downloads only the
new files which have not been downloaded yet.
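Items 1 to 3 all describe the same expectation: keep retrying the *same* file, resuming from whatever is already on disk, until it is complete. The Python sketch below shows that loop under stated assumptions; `download_with_resume` and the `fetch_range(offset)` callback are hypothetical stand-ins for the network transfer, not wget internals, and the sketch assumes the file's total size is known in advance. That assumption matters here: in the attached log the proxy replies `Length: unspecified`, so from the length alone a client cannot tell a 0-byte transfer from a complete one.

```python
from typing import Callable


def download_with_resume(
    fetch_range: Callable[[int], bytes],
    total_size: int,
    max_tries: int = 20,
) -> bytes:
    """Retry ONE file until total_size bytes have arrived.

    fetch_range(offset) is a hypothetical transport callback: it returns
    whatever bytes it managed to read starting at `offset`, or raises
    ConnectionError if the link drops before anything arrives.
    """
    data = bytearray()
    tries = 0
    while len(data) < total_size:
        if tries >= max_tries:
            raise RuntimeError(
                f"gave up after {max_tries} tries with {len(data)}/{total_size} bytes"
            )
        tries += 1
        try:
            chunk = fetch_range(len(data))  # resume where the last try stopped
        except ConnectionError:
            continue  # retry the same file instead of moving to the next one
        data.extend(chunk)  # a 0-byte chunk does NOT mark the file complete
    return bytes(data)
```

With the callback wired to a real resumable transfer (an FTP `REST` or an HTTP `Range` request), a dropped link costs one retry of the current file rather than a skipped or empty file.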

 I suspect I am not using wget's options
properly, and I will be grateful for any leads.

 Thanks in advance.  I'm using wget 1.5.3.

Anselm




an/dists/woody/main/source/net/dhcpcd_1.3.17pl2.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

0K -

10:07:10 (0.00 B/s) - `debian/dists/woody/main/source/net/dhcpcd_1.3.17pl2.orig.tar.gz' saved [0]

--10:08:10--  ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz
           => `debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

0K -

10:14:00 (0.00 B/s) - `debian/dists/woody/main/source/net/epic4-script-splitfire_1.6.orig.tar.gz' saved [0]

--10:15:00--  ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/epic_3.004.orig.tar.gz
           => `debian/dists/woody/main/source/net/epic_3.004.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 500 Internal Error
10:15:20 ERROR 500: Internal Error.

--10:16:20--  ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz
           => `debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

0K -

10:21:44 (0.00 B/s) - `debian/dists/woody/main/source/net/fakebo_0.4.1.orig.tar.gz' saved [0]

--10:22:44--  ftp://ftp.iitm.ac.in:21/debian/dists/woody/main/source/net/fmirror_0.8.4beta.orig.tar.gz
           => `debian/dists/woody/main/source/net/fmirror_0.8.4beta.orig.tar.gz'
Connecting to darya.nio.org:9000... connected!
Proxy request sent, awaiting response... 200 document follows
Length: unspecified [application/octet-stream]

0K - .. .. ..





Re: mirroring vs -m

2002-01-29 Thread Alan Eldridge

On Tue, Jan 29, 2002 at 04:54:17PM +0100, Andre Majorel wrote:
On 2002-01-29 09:56 -0500, Alan Eldridge wrote:

 In particular, does wget parse and follow links in an HTML document,
 when that document is retrieved (using -r) via the FTP protocol? If
 not, why not?

I'm inclined to think that recursive retrieval without parsing
is a feature. HTML content is normally served over HTTP. If you
want to retrieve HTML through FTP, it's likely because you do
*not* want to follow the links.

I agree with you. I'm not making a case for doing it, but merely
bringing up that there are two different recursive retrieval models,
one of which (FTP) is (IIRC) a true mirror. 

One of the properties of mirroring is that links are followed all
the way down the tree; that is, the tree is not pruned (depth-wise)
just because a node at a given level has not been changed.

It seems to me that there are some abstractions that can be made about
the retrieval process.

In particular, there's:

(1) the continuation protocol: does the retrieval continue
to further depth once an unmodified node is encountered?

(2) the child-list acquisition protocol: do we get the list
of children of this node by (a) examining metadata or (b) parsing
the contents of the node as a document? 

IOW, recursive retrieval can be thought of as a generalized process,
parameterized by protocols that determine the actions and state
transitions at each node in the tree. Ideally, the code that implements
recursive retrieval would know nothing about either the communication
protocol (ftp vs. http) or the data/metadata formats (having to do a special
operation - an ftp listing - on a directory node for ftp, vs having to
do a special operation - an html parse - on a particular type of file
node for an http traversal).
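Alan's two protocols can be made concrete as two callbacks passed to a traversal that is otherwise protocol-agnostic. The Python sketch below is illustrative only: `Node`, `traverse`, and the policy functions are hypothetical names that do not correspond to Wget's actual C source, and the toy `Node` carries its child list as metadata (as an FTP listing would).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy stand-in for a remote file or directory (hypothetical)."""
    url: str
    modified: bool = True
    children: List[str] = field(default_factory=list)


def traverse(
    root: str,
    fetch: Callable[[str], Node],
    list_children: Callable[[Node], List[str]],  # child-list protocol (2)
    keep_going: Callable[[Node], bool],          # continuation protocol (1)
) -> List[str]:
    """Depth-first recursive retrieval that knows nothing about FTP or HTTP.

    Returns the URLs in the order they were retrieved.
    """
    retrieved: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        node = fetch(url)
        retrieved.append(url)
        if keep_going(node):
            # Push in reverse so children are visited in listed order.
            stack.extend(reversed(list_children(node)))
    return retrieved


# Two continuation policies for point (1):
def mirror_everything(node: Node) -> bool:
    return True  # never prune: a true mirror


def prune_unmodified(node: Node) -> bool:
    return node.modified  # stop descending below unchanged nodes


# One child-list policy for point (2a): read metadata, e.g. a directory listing.
def children_from_metadata(node: Node) -> List[str]:
    return node.children
```

An HTML-parsing child lister for (2b) would plug into the same `list_children` slot without any change to `traverse`.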

-- 
Alan E
Please rush me my portable walrus cleaning kit! Yes I am over 18, 
but my IQ isn't.



Can wget handle this scenario?

2002-01-29 Thread Tomislav Goles


Hi,
I have been happily using wget to handle automatic
ftp download but now have a situation which I am
not sure whether wget can handle.
This is the type of syntax that I have been using without
any problems:

$ wget ftp://username:[EMAIL PROTECTED]/file.txt

Now I need to add the twist where username account info
resides on another machine (i.e. machine2, which by the way
is on the same network as machine1), so I need to do something
like the following:

$ wget ftp://username:[EMAIL PROTECTED]@machine1.com/file.txt

which is of course not the syntax wget understands. But perhaps
there are some wget flags or other wget magic I am not aware of
(from looking at the 'wget --help' output) that allow for this?

I can do an interactive ftp session which allows me to solve
the situation in the following way:

$ ftp machine2.com
ftp username: username@machine1
ftp password: passwd
ftp>

The above sequence gets me in without problem.
But I can't figure out whether there is some way to automate
this (without resorting to writing an expect script).
I would really prefer to do this with either wget, ftp, pavuk,
or curl but I don't know whether any of those clients can do
this.
Any info on this would be most helpful.
Please e-mail me with any bright ideas as I am not subscribed 
to the wget lists.
Thanks,
Tomislav Goles
[EMAIL PROTECTED]
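The interactive session above suggests that machine2 acts as an FTP gateway accepting `user@host` in the login name. Assuming that holds, the same login can be scripted without expect using Python's standard `ftplib`. This is a sketch outside wget, not an answer to whether wget itself supports it; `gateway_user` and `fetch_via_gateway` are hypothetical helpers, and their arguments are placeholders for the values in the session.

```python
import ftplib


def gateway_user(username: str, target_host: str) -> str:
    """Build the login name for a 'user@host' style FTP gateway."""
    return f"{username}@{target_host}"


def fetch_via_gateway(gateway: str, username: str, target_host: str,
                      password: str, remote_name: str, local_path: str) -> None:
    """Connect to the gateway, log in as user@target_host, fetch one file."""
    with ftplib.FTP(gateway) as ftp:  # control connection to the gateway
        ftp.login(user=gateway_user(username, target_host), passwd=password)
        with open(local_path, "wb") as out:
            # Binary transfer; each received block is written as it arrives.
            ftp.retrbinary(f"RETR {remote_name}", out.write)
```

For the session shown, the call would be `fetch_via_gateway("machine2.com", "username", "machine1", "passwd", "file.txt", "file.txt")`.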




Re: mirroring vs -m

2002-01-29 Thread csaba . raduly


On 29/01/2002 15:54:17 Andre Majorel wrote:

[snip debate about following links in HTML retrieved by FTP]

I'm inclined to think that recursive retrieval without parsing
is a feature. HTML content is normally served over HTTP. If you
want to retrieve HTML through FTP, it's likely because you do
*not* want to follow the links.


I (client) don't get the choice. If the document at
http://foo.bar/index.html has all its links like this:

<A HREF="ftp://foo.bar/welcome.html">welcome</A>

the client has no choice but to retrieve them via FTP.
It would be nice if wget was able to follow all those links.


If Wget always parsed HTML, even over FTP, it would be
impossible to make a complete mirror of a tree that has broken href
links or hidden files.

Perhaps if wget started with FTP, it should mirror FTP-like
(.listing and all that). If it started via HTTP, it should follow links,
regardless of future retrieval modes.
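Csaba's proposal, that the scheme of the starting URL decides the recursion model for the whole run, can be sketched as below. This is a hypothetical illustration of the proposal, not a description of what wget does; `link_mode`, `HrefCollector`, and `links_to_follow` are invented names, and the sample page in the usage line reuses the `ftp://foo.bar/welcome.html` link from the example above.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


def link_mode(start_url: str) -> str:
    """The starting URL's scheme fixes the recursion model for the whole run."""
    scheme = urlparse(start_url).scheme.lower()
    if scheme == "ftp":
        return "listing"     # mirror FTP-style from directory listings
    if scheme in ("http", "https"):
        return "parse-html"  # follow HTML links, even ones pointing at ftp://
    raise ValueError(f"unsupported scheme: {scheme!r}")


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (HTMLParser lowercases tag/attr names)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_follow(start_url: str, html: str) -> List[str]:
    """Under the proposed rule, parse links only when the run began over HTTP."""
    if link_mode(start_url) != "parse-html":
        return []
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs
```

For example, `links_to_follow("http://foo.bar/index.html", '<A HREF="ftp://foo.bar/welcome.html">welcome</A>')` yields the `ftp://` link, while the same page reached from an `ftp://` start yields nothing to parse.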

[snip]

--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]    http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




[Ad] Fortune-teller consultation: Marriage? Stock investment? Health? Career?

2002-01-29 Thread ö¾Ï¿î¼¼



This is an advertising e-mail, marked [Ad] in the subject line as
recommended by the Ministry of Information and Communication.

For everyone whose affairs did not go well last year, everyone with a frustrating story!!

Consult a fortune-teller. There is surely an answer.

Tojeong Bigyeol, inner and outer marital compatibility, business luck, promotion luck, academic luck, health luck,

name-giving, choosing auspicious dates, astrology, feng shui, romance luck, love luck

Past and future foretold with uncanny accuracy. "You will be amazed!"


☎
060-708-7081

"Fortune-telling is a scientific discipline
based on statistics."


※ We sincerely apologize for sending you promotional mail without your consent.
- This mail was sent to e-mail addresses collected from the Internet.
- This is an [Ad] mail under Article 50 of the Act on Promotion of Information and Communications Network Utilization and Information Protection, etc. If you do not want it, please delete it or send a refusal mail.



Re: Noise ratio getting a bit high?

2002-01-29 Thread Marc Stephenson

 
 If you have a spam-fighting suggestion that does *not* include
 disallowing non-subscriber postings, I am more than willing to listen.
 

It's not spam fighting, but I would personally like to see a wget-announce 
moderator-only list where new releases and security announcements could be 
posted.   That would prevent spam for me.


-- 
Marc Stephenson   IBM Server Group - Austin, TX
Internet:  [EMAIL PROTECTED]  NOTES: [EMAIL PROTECTED]
Phone:   512-327-5670  T/L 678-3189



Re: Noise ratio getting a bit high?

2002-01-29 Thread James C. McMaster (Jim)

In message [EMAIL PROTECTED], Hrvoje Niksic said:
 James C. McMaster (Jim) [EMAIL PROTECTED] writes:
 
  In message [EMAIL PROTECTED], Thomas Reinke said:
  Is anyone else not finding the noise ratio (i.e. spam) a bit high
  here? I sympathize with the effort required to lightly moderate,
  but might I recommend that _something_ be done to rid us all of
  this spam? It's getting to be irritating enough that I'm tempted to
  drop off the list, which I'd just as soon not do - wget is a
  fantastic little tool that I'd just as soon stay involved with
  actively, if possible.
  
  The easiest solution would be for the list owners to require people
  to subscribe before posting.  So far, they seem unwilling to do
  that.  All the product-support lists to which I subscribe (except
  this one) have that policy, and I never get spam from any of them.
 
 I do not know what you call a product support mailing list, but this
 is a free software project development list, and certainly not the
 only one with the open posting policy.  For example, XEmacs mailing
 lists are open to non-subscriber posting.
 
Product is a generic term.  I subscribe to mailing lists on apache, tomcat, 
exmh, nmh and procmail. All these packages are open-source products.  All 
of these lists require subscription before posting.  I receive spam from none 
of them.

 But that was just an example.  The actual reasoning for allowing
 non-subscriber posting boils down to three reasons:
 
 1. I believe it is the right thing to do.  I personally hate allegedly
supportive mailing lists that require me to subscribe before
asking a question.  I don't want to subscribe, dammit, I just want
to ask something.

Your call.  Subscription and unsubscription are easy enough to do, in my 
opinion.  I personally think people who ask a question and then add "Please 
reply privately since I am not on the list" are leeches if they want to use 
the list without giving anything back.  If anyone just hits reply, the 
person will never see the answer.

 2. It allows the discussion to extend to non-subscribers.  You can
simply Cc a person to a discussion pertinent to him, and he will be
able to respond to the list.
 
Again, if they are interested enough to contribute to the discussion they 
should be willing to subscribe.

 3. It allows the mails from [EMAIL PROTECTED] to be rerouted to this
list.
 
Fine.  Why bother with the bug list then?  Also, the same problem applies as 
with this list.  If a person just replies, the reporter will never see the 
response.

 I am aware that in this matter, as well as in the infamous `Reply-To'
 debate, this list lies in the minority.  But that is not a sufficient
 reason to back down and let the spammers win.
 
I disagree with you on the Reply-To matter as well, but that is not the 
argument.  The point is not that your list is in the minority, it is *why* 
you are in the minority.  The quantity of spam on this list has been annoying 
for a while now.  It is getting really tiresome.  Once the spammers get 
your address they sell it to other spammers, so the quantity will only 
increase from now on.  Don't the spammers also win if they annoy enough of 
the knowledgeable people on this list that they leave?  Even if they don't, 
the people relying on that expertise surely lose.

 If you have a spam-fighting suggestion that does *not* include
 disallowing non-subscriber postings, I am more than willing to listen.
 
The only alternative I can imagine is moderation.  I doubt you or anyone else 
has the time or inclination.

Personally I have rearranged my .procmailrc so this list gets processed after 
my spam filters.  That leads to the risk I will miss some valid postings, but 
so be it.  If that does not catch the spam, I will unsubscribe from the list
altogether.
-- 
Jim McMaster
mailto:[EMAIL PROTECTED]





Re: Noise ratio getting a bit high?

2002-01-29 Thread Andre Majorel

On 2002-01-29 22:02 +0100, Hrvoje Niksic wrote:

 But that was just an example.  The actual reasoning for allowing
 non-subscriber posting boils down to three reasons:
 
 1. I believe it is the right thing to do.  I personally hate allegedly
supportive mailing lists that require me to subscribe before
asking a question.  I don't want to subscribe, dammit, I just want
to ask something.

I respectfully disagree. If we can spend the time to read and
answer the poster's question, the poster can spend five minutes
to subscribe/unsubscribe.

For reference, see the netiquette item on posting to newsgroups
and asking for replies by email.

 2. It allows the discussion to extend to non-subscribers.  You can
simply Cc a person to a discussion pertinent to him, and he will be
able to respond to the list.
 
 3. It allows the mails from [EMAIL PROTECTED] to be rerouted to this
list.

Yup.

 I am aware that in this matter, as well as in the infamous `Reply-To'
 debate, this list lies in the minority.  But that is not a sufficient
 reason to back down and let the spammers win.

Right now, [EMAIL PROTECTED] is providing free relaying for
spammers to all its subscribers. <sarcasm>If this is not
letting the spammers win, I wonder what is.</sarcasm>

 If you have a spam-fighting suggestion that does *not* include
 disallowing non-subscriber postings, I am more than willing to listen.

Mmm... What would you think of having the list software
automatically add a special header (say X-Non-Subscriber) to
every mail sent by a non-subscriber ?
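A rough sketch of Andre's idea, assuming the list software can consult its subscriber roster: the list first strips any sender-supplied copy of the header, adds it when the From address is not subscribed, and readers filter on it. It is written with Python's standard `email` package; the function names and example addresses are hypothetical, and only the header name comes from the proposal.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

HEADER = "X-Non-Subscriber"


def tag_message(msg: Message, subscribers: set) -> Message:
    """List side: mark mail whose From address is not in `subscribers`.

    `subscribers` is assumed to hold lowercased addresses.
    """
    del msg[HEADER]  # discard any copy supplied by the sender; no error if absent
    _, addr = parseaddr(msg.get("From", ""))
    if addr.lower() not in subscribers:
        msg[HEADER] = "yes"
    return msg


def from_non_subscriber(msg: Message) -> bool:
    """Reader side: the test a personal mail filter would apply."""
    return (msg.get(HEADER) or "").strip().lower() == "yes"
```

Stripping the header before adding it keeps a sender from pre-labelling a message either way, so readers can trust whatever value the list leaves in place.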

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Noise ratio getting a bit high?

2002-01-29 Thread Hrvoje Niksic

Andre Majorel [EMAIL PROTECTED] writes:

 I respectfully disagree. If we can spend the time to read and
 answer the poster's question, the poster can spend five minutes
 to subscribe/unsubscribe.
 
 For reference, see the netiquette item on posting to newsgroups
 and asking for replies by email.

I am aware of newsgroup etiquette, but I consider a newsgroup to be
different from a mailing list devoted to helping users.  Besides,
subscribing to and unsubscribing from an unknown mailing list are much
more annoying processes than they are for newsgroups.

I suppose we can only agree to disagree on this one.

 I am aware that in this matter, as well as in the infamous
 `Reply-To' debate, this list lies in the minority.  But that is not
 a sufficient reason to back down and let the spammers win.
 
 Right now, [EMAIL PROTECTED] is providing free relaying for spammers
 to all its subscribers.

So does any mailing list with open subscription.  I find your choice
of wording strange, sort of like saying that `sendmail' provides free
transmission of spam.  That may be so, but that was not its intention,
and the fact that it's misused is no reason to cripple its intended
use.

 If you have a spam-fighting suggestion that does *not* include
 disallowing non-subscriber postings, I am more than willing to listen.
 
 Mmm... What would you think of having the list software
 automatically add a special header (say X-Non-Subscriber) to every
 mail sent by a non-subscriber ?

I see what you're getting at, and I would have absolutely no
objections to that.



Re: Noise ratio getting a bit high?

2002-01-29 Thread Marc Stephenson

 
 Marc Stephenson [EMAIL PROTECTED] writes:
 
  
  If you have a spam-fighting suggestion that does *not* include
  disallowing non-subscriber postings, I am more than willing to listen.
  
  
  It's not spam fighting, but I would personally like to see a
  wget-announce moderator-only list where new releases and security
  announcements could be posted.  That would prevent spam for me.
 
 That might make sense independent of the spam -- some people would
 choose that list simply to avoid the volume of this list.
 
 So far I haven't bothered to create an announcement list because there
 were no requests for one, and because I can't think of announcements
 one could make other than for releases, and you can use freshmeat et
 al. for that.
 

There are likely people interested in wget who aren't that interested in
grepping the 80 or so freshmeat announcements per day, so I think that
it would be generally useful myself.


-- 
Marc Stephenson   IBM Server Group - Austin, TX
Internet:  [EMAIL PROTECTED]  NOTES: [EMAIL PROTECTED]
Phone:   512-327-5670  T/L 678-3189



Re: Noise ratio getting a bit high?

2002-01-29 Thread James C. McMaster (Jim)

In message [EMAIL PROTECTED], Hrvoje Niksic said:
 Andre Majorel [EMAIL PROTECTED] writes:
 
  Right now, [EMAIL PROTECTED] is providing free relaying for spammers
  to all its subscribers.
 
 So does any mailing list with open subscription.  

Any spammer *could* subscribe to an open-subscription list, but as a 
practical matter they do not.  Spam-generating software generally just takes 
two files:  a list of addresses and the message to be sent.  It then just 
blindly blasts out the message.

Error responses are ignored, even if the headers are not forged to prevent 
responses from getting back at all.  Spammers are not interested in bounce 
messages of any type, including "You are not subscribed" messages.  It simply 
is not worth their time to figure out why some of their 500,000+ emails did 
not go through.

  Mmm... What would you think of having the list software
  automatically add a special header (say X-Non-Subscriber) to every
  mail sent by a non-subscriber ?
 
 I see what you're getting at, and I would have absolutely no
 objections to that.
 
This would give us something on which we could filter.  It would also prevent 
legitimate non-subscribers' messages from being seen by some people.  Possibly a 
good compromise.
-- 
Jim McMaster
mailto:[EMAIL PROTECTED]





Re: windows binary

2002-01-29 Thread Hrvoje Niksic

Brent Morgan [EMAIL PROTECTED] writes:

 What's CVS and what is the significance of this version?

CVS stands for Concurrent Versions System, and is the version
control system where the master sources for Wget are kept.  I would
not advise the download of the CVS version because it is likely to
be incomplete or unstable.

It would be nice if the 1.8.1+cvs binary could be moved to a less
visible location, or to a separate page dedicated to development, or
accompanied by an explanation, etc.