why must -B need -F to take effect?

2005-06-26 Thread Dan Jacobson
Why does -B need -F to take effect? Why can't one do
xargs wget -B http://bla.com/ -i - <

Re: why must -B need -F to take effect?

2005-06-28 Thread Dan Jacobson
Ok, then here

   -B URL
   --base=URL
   When used in conjunction with -F, prepends URL to relative links in
   the file specified by -i.

don't mention -F!


add --print-uris or --dry-run

2005-06-28 Thread Dan Jacobson
Wget needs a --print-uris or --dry-run option, to show what it would
get/do without actually doing it!

Not only could one check whether e.g. -B will do what one wants before
actually doing it, one could also use wget as a general URL extractor,
etc.

--debug is not what I'm talking about. I'm talking more about something
like apt-get's --print-uris.
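
In the meantime, a rough way to use wget as a URL extractor with stock
tools (a sketch only; the URL and the href pattern are just examples,
not what a real --print-uris would output):

$ wget -q -O - http://www.example.org/ |
    grep -Eoi 'href="[^"]*"' | cut -d'"' -f2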


say where -l levels start

2005-07-24 Thread Dan Jacobson
In the man page
   -l depth
   --level=depth
   Specify recursion maximum depth level depth.  The default maximum
   depth is 5.

Say what levels 0 and 1 do, so one gets an idea of what depth means:
'this page only' vs. 'just the links on this page, and no further'.


-Y not mentioned fully in man and info

2005-08-20 Thread Dan Jacobson
-Y not mentioned fully in man and info:
$ wget --help|grep -- -Y
  -Y,  --proxy   explicitly turn on proxy.
$ man wget|col -b|grep -- -Y
   if Wget crashes while downloading wget -rl0 -kKE -t5 -Y0
$ wget -V
GNU Wget 1.10.1-beta1 ...
Originally written by Hrvoje Niksic <[EMAIL PROTECTED]>
P.S. Also give the bug address here too, else he will get the bug reports directly.


curl has --max-filesize

2005-10-21 Thread Dan Jacobson
Curl has this impressive looking feature:
$ man curl
 --max-filesize <bytes>
  Specify the maximum size (in bytes) of a file to download. If the
  file requested is larger than this value, the transfer will not
  start and curl will return with exit code 63.

  NOTE: The file size is not always known prior to download, and for
  such files this option has no effect even if the file transfer ends
  up being larger than this given limit. This concerns both FTP and
  HTTP transfers.*

Anyway, wget could also have --max-filesize. WWWOFFLE could do
something similar on a per-URL basis. We modem users then wouldn't have
to worry as much about a download going hog wild.

(*Well, they ought to have another option saying what to do if the size
is not known: get up to 0 bytes, XXX bytes, or infinity.

Wait... couldn't wget keep track of how many bytes it has swallowed so
far and put a stop to it if that exceeds the limit? Indeed, wget prints
those progress messages, showing it is keeping track whether or not the
size is in the header.

At least that way we could still see the top part of some whopping
.JPG, etc. Too bad .pdfs are seemingly useless if truncated.)
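
Until then, a crude client-side cap is possible by piping through head
(a sketch; the URL and the 500000 are invented). The transfer still
starts, but at most that many bytes land on disk:

$ wget -q -O - http://www.example.org/huge.jpg | head -c 500000 > huge.jpg.part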


I have a whole file of --headers

2005-10-21 Thread Dan Jacobson
What if I have a whole file of headers I want to use:
$ sed 1d /var/cache/wwwoffle/outgoing/O6PxpG00D+DBLAI8puEtOew|col|colrm 22
Host: www.hsr.gov.tw
User-Agent: Mozilla/5
Accept: text/xml,appl
Accept-Language: en-u
Accept-Encoding: gzip
Accept-Charset: Big5,
Keep-Alive: 300
Proxy-Connection: kee
Referer: http://www.h

Why isn't there an option where I can just give the whole file to wget?
Why must one write painstaking scripts like

perl -anwe 'BEGIN{$a=0} s/\r//; chomp;
  if (/^GET/) { print "\nwget \"$F[1]\" " }
  if (/^(Refer|Cook|etc.etc.)/) { print "--header=\"$_\" " }' O*

just to feed them one by one to wget?!
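
For the record, here is the sort of workaround I mean, sketched in bash
(the file name, URL, and choice of headers to skip are only examples):

hdrfile=/var/cache/wwwoffle/outgoing/O6PxpG00D+DBLAI8puEtOew
url=http://www.hsr.gov.tw/
args=()
while IFS= read -r line; do
    line=${line%$'\r'}                  # drop the CR of CRLF
    [ -z "$line" ] && break             # headers end at the first blank line
    case $line in
        Host:*|Proxy-Connection:*) ;;   # let wget generate these itself
        *) args+=(--header="$line") ;;
    esac
done < <(sed 1d "$hdrfile")             # sed 1d skips the GET request line
wget "${args[@]}" "$url"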


mention -r on the Recursive Download Info page

2005-11-08 Thread Dan Jacobson
In Info "3 Recursive Download", mention "-r, --recursive"!
Also don't threaten to remove -L!


Wishlist: support the file:/// protocol

2005-12-11 Thread Dan Jacobson
Wishlist: support the file:/// protocol:
$ wget file:///home/jidanni/2005_safe_communities.html


hard to tell -Y off means --no-proxy

2006-01-20 Thread Dan Jacobson
Nowadays it is very hard to tell from the documentation that -Y off
means --no-proxy. You must be phasing out -Y or something. No big deal. OK.

Also reported already, I think: the man page says
   For more information about the use of proxies with Wget,
and then nothing.  GNU Wget 1.10.2


--random-wait: users can no longer specify a minimum wait

2006-02-02 Thread Dan Jacobson
"--random-wait causes the time between requests to vary between 0 and
2 * wait seconds, where wait was specified using the --wait option, "

So one can no longer specify a minimum wait time! The 2 and at least
the 0 should be user-configurable floating-point numbers.


Re: --random-wait: users can no longer specify a minimum wait

2006-02-04 Thread Dan Jacobson
H> Maybe it should rather vary between 0.5*wait and 1.5*wait?
There you go again making assumptions about what the user wants.
H> I think it'd be a shame to spend more arguments on such a rarely-used
H> feature.
--random-wait[=a,b,c,d...], loaded with lots of backward-compatible
arguments that are (presumably) ignored if this is an older wget.


make clear that no --force-html means .txt

2006-02-12 Thread Dan Jacobson
Man page says:
   -i file
   --input-file=file
   Read URLs from file.  If - is specified as file, URLs are read from
   the standard input.  (Use ./- to read from a file literally named
   -.)

   If this function is used, no URLs need be present on the command
   line.  If there are URLs both on the command line and in an input
   file, those on the command lines will be the first ones to be
   retrieved.  The file need not be an HTML document (but no harm if
   it is)---it is enough if the URLs are just listed
   sequentially.

Say that if you don't use --force-html, then it had better not be HTML!

   However, if you specify --force-html, the document will be regarded
   as html.  In that case you may have problems with relative links,
   which you can solve either by adding "<base href="url">" to the
   documents or by specifying --base=url on the command line.
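
For instance (URL and file name invented), the combination that
paragraph describes would look something like:

$ wget --force-html --base=http://www.example.org/ -i links.html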

Also one can't use -i file:///bla.html


HTTP 1.1?

2006-02-21 Thread Dan Jacobson
The documentation doesn't say how, or why not, one can force wget to
send HTTP/1.1 requests instead of 1.0. Maybe it is simply not ready yet?


can't recurse if no index.html

2006-03-22 Thread Dan Jacobson
I notice that with server-generated directory listings, one can't recurse.
$ lynx -dump http://localhost/~jidanni/test|head
Index of /~jidanni/test
 Icon   [1]Name  [2]Last modified  [3]Size  [4]Description
  ___
 [DIR]  [5]Parent Directory -
 [TXT]  [8]cd.html 23-Feb-2006 20:55  931
$ wget --spider -S -r http://localhost/~jidanni/test/
localhost/~jidanni/test/index.html: No such file or directory


--force-html -i file.html

2006-07-04 Thread Dan Jacobson
$ man wget
   -i file
   --input-file=file

   The file need not be an HTML document (but no harm if it
   is)---it is enough if the URLs are just listed sequentially.

Well even with -i file.html, one still needs --force-html. So "yes
harm if it is".  GNU Wget 1.10.2


give error count at end

2006-08-02 Thread Dan Jacobson
  Downloaded: 735,142 bytes in 24 files
looks great. But if
  09:49:46 ERROR 404: WWWOFFLE Host Not Got.
flew off the screen, one will never know.
That's why you should say
  Downloaded: 735,142 bytes in 24 files. 3 files not downloaded.
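
A stopgap in the meantime (a sketch; the log file name is just an
example) is to log to a file and count the ERROR lines oneself:

$ wget -i url-list -o wget.log
$ grep -c ' ERROR ' wget.log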


wget --save-even-if-error

2006-11-12 Thread Dan Jacobson
One discovers that wget secretly (not documented) throws away the
content of the response if there was an error (404, 503, etc.).
So there needs to be a --save-even-if-error switch.


-Y is gone from the man page

2006-11-16 Thread Dan Jacobson
-Y is gone from the man page, except for one tiny mention, and --help
doesn't show its arguments. Didn't check Info.


submitting wget bugs is like a blackhole

2007-01-03 Thread Dan Jacobson
From the user's perspective, sending bugs to [EMAIL PROTECTED] is like a
black hole. This is in contrast to other systems like the Debian bug
tracking system. No, don't move to bugzilla, else we won't be able to
send email.


no status option

2001-02-23 Thread Dan Jacobson

I'm thinking wget could have a status option, like bash's
$ set -o
allexport   off
braceexpand on
errexit off...

perhaps a plain
$ wget -d
might be a good place.
-- 
http://www.geocities.com/jidanni Tel886-4-25854780



return codes

2001-03-07 Thread Dan Jacobson

No documentation found on what wget's return codes are.  E.g., a
reasonable wish:
$ wget -N URL && echo got it|mail john
Please add to the docs what the policy is, even if it's 'none at present'.
-- 
http://www.geocities.com/jidanni Tel886-4-25854780



wget --continue vs. wwwoffle

2003-06-24 Thread Dan Jacobson
The following message is a courtesy copy of an article
that has been posted to gmane.network.wwwoffle.user as well.

As we wwwoffle users all might know, wget has
  -c,  --continue   resume getting a partially-downloaded file.
Quite handy when a large download got interrupted.  However there are
some caveats when using it thru wwwoffle.  It would be neat if the
right combination of wget switches could be known for doing this
without causing wwwoffle to get the whole file again, and without
resorting to -Y off, bypassing wwwoffle.

Another case is where we get the partially downloaded (larva, pupa,
whatever) file out of the wwwoffle cache by hand, rename it to what wget
expects, and continue with wget -Y off -c.  Wget doesn't know that
there is an HTTP header swelling the file, so that must first be
chopped off.  There are probably other considerations too.
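
Something like this is what I mean by chopping it off by hand (a sketch;
the file names are made up, and it assumes the stored header ends with a
blank CRLF line):

spool=/var/cache/wwwoffle/http/example.org/Dxxxxxxxxxx  # hypothetical cache entry
offset=$(grep -abm1 $'^\r$' "$spool" | cut -d: -f1)     # byte offset of the blank line
tail -c +$((offset + 3)) "$spool" > big-file.iso        # skip past the "\r\n" too
wget -Y off -c http://example.org/big-file.iso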


can't turn off good messages without taking the bad too

2003-07-04 Thread Dan Jacobson
I was hoping to separate the usual news,
$ wget  http://abc.iis.sinica.edu.tw/
--09:26:00--  http://abc.iis.sinica.edu.tw/
   => `index.html'
Resolving localhost... done.
Connecting to localhost[127.0.0.1]:8080... connected.

from the bad news,

Proxy request sent, awaiting response... 503 Connect failed
09:26:00 ERROR 503: Connect failed.

but I see they both go to stderr.

Also, none of the command line switches affects one without affecting
the other.

Yes, stdout is reserved for -O -, but there ought to be a switch that
will cause no output on stdout unless it is real error output... even -nv
doesn't do that.

I must now write
t=/tmp/site-checker
if wget -Y off -t 1 --spider http://jidanni.org/index.html > $t 2>&1
then :
else cat $t >> $HOME/errors
fi
rm $t

when instead
wget --real-errors-only-please -q ... >> $HOME/errors
would have done.  Then a test -s $HOME/errors is all that would be
needed to know there had been trouble.


want date too

2003-07-07 Thread Dan Jacobson
"--15:33:01--" is not adequate for beyond 24 hours.  Wish there was a
way to put more date info into this message, like syslog does, without
stepping outside wget.


-O --spider

2003-07-24 Thread Dan Jacobson
> You can view the map at:
> http://home.sara.nl/~bram/debchart.jpeg

< WARNING: this image is ENORMOUS.

OK, I thought, I will use
wget -O --spider -Y off http://home.sara.nl/~bram/debchart.jpeg
to see how big it is before biting with my modem.  But I had mistyped
-O where I meant -S, and ended up getting the whole file anyway.  So
next time I wish wget would treat this as a missing argument.  We can
write ./--spider if that is where we really want to put the output.


dug long to find how to not look in .netrc

2003-08-27 Thread Dan Jacobson
The man page says

   To prevent the passwords from being seen, store them in .wgetrc or
   .netrc,

The problem is that if you just happen to have a .netrc entry for a
certain machine, but you don't wish wget would notice it, then what to
do?

Can you believe
$ wget --http-user= --http-passwd= http://debian.linux.org.tw/
$ wget --http-user=x --http-passwd=x http://debian.linux.org.tw/
don't even override .netrc?!

$ wget http://:@debian.linux.org.tw/
http://:@debian.linux.org.tw/: Invalid user name.
$ wget http://x:[EMAIL PROTECTED]/
OK, that overrides it, but still, one can't achieve no username and password at all.

OK, far away in an Info page I finally find .wgetrc's netrc=off
... this should be noted everywhere the docs mention .netrc.
Also there should be a way to do it from the command line -- I am
making a script that shouldn't blow up just because the user has an
account on some mirror.
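
A sketch of what a script can do in the meantime (the temp-file dance is
my own idea, not a wget feature): point WGETRC at a throwaway config so
netrc=off holds for just this run, without touching the user's ~/.wgetrc.

tmprc=$(mktemp)
echo 'netrc = off' > "$tmprc"
WGETRC=$tmprc wget http://debian.linux.org.tw/
rm -f "$tmprc"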

BTW, the man page also says

   For more information about security issues with Wget,

but then the sentence just stops.

GNU Wget 1.8.2


-q and -S are incompatible

2003-10-06 Thread Dan Jacobson
-q and -S are incompatible; this should perhaps produce an error, and
be noted as such in the docs.

BTW, there seems to be no way to get the -S output but no progress
indicator: -nv and -q kill them both.

P.S. one shouldn't have to confirm each bug submission. Once should be enough.


-T default really 15 minutes?

2003-10-31 Thread Dan Jacobson
Man says:
   -T seconds ... The default timeout is 900 seconds

Ok, then why does this take only 3 minutes to give up?:

--07:58:54--  
http://linux.csie.nctu.edu.tw/OS/Linux/distributions/debian/dists/sid/main/binary-i386/Packages.gz
   => `Packages.gz'
Resolving linux.csie.nctu.edu.tw... done.
Connecting to linux.csie.nctu.edu.tw[140.113.17.250]:80... failed: Connection 
timed out.
Giving up.

--08:02:07--  
http://debian.csie.ntu.edu.tw/debian/dists/sid/main/binary-i386/Packages.gz



I used wget -t 1 -Y off -S --spider url1 url2 ...

So apparently the man page does not mention other factors involved.

I want to limit that 3-minute timeout above to only 30 seconds, but it
appears -T is not what affects this case, or else it would have waited
15 minutes as documented.

Nothing here messing things up:
$ grep ^[^#]  $HOME/.wgetrc /etc/wgetrc
/home/jidanni/.wgetrc:netrc=off
/etc/wgetrc:passive_ftp = on
/etc/wgetrc:waitretry = 10

$ wget --version
GNU Wget 1.8.2 ...
Originally written by Hrvoje Niksic <[EMAIL PROTECTED]>.

I'd put the bug address there too or instead.


feature request: --second-guess-the-dns

2003-11-15 Thread Dan Jacobson
I see there is
   --bind-address=ADDRESS
   When making client TCP/IP connections, "bind()" to ADDRESS on the local 
machine.
   ADDRESS may be specified as a hostname or IP address.  This option can be 
useful
   if your machine is bound to multiple IPs.

But I want a
   --second-guess-the-dns=ADDRESS
so I can
$ wget http://jidanni.org/
Resolving jidanni.org... done.
Connecting to jidanni.org[216.46.203.182]:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
$ wget --second-guess-the-dns=216.46.192.85 http://jidanni.org/
Connecting to jidanni.org[216.46.192.85]:80... connected...

Even allow different port numbers there, even though we can add them
after the url already:

$ wget --second-guess-the-dns=216.46.192.85:66 http://jidanni.org:888/
or whatever. Also pick a better name than --second-guess-the-dns --
which is just a first guess for a name.

Perhaps the user should do all this in the name server or something,
but let's say he isn't root, and doesn't want to use netcat etc. either.


if anything bad happens, return non-zero

2003-11-17 Thread Dan Jacobson
$ wget --spider BAD_URL GOOD_URL; echo $?
0
$ wget --spider GOOD_URL BAD_URL; echo $?
1
I say they both should be 1.
If anything bad happens, return 1 or some other non-zero value.
By BAD, I mean a producer of e.g.,
ERROR 503: Service Unavailable.

--spider or not, too.

And stop making me have to confirm each and every mail to this list.


Re: feature request: --second-guess-the-dns

2003-11-17 Thread Dan Jacobson
By the way, I did edit /etc/hosts to do one experiment
http://groups.google.com/groups?threadm=vrf7007pbg2136%40corp.supernews.com
i.e. <[EMAIL PROTECTED]>
to test an IP/name combination, without waiting for DNS's to update.
Good thing I was root so I could do it.

I sure hope that when one sees
  Connecting to jidanni.org[216.46.192.85]:80... connected.
that there is no interference along the way, that that IP is really
where we are going, to wget's best ability.

By the way, /etc/hosts affects other users on the system, and other
jobs than the current one; and one might be using various caching
DNSs, etc. Just one more justification for this wishlist
item. --connect-address sounds ok... whatever.


Re: feature request: --second-guess-the-dns

2003-11-17 Thread Dan Jacobson
> "P" == Post, Mark K <[EMAIL PROTECTED]> writes:

P> You can do this now:
P> wget http://216.46.192.85/

P> Using DNS is just a convenience after all, not a requirement.

but then one doesn't get the HTTP Host header set to what one wants.


Re: non-subscribers have to confirm each message to bug-wget

2003-11-17 Thread Dan Jacobson
>> And stop making me have to confirm each and every mail to this list.

Hrvoje> Currently the only way to avoid confirmations is to subscribe to the
Hrvoje> list.  I'll try to contact the list owners to see if the mechanism can
Hrvoje> be improved.

Subscribe me with the "nomail" option if it can't be fixed.

Often I come back from a long vacation only to find my last reply
waiting for a confirmation that has probably expired.


Re: feature request: --second-guess-the-dns

2003-11-17 Thread Dan Jacobson
H> It's not very hard to fix `--header' to replace Wget-generated
H> values.

H> Is there consensus that this is a good replacement for
H> `--connect-address'?

I don't want to tamper with headers.
I want to be able to do experiments leaving all variables alone except
for IP address.  Thus --connect-address is still needed.


--spider gets file if ftp !

2003-12-08 Thread Dan Jacobson
   --spider
   ...it will not download the pages...
$ wget -Y off --spider  ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.0.91.tar.bz2
--12:13:37--  ftp://alpha.gnu.org/gnu/coreutils/coreutils-5.0.91.tar.bz2
   => `coreutils-5.0.91.tar.bz2'
Resolving alpha.gnu.org... done.
Connecting to alpha.gnu.org[199.232.41.11]:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD /gnu/coreutils ... done.
==> PASV ... done.==> RETR coreutils-5.0.91.tar.bz2 ... done.
Length: 4,183,673 (unauthoritative)

 0% [ ] 40,544  3.38K/s  ETA 19:56...

Excuse me, you said it will not download the file.
I just wanted to know how big it was, not get it.
GNU Wget 1.8.2


Re: wget -s -O pp --spider

2004-01-27 Thread Dan Jacobson
> "Hrvoje" == Hrvoje Niksic <[EMAIL PROTECTED]> writes:

Hrvoje> Please send bug reports to [EMAIL PROTECTED], or at least make sure
Hrvoje> that they don't go only to me.

Yes, but needing a confirmation message over and over has driven me nuts.


apt-get via Windows with wget

2004-01-29 Thread Dan Jacobson
I suppose Windows users don't have a way to get more than one file at
once, so to have a Windows user download 500 files and burn them onto
a CD, as in
http://jidanni.org/comp/apt-offline/index_en.html
one needs wget?  Any tips on the concept on my web page?  I don't have
Windows to try it.  Surely something will go wrong?  Also note
http://groups.google.com/groups?threadm=1i3A5-1YO-13%40gated-at.bofh.it


check just the size on ftp

2004-01-29 Thread Dan Jacobson
Normally, if I want to check out how big a page is before committing
to download it, I use
wget -S --spider URL
You might give this as a tip in the docs.

However, for FTP it doesn't get the file size.  At least not for
wget -S --spider ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complete.zip

Of course one need only get the directory listing to see the size.
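
For example (a sketch, assuming wget still writes its HTML-ized index of
an FTP directory to index.html):

$ wget -q 'ftp://ftp.sunsite.dk/projects/wget/windows/'
$ grep wget-1.9.1b index.html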

P.S.
H> Yes, I have now changed the behavior of qconfirm for the wget lists to
H> only ask for confirmation once pr envelope-sender.

H> SunSITE.dk Staff
Great!


Re: apt-get via Windows with wget

2004-01-30 Thread Dan Jacobson
H> For getting Wget you might want to link directly to
H> ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complete.zip,
OK, but too bad there's no stable second link .../latest.zip so I
don't have to update my web page to follow the link.

Furthermore, they don't need SSL, but I don't see any 'diet'
versions...

H> Oh, and the Windows users should preferrably be ones who know how to
H> run a command-line application, but I assume you've got that covered.

Exactly not.  I recall being able to get to a little window where one
enters a command... Anyway, can you give an example of all the steps
needed to do wget -x -i fetch_list.txt -B http://debian.linux.org.tw/debian/pool/main/

You probably could add this example to the web page too, (without the
[] lines.):

[Click on fetch_list.txt; save it to a file.]
Click on ..wget...zip URL
UNzip it [yes, can get this far, I remember]
then what
then what
wget [options]
[nero] [OK, they can handle that.]


Re: apt-get via Windows with wget

2004-02-18 Thread Dan Jacobson
It seems one cannot just use the wget .exe without the DLLs, even if
one only wants to connect to just http sites, not any https sites.

So one cannot just click on the wget .exe from inside Unzip's filelist.


wget has no tool to just show the size of a FTP url without fetching it?

2004-04-26 Thread Dan Jacobson
True, the man page doesn't say --spider will tell me the size of a
file without fetching it, but I have already got used to that for http.
For ftp, however,
wget --spider -Y off -S ftp://gmt.soest.hawaii.edu/pub/gmt/4/GMT_high.tar.bz2
just gives some messages, ending in
227 Entering Passive Mode (128,171,159,169,154,34)
which is not much to assure the user that the file is there, and gives
still no idea of how big it is, short of trying to fetch it.

> then just FTP the directory, the size usually can be seen there.

OK, so wget has no tool to just show the size of a FTP url without fetching it?


-O vs. -nc

2004-04-27 Thread Dan Jacobson
On the man page the interaction between -O vs. -nc is not mentioned!
Nor perhaps -O vs. -N.

Indeed, why not raise an error when both -O and -nc are used, if you
don't intend to let -nc work with -O -- which would actually be best.


save more cookies between invocations

2004-05-28 Thread Dan Jacobson
Wishlist: give a way to save the kinds of cookies that you say you won't in:
`--save-cookies FILE'
 Save cookies to FILE at the end of session.  Cookies whose expiry
 time is not specified, or those that have already expired, are not
 saved.
so we can carry state between wget invocations, without having to dig
them out of -S output.  There should be several levels of saving
allowed, including overriding expiry dates, etc.


Re: save more cookies between invocations

2004-05-29 Thread Dan Jacobson
H> Do you really need an option to also save expired cookies?

You should allow the user power over all aspects...


Say "older or the same age"

2004-06-16 Thread Dan Jacobson
$ info
 The time-stamping in GNU Wget is turned on using `--timestamping'
  (`-N') option, or through `timestamping = on' directive in `.wgetrc'.
  With this option, for each file it intends to download, Wget will check
  whether a local file of the same name exists.  If it does, and the
  remote file is older, Wget will not download it.

Say "older or the same age", not just "older".

On another info page:

 The `Last-Modified' header is examined to find which file was
  modified more recently (which makes it "newer").  If the remote file is
  newer, it will be downloaded; if it is older, Wget will give up.(1)

Mention what if they are the same.

 (1) As an additional check, Wget will look at the `Content-Length'
  header, and compare the sizes; if they are not the same, the remote
  file will be downloaded no matter what the time-stamp says.

Mention what happens if we get Length: unspecified.
Apparently that will not trigger a download. (Good.)


say what circumstances wget will return non-zero

2004-06-17 Thread Dan Jacobson
The docs should mention return value... In fact it should be an item
in the Info Concept Index.

I.e., how to depend on
$ wget ... && bla || mla
So say under what circumstances wget will return non-zero.


mention -e in the same paragraph

2004-06-20 Thread Dan Jacobson
In Info where you mention:
   Most of these commands have command-line equivalents (*note
   Invoking::), though some of the more obscure or rarely used ones do not.
You should also mention -e in the same paragraph.


--random-wait but no --wait

2004-06-21 Thread Dan Jacobson
The man page doesn't say what will happen if one specifies
--random-wait but no --wait has been used.
Perhaps just say under --wait that it defaults to 0 if not set.


--print-uris

2004-06-21 Thread Dan Jacobson
Wget should have a --print-uris option, to tell us what it is planning
to get, so we can adjust things without making a commitment yet.
Perhaps useful with -i or -r...


parallel fetching

2004-07-13 Thread Dan Jacobson
Maybe add an option so e.g.,
$ wget --parallel URI1 URI2 ...
would get them at the same time instead of in turn.


only depend on the timestamp, not size

2004-07-18 Thread Dan Jacobson
Man page:
   When running Wget with -N, with or without -r, the decision as to
   whether or not to download a newer copy of a file depends on the
   local and remote timestamp and size of the file.

I have an application where I want it to depend only on the timestamp.
Too bad there's no way to decouple the two conditions.


Re: parallel fetching

2004-07-18 Thread Dan Jacobson
Phil> How about
Phil> $ wget URI1 & wget URI2

Mmm, OK, but unwieldy if many. I guess I'm thinking about e.g.,
$ wget --max-parallel-fetches=11 -i url-list
(hmm, with default=1 meaning not parallel, but sequential.)
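
Until then, GNU xargs can do the fan-out (a sketch; the 11 is arbitrary
and url-list holds one URL per line):

$ xargs -n 1 -P 11 wget -q < url-list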


Re: parallel fetching

2004-07-22 Thread Dan Jacobson
H> I suppose forking would not be too hard, but dealing with output from
H> forked processes might be tricky.  Also, people would expect `-r' to
H> "parallelize" as well, which would be harder yet.

OK, maybe add a section to the manual, showing that you have
considered parallel fetching, but the complications outweigh the gains.


--post-data --spider

2004-07-28 Thread Dan Jacobson
$ man wget
   This example shows how to log to a server using POST and then
   proceed to download the desired pages, presumably only accessible
   to authorized users:

   # Log in to the server.  This can be done only once.
You mean "we only do this once".

   wget --save-cookies cookies.txt \
        --post-data 'user=foo&password=bar' \
        http://server.com/auth.php

Say, sometimes I bet --spider could be added to make it even more efficient.
Mention that.

(WWWOFFLE note: WWWOFFLE turns HEADs into GETs, and strips any
--post-data content.  Maybe WWWOFFLE should tell the user in such
cases, or something.)


-x vs. file.1

2004-07-28 Thread Dan Jacobson
$ man wget
   When running Wget without -N, -nc, or -r, downloading the same file
   in the same directory will result in the original copy of file being
   preserved and the second copy being named file.1.

$ wget -x http://static.howstuffworks.com/flash/toilet.swf
$ wget -x http://static.howstuffworks.com/flash/toilet.swf
Clobbered the first.  So better fix the docs. Also there is no way to
get file.1 with -x. I suggest you make a way.


Re: --post-data --spider

2004-07-29 Thread Dan Jacobson
BTW, because wget 1.9.1 has no way to save "session cookies" yet, that
example will often fail.  Hopefully the user will soon be able to
control which cookies are saved, no matter what the cookies themselves say.


tmp names

2004-08-01 Thread Dan Jacobson
Perhaps a useful option would be to have files use a temporary name
until download is complete, then moving to the permanent name.
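
In other words, something like this done by hand today (names invented):

$ wget -O file.part 'http://www.example.org/file' && mv file.part file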


mention that -p turns on -x

2004-08-21 Thread Dan Jacobson
Mention that -p turns on or implies -x in both the -p and -x parts of
both the man and info pages.


document -N -p

2004-09-06 Thread Dan Jacobson
To Info node "Time-Stamping Usage" add a clarification about what
happens when -N and -p are used together: are e.g., all the included
images also checked, or just the main page?


incomplete sentence on man page

2004-09-25 Thread Dan Jacobson
On the man page:
   For more information about the use of proxies with Wget,

   -Q quota


No URLs found in -

2004-11-04 Thread Dan Jacobson
Odd,
$ ssh debian.linux.org.tw wget -e robots=off --spider -t 1 -i - < a.2
No URLs found in -.
Or is this wget just too old?
P.S., no cheery responses received recently.


-p vs. ftp

2004-11-22 Thread Dan Jacobson
>>>>> "D" == Derek B Noonburg <[EMAIL PROTECTED]> writes:

D> On 20 Nov, Dan Jacobson wrote:
D> Can you try the binary on my web site?
D> ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-linux.tar.gz)
>> 
>> But my batch script to wget it doesn't get it.
>> I used wget -w 2 -e robots=off -p -t 1 -N -nv --random-wait

D> Looks like "-p" doesn't work correctly (or at least doesn't do the
D> expected thing) with ftp URLs.


error message contents thrown away

2005-01-08 Thread Dan Jacobson
There is no way to see what
$ lynx -dump http://wapp8.taipower.com.tw/ can show me
when
$ wget -O - -S -s http://wapp8.taipower.com.tw/
08:54:48 ERROR 403: Access Forbidden.
i.e., the site's error message contents.


weird

2005-01-29 Thread Dan Jacobson
Anybody home? This looks weird:
$ wget --spider -S -r -l 1  http://www.noise.com.tw/eia/product.htm
--09:22:13--  http://www.noise.com.tw/eia/product.htm
   => `www.noise.com.tw/eia/product.htm'
Resolving localhost... 127.0.0.1
Connecting to localhost[127.0.0.1]:8080... connected.
Proxy request sent, awaiting response... 
 1 HTTP/1.0 200 OK
 2 Date: Thu, 27 Jan 2005 00:00:07 GMT
 3 Server: Apache/1.3.20 (Unix) PHP/4.3.10
 4 Last-Modified: Wed, 19 Jan 2005 02:39:31 GMT
 5 ETag: "9f071-124a-41edc863"
 6 Accept-Ranges: bytes
 7 Content-Type: text/html
 8 Connection: close
 9 Proxy-Connection: close
200 OK

www.noise.com.tw/eia/product.htm: No such file or directory

FINISHED --09:22:13--
Downloaded: 0 bytes in 0 files



-N vs. Last-modified header missing

2005-02-08 Thread Dan Jacobson
1. Anybody home?

2. No way to make wget not refetch the file when:

   Last-modified header missing -- time-stamps turned off.
   09:55:20 URL:http://bm2ddp.myweb.hinet.net/b3.htm [16087] -> "uris.d/bm2ddp.myweb.hinet.net/b3.htm" [1]

when using wget -s -w 2 -e robots=off -P bla.d -p -t 1 -N -nv --random-wait -i -


--header with more than one cookie

2005-02-12 Thread Dan Jacobson
In the man page, show how one does this
   wget --cookies=off --header "Cookie: <name>=<value>"
with more than one cookie.
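
I assume the answer is the standard HTTP form of a single Cookie header
with "; " between the cookies (values invented here), but the man page
should say so:

$ wget --cookies=off --header 'Cookie: session=abc123; lang=zh-TW' http://www.example.org/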


bug-wget still useful

2005-03-15 Thread Dan Jacobson
Is it still useful to mail to [EMAIL PROTECTED]?  I don't think
anybody's home.  Shall the address be closed?


Re: bug-wget still useful

2005-03-15 Thread Dan Jacobson
P> I don't know why you say that.  I see bug reports and discussion of fixes
P> flowing through here on a fairly regular basis.

All I know is my reports for the last few months didn't get the usual (any!)
cheery replies. However, I saw them on Gmane, yes.


flag to display just the errors

2005-03-16 Thread Dan Jacobson
I see I must do
wget --spider -i file -nv 2>&1|awk '!/^$|^200 OK$/'
as the only way to just get the errors.  There is no flag that will
only let the errors thru and silence the rest. -q silences all.
Wget 1.9.1