Re: Annoying behaviour with --input-file

2003-11-25 Thread Fred Holmes
I pointed this out about a year ago.  As I recall, the response I got back
then was that fixing it would be too hard.  I'm looking for any way to
download new/newer files from a specific list (wildcards won't make the
proper selection) where wget makes one connection and keeps it for the
entire operation.  In my case the annoyance was that wget dropped the
connection after each file was downloaded and then took time to
re-establish the connection for the next file.  The .listing file isn't so
long as to be a problem, but if the server is busy (close to overload), I
want to keep the first established connection until the job is done.  (All
files on the list are in the same directory on the same host, but I only
want to update four files out of about twenty, and some of the unwanted
files are large enough that I don't want to simply download all of them.)

Fred Holmes

At 11:35 PM 7/12/2014, Adam Klobukowski wrote:
If wget is used with the --input-file option, it fetches a directory
listing for each file specified in the input file (if the ftp protocol
is used) before downloading that file.  This is quite annoying if there
are a few thousand small files in the file list and every directory
listing is longer than any of the files; in other words, the overhead
is too big to be reasonable.
--
Semper Fidelis
Adam Klobukowski
[EMAIL PROTECTED]



GNU Wget 1.8.2 --output-document and --page-requisites incompatible

2003-11-25 Thread Lars Noodén
I've been using wget for a few years now (it's been great) and find
it increasingly useful.
Right now I've got GNU Wget 1.8.2 and have noticed a quirk:

--output-document and --page-requisites don't seem to like to work
together, e.g.:

    bash-2.05a$ wget --non-verbose \
        --output-document=./0001/index.html --no-directories \
        --directory-prefix=0001 --page-requisites \
        'http://www.infoworld.com/article/03/05/16/20OPcringely_1.html'
    11:02:31 URL:http://www.infoworld.com/article/03/05/16/20OPcringely_1.html [33242] -> ./0001/index.html [1]
    0001/20OPcringely_1.html: No such file or directory

    FINISHED --11:02:31--
    Downloaded: 33,242 bytes in 1 files


-Lars


Re: keep alive connections

2003-11-25 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

 |  /* Return if we have no intention of further downloading.  */
 |  if (!(*dt & RETROKF) || (*dt & HEAD_ONLY))
 |    {
 |      /* In case the caller cares to look...  */
 |      hs->len = 0L;
 |      hs->res = 0;
 |      FREE_MAYBE (type);
 |      FREE_MAYBE (all_headers);
 |      CLOSE_INVALIDATE (sock);   /* would be CLOSE_FINISH, but there
 |                                    might be more bytes in the body. */
 |      return RETRFINISHED;
 |    }

 ...changing CLOSE_INVALIDATE to CLOSE_FINISH.

That's exactly the right change.  As the comment implies, the only
reason for using CLOSE_INVALIDATE is fear that a misbehaving CGI might
send more data, thus confusing the next request or even causing
deadlock while writing the request to the server.

When keep-alive connections are not in use (which can be forced with
--no-http-keep-alive), CLOSE_INVALIDATE and CLOSE_FINISH are pretty
much identical.
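
Applied to the quoted block, Alain's change would look roughly like this
(the same fragment with only the closing macro swapped; it is not
compilable on its own, and the comment wording here is mine):

    /* Return if we have no intention of further downloading.  */
    if (!(*dt & RETROKF) || (*dt & HEAD_ONLY))
      {
        /* In case the caller cares to look...  */
        hs->len = 0L;
        hs->res = 0;
        FREE_MAYBE (type);
        FREE_MAYBE (all_headers);
        CLOSE_FINISH (sock);      /* keep a keep-alive connection
                                     registered for reuse. */
        return RETRFINISHED;
      }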



Re: Recursive ftp broken

2003-11-25 Thread Hrvoje Niksic
Thanks for the report; this is most likely caused by my recent changes
that eliminate rbuf* from the code.  (Unfortunately, the FTP code kept
some state in struct rbuf, and my changes might have broken things.)
To be absolutely sure, see if it works under 1.9.1 or under CVS from
one week ago.



Re: Wget dies with file size limit exceeded on files > 2 gigs

2003-11-25 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 A patch was recently submitted for this issue. I don't know if
 anything has made it into the CVS or not. Hrvoje didn't like its
 dependence on long long so it might not have.

The patch uses `long long' without bothering to check whether the
compiler accepts it.  This is bad because, except for GCC, `long long'
is a fairly recent invention (and people on 64-bit platforms might
argue that they don't even need it because they have 64-bit `long'.)
A large-file-aware application should use off_t instead, and be written
to work well regardless of its size.  Portable printing of off_t
values is tricky, but it can be done.
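
For instance, one way to print an off_t without assuming its width is to
format the number into a buffer yourself.  A minimal sketch (my example,
not wget's actual code):

    #include <stdio.h>
    #include <sys/types.h>

    /* Format an off_t by repeated division, so the value prints
       correctly whether off_t is 32 or 64 bits wide and no %lld-style
       format string is needed.  Assumes a non-negative size.  */
    static const char *
    offt_to_string (off_t n)
    {
      static char buf[64];
      char *p = buf + sizeof buf;

      *--p = '\0';
      do
        {
          *--p = '0' + (char) (n % 10);   /* peel off the last digit */
          n /= 10;
        }
      while (n > 0);
      return p;
    }

    int
    main (void)
    {
      off_t size = 33242;
      printf ("Downloaded %s bytes\n", offt_to_string (size));
      return 0;
    }

A helper like this also keeps the printf format a plain %s instead of
scattering compiler-specific length modifiers through every message.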

The patch goes ahead and simply assumes that `long long' is 64 bits
wide, which need not be the case.  It changes %ld format to %lld,
which invalidates every single available translation.  I asked the
submitters about this, but they never responded, which indicates that
they either don't understand the problem or don't care about fixing
it.



Re: GNU Wget 1.8.2 --output-document and --page-requisites incompatible

2003-11-25 Thread Hrvoje Niksic
Thanks for the report.  This is a known bug that is unfortunately
also present in 1.9.x.  I hope to fix it for the next release.



correct processing of redirections

2003-11-25 Thread Peter Kohts
Hi there.

Let me explain the problem:

1) I'm trying to prepare to be a mirror of www.gnu.org
(which is not the most shameful thing to do, I suppose).

2) I'm somewhat devoted to wget and do not want to use
other software.

3) There are some redirects at www.gnu.org to other hosts
such as savannah.gnu.org, gnuhh.org, etc.

4) When I do a straightforward "wget -m -nH http://www.gnu.org",
everything is excellent except for the redirections: the files we
get because of the redirections overwrite any currently existing
files with the same filenames.

Example:
Let's imagine that wget has downloaded some part of www.gnu.org; of
course it has already downloaded the first file (or maybe the second,
if robots.txt goes first): index.html (which is
http://www.gnu.org/index.html).  Now, when wget comes across
http://www.gnu.org/people/greve/greve.html, it gets a 302 (moved) to
http://gnuhh.org/.  It goes right there and downloads that site's
index.html, which immediately overwrites the index.html already
downloaded from http://www.gnu.org/index.html.


I'd suggest that wget process redirections like ordinary links: just
add them to the processing queue and forget about them, and do not
download them without first checking them with download_child_p().
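
A toy, self-contained sketch of that policy (the names and data
structures here are hypothetical, not wget's; only the idea of an
acceptance check like download_child_p() is taken from the suggestion
above).  A redirect target is not fetched in place; it is run through
the acceptance check and pushed onto the same queue as any other
discovered link, so it goes through the normal naming rules instead of
landing on top of an existing local file:

    #include <stdio.h>
    #include <string.h>

    #define MAX_QUEUE 16

    static const char *queue[MAX_QUEUE];
    static int queue_len;

    /* Stand-in for download_child_p(): accept only URLs that mention
       the host being mirrored.  */
    static int
    accept_child_p (const char *url, const char *mirrored_host)
    {
      return strstr (url, mirrored_host) != NULL;
    }

    static void
    enqueue (const char *url)
    {
      if (queue_len < MAX_QUEUE)
        queue[queue_len++] = url;
    }

    /* Called when fetching URL answered with a 302 pointing at
       LOCATION: enqueue the target (if accepted) instead of
       downloading it right here.  */
    static void
    handle_redirect (const char *url, const char *location,
                     const char *mirrored_host)
    {
      printf ("%s redirected to %s\n", url, location);
      if (accept_child_p (location, mirrored_host))
        enqueue (location);
      else
        printf ("  (skipped: outside %s)\n", mirrored_host);
    }

    int
    main (void)
    {
      handle_redirect ("http://www.gnu.org/people/greve/greve.html",
                       "http://gnuhh.org/", "www.gnu.org");
      handle_redirect ("http://www.gnu.org/old.html",
                       "http://www.gnu.org/new.html", "www.gnu.org");
      printf ("%d URL(s) queued for later download\n", queue_len);
      return 0;
    }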

This approach works well if you're mirroring a site, but it might not
be the desired behaviour when you're downloading just one page: the
page won't be downloaded at all if it's redirected to another host.
So that second situation needs some different processing rules.


That's it.  Share your opinions, please (especially Hrvoje, since
you're the maintainer :-)

Peter.



Re: Recursive ftp broken

2003-11-25 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:
[...]
 ==> SYST ... done.   ==> PWD ... done.   !   is '/' here
 ==> TYPE I ... done.  ==> CWD not required.
 ==> PORT ... done.    ==> RETR BAN-SHIM.ZIP ...
 No such file `BAN-SHIM.ZIP'.
 ...

Interestingly, I can't repeat this.  Still, to be on the safe side, I
added some additional restraints to the code that make it behave more
like the previous code, which worked.  Please try again and see if it
works now.  If not, please provide some form of debugging output as
well.



Re: Annoying behaviour with --input-file

2003-11-25 Thread Fred Holmes
At 06:30 PM 11/25/2003, Hrvoje Niksic wrote:
> Are you using --timestamping (-N)?  If so, can you do without it, or
> replace it with --no-clobber?
But then wouldn't it only download new files, not newer files?  I want
the newer files (updated virus definition files from ftp.f-prot.com).

I also tried -nc for downloading only new files from ftp.eps.gov.  While
it worked, the comparison is very slow: a significant fraction of a second
to compare each file.  With over 700 files to compare and reject, it takes
a long time to run the comparison on all of them.  With -N, comparing
against the .listing file, the comparison of all 700 files takes only
about a second once the .listing file has been downloaded, and the
download of the one new file (or the two or three new files if a couple
of days have gone by) begins immediately.

v/r

Fred Holmes 



can you authenticate to an http proxy with a username that contains a space?

2003-11-25 Thread antonio taylor
example:

http://firstname lastname:[EMAIL PROTECTED]



thanks,
T


Re: can you authenticate to an http proxy with a username that contains a space?

2003-11-25 Thread Tony Lewis
antonio taylor wrote:

 http://firstname lastname:[EMAIL PROTECTED]

Have you tried http://firstname%20lastname:[EMAIL PROTECTED] ?
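
More generally, any character that can't appear literally in a URL can
be percent-encoded.  A minimal, self-contained sketch of that encoding
(my example, not wget source; the password and proxy host below are
made up):

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Percent-encode characters that cannot appear raw in the userinfo
       part of a URL, e.g. a space becomes %20.  */
    static void
    print_encoded (const char *s)
    {
      for (; *s; s++)
        {
          if (isalnum ((unsigned char) *s) || strchr ("-_.~", *s))
            putchar (*s);
          else
            printf ("%%%02X", (unsigned char) *s);
        }
    }

    int
    main (void)
    {
      fputs ("http://", stdout);
      print_encoded ("firstname lastname");   /* user name with a space */
      fputs (":password@proxy.example.com:8080/", stdout);
      putchar ('\n');
      return 0;
    }

This prints http://firstname%20lastname:password@proxy.example.com:8080/,
which is the encoded form Tony suggests trying above.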