Re: Annoying behaviour with --input-file
I pointed this out about a year ago. As I recall, the response I got back then was that fixing it is too hard.

I'm looking for any way to download new/newer files from a specific list (wild cards won't make the proper selection) where wget makes one connection and keeps it for the entire operation. In my case the annoyance was that wget dropped the connection after each file was downloaded and then took time to remake the connection for the next file. The .listing file isn't so long as to be a problem, but if the server is busy (close to overload), I want to keep the first established connection until the job is done. (All files on the list are in the same directory on the same host. But I only want to update four files out of about twenty, and some of the unwanted files are large enough that I don't want to just download all of them.)

Fred Holmes

At 11:35 PM 7/12/2014, Adam Klobukowski wrote:

If wget is used with the --input-file option, it gets a directory listing for each file specified in the input file (with the FTP protocol) before downloading each file. This is quite annoying if there are a few thousand small files in the file list and every directory listing is longer than any file; in other words, the overhead is too big to be reasonable.

--
Semper Fidelis
Adam Klobukowski
[EMAIL PROTECTED]
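One way to sketch Fred's setup (host and file names here are hypothetical, not from the thread): list only the four wanted URLs in the input file, with -N so only new or newer copies are fetched. Whether wget keeps a single control connection across the list is exactly the open question of this thread.

```shell
# wanted.txt names just the four files to update, e.g.:
#   ftp://ftp.example.com/defs/macro.def
#   ftp://ftp.example.com/defs/sign.def
# Run one wget over the whole list; -N compares against .listing.
wget -N --input-file=wanted.txt
```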
GNU Wget 1.8.2 --output-document and --page-requisites incompatible
I've been using wget for a few years now (it's been great) and find it increasingly useful. Right now I've got GNU Wget 1.8.2 and have noticed a quirk: --output-document and --page-requisites don't seem to like to work together. e.g.

bash-2.05a$ wget --non-verbose --output-document=./0001/index.html --no-directories \
    --directory-prefix=0001 --page-requisites \
    'http://www.infoworld.com/article/03/05/16/20OPcringely_1.html'
11:02:31 URL:http://www.infoworld.com/article/03/05/16/20OPcringely_1.html [33242] -> ./0001/index.html [1]
0001/20OPcringely_1.html: No such file or directory
FINISHED --11:02:31--
Downloaded: 33,242 bytes in 1 files

-Lars
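A possible workaround sketch (untested against 1.8.2): drop --output-document so wget keeps the page's natural file name, which --page-requisites can then resolve, and rename afterwards.

```shell
# Let --page-requisites see the real file name, then rename it.
wget --page-requisites --no-directories --directory-prefix=0001 \
     'http://www.infoworld.com/article/03/05/16/20OPcringely_1.html'
mv 0001/20OPcringely_1.html 0001/index.html
```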
Re: keep alive connections
Alain Bench [EMAIL PROTECTED] writes:

|  /* Return if we have no intention of further downloading.  */
|  if (!(*dt & RETROKF) || (*dt & HEAD_ONLY))
|    {
|      /* In case the caller cares to look...  */
|      hs->len = 0L;
|      hs->res = 0;
|      FREE_MAYBE (type);
|      FREE_MAYBE (all_headers);
|      CLOSE_INVALIDATE (sock);  /* would be CLOSE_FINISH, but there
|                                   might be more bytes in the body. */
|      return RETRFINISHED;
|    }

...changing CLOSE_INVALIDATE to CLOSE_FINISH.

That's exactly the right change. As the comment implies, the only reason for using CLOSE_INVALIDATE is fear that a misbehaving CGI might send more data, thus confusing the next request or even causing deadlock while writing the request to the server. When keep-alive connections are not in use (which can be forced with --no-http-keep-alive), CLOSE_INVALIDATE and CLOSE_FINISH are pretty much identical.
Re: Recursive ftp broken
Thanks for the report, this is most likely caused by my recent changes that eliminate rbuf* from the code. (Unfortunately, the FTP code kept some state in struct rbuf, and my changes might have broken things.) To be absolutely sure, see if it works under 1.9.1 or under CVS from one week ago.
Re: Wget dies with file size limit exceeded on files 2 gigs
Tony Lewis [EMAIL PROTECTED] writes:

A patch was recently submitted for this issue. I don't know if anything has made it into the CVS or not. Hrvoje didn't like its dependence on long long, so it might not have.

The patch uses `long long' without bothering to check whether the compiler accepts it. This is bad because, except for GCC, `long long' is a fairly recent invention (and people on 64-bit platforms might argue that they don't even need it, because they have a 64-bit `long'). A large-file-aware application should use off_t instead, and be written to work well regardless of its size. Portable printing of off_t values is tricky, but it can be done.

The patch goes ahead and simply assumes that `long long' is 64 bits wide, which need not be the case. It also changes the %ld format to %lld, which invalidates every single available translation. I asked the submitters about this, but they never responded, which indicates that they either don't understand the problem or don't care about fixing it.
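The portable route alluded to above can be sketched like this — a minimal C99 example, not wget's code (C99's `%jd`/`intmax_t` postdate some of the compilers discussed here, which would need a hand-rolled digit printer as a fallback):

```c
#define _FILE_OFFSET_BITS 64    /* ask for a 64-bit off_t where supported */

#include <stdio.h>
#include <inttypes.h>
#include <sys/types.h>

/* Widen off_t to intmax_t (C99) and print with %jd, instead of
   hard-coding %lld and assuming `long long' exists and is 64 bits
   wide.  Returns the number of characters written, as snprintf does. */
static int format_off(char *buf, size_t n, off_t value)
{
    return snprintf(buf, n, "%jd", (intmax_t) value);
}
```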
Re: GNU Wget 1.8.2 --output-document and --page-requisites incompatible
Thanks for the report. This is a known bug that is unfortunately also present in 1.9.x. I hope to fix it for the next release.
correct processing of redirections
Hi there. Let me explain the problem:

1) I'm trying to prepare to be a mirror of www.gnu.org (which is not the most shameful thing to do, I suppose).
2) I'm somewhat devoted to wget and do not want to use other software.
3) There are some redirects at www.gnu.org to other hosts like savannah.gnu.org, gnuhh.org, etc.
4) When I do a straightforward "wget -m -nH http://www.gnu.org" everything is excellent, except the redirections: the files which we get because of the redirections overwrite any currently existing files with the same filenames.

Example: let's imagine that wget has downloaded some part of www.gnu.org; then (of course) it has downloaded the first file (or maybe the second, if robots.txt goes first): index.html (which is http://www.gnu.org/index.html). Now when wget comes across http://www.gnu.org/people/greve/greve.html it gets a 302 (moved) to http://gnuhh.org/. It goes right there and downloads index.html, which immediately overwrites the index.html downloaded from http://www.gnu.org/index.html.

I'd suggest that wget process redirections as ordinary links: just add them to the processing queue and forget about them, and do not download them without first checking them with download_child_p(). This approach works well if you're mirroring a site, but might not be the expected behaviour when you're downloading just one page: the page won't be downloaded if it's redirected to another host. So the second situation needs some different processing rules.

That's it. Share your opinions, please (especially Hrvoje, since you're the maintainer :-)

Peter.
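The proposed policy can be sketched as a toy model — this is not wget's code; url_host and should_follow_redirect are hypothetical stand-ins for wget's URL parsing and its download_child_p() check:

```c
#include <string.h>

/* Copy the host part of an http:// URL into `host`. */
static void url_host(const char *url, char *host, size_t n)
{
    const char *p = strstr(url, "://");
    size_t i = 0;
    p = p ? p + 3 : url;
    while (p[i] != '\0' && p[i] != '/' && p[i] != ':' && i + 1 < n) {
        host[i] = p[i];
        i++;
    }
    host[i] = '\0';
}

/* In recursive (mirror) mode, queue-and-check: refuse redirects that
   leave the start host.  For a single-page fetch, follow as today. */
static int should_follow_redirect(const char *start, const char *target,
                                  int recursive)
{
    char a[256], b[256];
    if (!recursive)
        return 1;
    url_host(start, a, sizeof a);
    url_host(target, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

Under this sketch the 302 from /people/greve/greve.html to http://gnuhh.org/ would be rejected in mirror mode, so gnuhh.org's index.html could never clobber the local one.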
Re: Recursive ftp broken
Gisle Vanem [EMAIL PROTECTED] writes:

[...]
==> SYST ... done.    ==> PWD ... done.   ! is '/' here
==> TYPE I ... done.  ==> CWD not required.
==> PORT ... done.    ==> RETR BAN-SHIM.ZIP ...
No such file `BAN-SHIM.ZIP'.
...

Interestingly, I can't repeat this. Still, to be on the safe side, I added some additional restraints to the code that make it behave more like the previous code, which worked. Please try again and see if it works now. If not, please provide some form of debugging output as well.
Re: Annoying behaviour with --input-file
At 06:30 PM 11/25/2003, Hrvoje Niksic wrote:

Are you using --timestamping (-N)? If so, can you do without it, or replace it with --no-clobber?

But then you will only download new files, not newer files? But I want the newer files (updated virus definition files from ftp.f-prot.com).

And I tried -nc for downloading only new files from ftp.eps.gov. While it worked, the comparison is very slow: a significant fraction of a second to compare each file. With over 700 files to compare and refuse, it takes a long time to perform the comparison on all of the files. With -N, comparing using the .listing file, the comparison of all 700 files takes only about a second after the .listing file has been downloaded, and the download of the one new file (or two or three new files, if a couple of days have gone by) begins immediately.

v/r
Fred Holmes
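The cheap per-file check that -N derives from a .listing entry amounts to something like the following sketch (an approximation, not wget's actual code):

```c
#include <time.h>

/* Approximation of the -N decision for one .listing entry: fetch
   when the remote copy is newer than the local one, or when the
   sizes differ.  No extra round trip per file is needed, which is
   why comparing 700 files takes about a second. */
static int needs_download(time_t local_mtime, long local_size,
                          time_t remote_mtime, long remote_size)
{
    return remote_mtime > local_mtime || remote_size != local_size;
}
```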
can you authenticate to a http proxy with a username that contains a space?
example:

http://firstname lastname:[EMAIL PROTECTED]

thanks, T
Re: can you authenticate to a http proxy with a username that contains a space?
antonio taylor wrote:

http://firstname lastname:[EMAIL PROTECTED]

Have you tried http://firstname%20lastname:[EMAIL PROTECTED] ?
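For reference, the %20 suggestion is ordinary percent-encoding; a minimal sketch of just the space case (illustrative only — a complete encoder would handle every reserved character, not only the space):

```c
/* Percent-encode the spaces in a user name before building the
   proxy URL.  `out` must have room for up to 3 bytes per input
   byte, plus the terminating NUL. */
static void encode_spaces(const char *in, char *out)
{
    for (; *in != '\0'; in++) {
        if (*in == ' ') {
            *out++ = '%';
            *out++ = '2';
            *out++ = '0';
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
}
```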