What operating system are you using? It may be a feature of your operating system.

At 02:19 AM 10/14/2006, Tima Dronenko wrote:
Hello :) I'm not sure whether this is a bug or a feature... I can't download files bigger than 2GB using wget. Timofey.
p.s. my log:

wget -c http://uk1x1.fileplanet.com/%5E1530224706/ftp1/052006/World_of_Warcraft_Trial_Client_FP.zip
--10:08:50--  http://uk1x1.fileplanet.com/%5E1530224706/ftp1/052006/World_of_Warcraft_Trial_Client_FP.zip
           => `World_of_Warcraft_Trial_Client_FP.zip'
Resolving uk1x1.fileplanet.com... done.
Connecting to uk1x1.fileplanet.com[22.214.171.124]:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: -1,236,732,656 (910,750,993 to go) [application/x-zip-compressed]
100%[==] 2,147,483,647  --.--K/s  ETA --:--
10:08:51 (0.00 B/s) - `World_of_Warcraft_Trial_Client_FP.zip' saved [2147483647/-1236732656]
But I want WGET to convert %20 to space (I think). I'm using 1.9.1. I haven't checked to see if there is a newer stable version. I'm using the Windows binary. I'd love to go Linux, but the startup transient is too much. (And the computer they furnish me at work is Windows.) I regularly use WGET to download a file that is referenced in my work, so that the file is just downloaded and not opened by my browser into its viewing application. For large files the download seems to go much faster as well. Thanks, Fred Holmes

At 09:01 PM 5/20/2004, Hrvoje Niksic wrote:
Fred Holmes [EMAIL PROTECTED] writes: If I have a URL that has %20 in place of spaces, and I use the URL directly as the argument of WGET, it seems that the file is always not found. I've discovered that if I replace each %20 with a space, and put quotation marks around the entire URL, it works.
That's weird. Wget converts space to %20, so I don't see any difference between using space and %20. Are you sure the URLs are otherwise the same? What version of Wget are you using?
At 04:55 AM 5/21/2004, Hrvoje Niksic wrote: OTOH, if you're talking about *file* names, %20 should already be converted to space. Yes, these URLs are for files, e.g., .pdf and .doc and .zip files. When I get to work today, if I can make a few minutes of time, I'll post an explicit example (or discover my error, I hope). I don't mean literally change to spaces, just parse the %20 correctly so that the file is in fact found and downloaded. I'm downloading single files, references on Google, using WGET instead of the browser. Thanks for your help. Fred Holmes
Here is an example of an instance where a filename containing %20 fails, but replacing the %20 with spaces, and enclosing in quotes, works. At the end I find that just putting the original URL (with %20) in quotation marks makes it work. There is something else unusual about this URL. The first command validates the version. The second command is the desired URL/file and fails with %20. The third command repeats the second command with the -d switch. The fourth command has %20 replaced with space and works. Other files with %20 on different hosts/servers behave similarly.

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -V
GNU Wget 1.9.1
Copyright (C) 2003 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Originally written by Hrvoje Niksic [EMAIL PROTECTED].

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf
--10:13:38--  http://hqinet001.hqmc.usmc.mil/p
           => `p'
Resolving hqinet001.hqmc.usmc.mil... 126.96.36.199
Connecting to hqinet001.hqmc.usmc.mil[188.8.131.52]:80... connected.
HTTP request sent, awaiting response... 404 Object Not Found
10:13:38 ERROR 404: Object Not Found.
'r' is not recognized as an internal or external command, operable program or batch file.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -d http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf
DEBUG output created by Wget 1.9.1 on Windows.
set_sleep_mode(): mode 0x8001, rc 0x8000
--10:14:10--  http://hqinet001.hqmc.usmc.mil/p
           => `p'
Resolving hqinet001.hqmc.usmc.mil... seconds 0.00, 184.108.40.206
Caching hqinet001.hqmc.usmc.mil => 220.127.116.11
Connecting to hqinet001.hqmc.usmc.mil[18.104.22.168]:80... seconds 0.00, connected.
Created socket 720.
Releasing 00894758 (new refcount 1).
---request begin---
GET /p HTTP/1.0
User-Agent: Wget/1.9.1
Host: hqinet001.hqmc.usmc.mil
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... HTTP/1.1 404 Object Not Found
Server: Microsoft-IIS/4.0
Date: Fri, 21 May 2004 14:19:44 GMT
Content-Length: 461
Content-Type: text/html
Closing fd 720
10:14:10 ERROR 404: Object Not Found.
'r' is not recognized as an internal or external command, operable program or batch file.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -d http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP 04 Chap 4 pdfs/CP04 CHAP 4 Aviation Combat Element - pp186_SINGLE INTEGRATED AIR PICTURE.pdf
DEBUG output created by Wget 1.9.1 on Windows.
set_sleep_mode(): mode 0x8001, rc 0x8000
--10:15:54--  http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf
           => `CP04 CHAP 4 Aviation Combat Element - pp186_SINGLE INTEGRATED AIR PICTURE.pdf'
Resolving hqinet001.hqmc.usmc.mil... seconds 0.00, 22.214.171.124
Caching hqinet001.hqmc.usmc.mil => 126.96.36.199
Connecting to hqinet001.hqmc.usmc.mil[188.8.131.52]:80... seconds 0.00, connected.
Created socket 720.
Releasing 00895088 (new refcount 1).
---request begin---
GET /pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf HTTP/1.0
User-Agent: Wget/1.9.1
Host: hqinet001.hqmc.usmc.mil
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Server: Microsoft-IIS/4.0
Connection: keep-alive
Date: Fri, 21 May 2004 14:21:28 GMT
Content-Type: application/pdf
Accept-Ranges: bytes
Last-Modified: Wed, 10 Mar 2004 14:25:50 GMT
ETag: dee42996ab6c41:3994
Content-Length: 10971
Found hqinet001.hqmc.usmc.mil in host_name_addresses_map (00895088)
Registered fd 720 for persistent reuse.
Length: 10,971 [application/pdf]
100%  10,971  59.19K/s
10:15:54 (59.19 KB/s) - `CP04 CHAP 4 Aviation Combat Element - pp186_SINGLE INTEGRATED AIR PICTURE.pdf' saved [10971/10971]

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>

--
And now I have discovered that just putting quotation marks around the entire URL makes it work, but it is not found on the first try and then it is found when WGET automatically makes a second try with a little different syntax. ? See below.

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000
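[The transcript above is cut off, but the quoting workaround it refers to boils down to wrapping the whole URL in double quotes. Whatever mangled the unquoted command line (in the debug run above only /p was requested, and the leftover "r..." was then executed as a separate command), the quotes keep cmd.exe from splitting the long URL. A sketch of the quoted form, using the same URL from the transcript:]

    wget "http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf"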
Well, it's not simply the %20 that is the problem. Here's a simple, straightforward URL that has %20's in it, and it downloads just fine. My apologies for the bum steer. Fred Holmes

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\fholmes\Desktop>WGET http://www.dau.mil/pubs/glossary/11th%20Glossary%202003.pdf
--15:54:09--  http://www.dau.mil/pubs/glossary/11th%20Glossary%202003.pdf
           => `11th Glossary 2003.pdf'
Resolving www.dau.mil... 184.108.40.206
Connecting to www.dau.mil[220.127.116.11]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 782,785 [application/pdf]
100%  782,785  79.13K/s  ETA 00:00
15:54:17 (97.62 KB/s) - `11th Glossary 2003.pdf' saved [782785/782785]

C:\Documents and Settings\fholmes\Desktop>
If I have a URL that has %20 in place of spaces, and I use the URL directly as the argument of WGET, it seems that the file is never found. I've discovered that if I replace each %20 with a space, and put quotation marks around the entire URL, it works. It would be nice to have a switch/option in WGET that would do this interpretation automatically, without my having to put the URL into a text editor, run the replace operation manually or as a macro, and then copy the URL back into WGET's argument. I presume that there is similar behavior when other characters (besides the space) have been replaced by % and their hex equivalent.
I would love to see an enhancement that allows specification of a threshold on the total size of a download, or on its total elapsed time, at which point the download would self-abort *and then process the actually-downloaded files to convert links (-k)*. Currently, if one manually aborts a large/long download, the -k process does not occur, and the already-downloaded files are wasted. Is there currently some way to recover from this? Thanks, Fred Holmes P.S. Is this the right list for posting wishlist items?
At 06:23 PM 2/8/2004, Hrvoje Niksic wrote: Does anyone have an idea what we should consider the home dir under Windows, and how to find it?

In Windows 2000, if I enter SET at the command prompt, I get back a listing of all of the environment variables that have been established (set). On my machine, part of that listing is as follows.

D:\>SET
[snip]
HOMEDRIVE=D:
HOMEPATH=\Documents and Settings\Administrator
[snip]

(D: is my boot drive, and therefore my home drive.) (SET [variable]=[value] is the command for establishing an environment variable and its value, in Windows and DOS. SET by itself with no argument reports all of the environment variables and their values.) I'm not a real Windows programmer, but any Windows compiler should be able to get the values of these environment variables on a particular machine. They are generally referenced (at least in Windows command prompt batch files) as %HOMEDRIVE% and %HOMEPATH%. Other flavors of Windows should be similar, if not the same, but I don't have the means to test any of them. Fred Holmes
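[A minimal batch sketch of how those variables might be used in practice -- the file name sample-wgetrc is hypothetical, and note that cmd.exe can create a leading-dot file even though Explorer can't:]

    echo Home directory is %HOMEDRIVE%%HOMEPATH%
    copy sample-wgetrc "%HOMEDRIVE%%HOMEPATH%\.wgetrc"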
Yes, yes, yes, I want this feature. I asked for it explicitly some time ago, but no one felt it worthy. I'm not a programmer, so I can't do it myself. Fred Holmes

At 02:05 PM 12/29/2003, Vlada Macek wrote:
--host-level=n
It would be useful for me to be able to differentiate the maximum recursion level between the case where wget is spidering on the original host and the case where it is spanning other hosts. I may want the files five levels down on the original host, but only two levels down on the other hosts. Letting -l5 and -H run together can quickly fill my disk. :)
I pointed this out about a year ago. As I recall, the response I got back then was that fixing it is too hard. I'm looking for any way to download new/newer files on a specific list (wild cards won't make the proper selection) where wget makes one connection and keeps it for the entire operation. In my instance the annoyance was that wget dropped the connection after each file was downloaded and then took time to remake the connection for the next file. The .listing file isn't so long as to be a problem, but if the server is busy (close to overload), I want to keep the first established connection until the job is done. (All files on the list are in the same directory on the same host. But I only want to update four files out of about twenty, and some of the unwanted files are large enough that I don't want to just download all of them.) Fred Holmes

At 11:35 PM 7/12/2014, Adam Klobukowski wrote:
If wget is used with the --input-file option, it gets a directory listing for each file specified in the input file (if using the ftp protocol) before downloading each file, which is quite annoying if there are a few thousand small files in the file list and every directory listing is longer than any file; in other words, the overhead is too big to be reasonable. -- Semper Fidelis Adam Klobukowski [EMAIL PROTECTED]
At 06:30 PM 11/25/2003, Hrvoje Niksic wrote: Are you using --timestamping (-N)? If so, can you do without it, or replace it with --no-clobber? But then you will only download new files, not newer files? But I want the newer files (updated virus definition files from ftp.f-prot.com). And I tried -nc on downloading only new files from ftp.eps.gov. While it worked, the comparison is very slow, a significant fraction of a second to compare each file. With over 700 files to compare and refuse, it takes a long time to perform the comparison operation on all of the files. With -N, and comparing using the .listing file, the comparison of all 700 files takes only about a second after the .listing file has been downloaded, and the download of the one new file (or two or three new files if a couple of days have gone by) begins immediately. v/r Fred Holmes
I am and have been using NTFS since the installation of the OS, on a brand new machine. At 05:40 PM 11/4/2003, Gisle Vanem wrote: Fred Holmes [EMAIL PROTECTED] said: OTOH, if anyone knows how to make Windows stop changing the time stamps, that would be even better. You're using FAT filesystem? Convert to NTFS; it stores filetimes in UTC (as 64-bit, 100 nanosecond steps from 1 jan 1601). --gv
At 07:24 PM 11/4/2003, Hrvoje Niksic wrote: It continues to amaze me how many people use Wget on Windows. Anyway, thanks for the detailed bug report. I would love to learn linux and a whole bunch of computer stuff, but there are only so many hours in a day. I'm not an IT guy, just a worker that has to learn the computer for himself and figure the most efficient way to get stuff done, where efficiency includes cost of capital and learning curves as well. Many thanks to all who contribute for a very fine product. I had messed with a couple of gui sitesnag programs and found them lacking, and asked for a better recommendation on a local discussion list (WAMU ComputerGuys). A gal by the name of Vicky Staubly recommended WGET, and the rest, as they say, is history. v/r Fred Holmes
At 07:24 PM 11/4/2003, Hrvoje Niksic wrote: Until then, if old files really never change, could you simply use `-nc'? Yes, that will do it quite nicely. I missed that one. I'll try it tomorrow, but a simple condition like that should work well. Thanks for your help. Fred Holmes
It would be nice if WGET were to launch and run without the SSL DLLs if one doesn't need SSL, and only produce an error message / halt if one tries to actually utilize SSL without having the DLLs. So far I haven't needed SSL for anything I've actually used WGET for. Or perhaps a dialog: SSL DLLs not found, proceed? Fred Holmes

At 10:12 AM 10/10/2003, Vesselin Peev wrote:
Thanks, I'll look into it as a simpler alternative solution. One nice side effect of wget source recompilation is that I was able to disable SSL support, which I don't need, and do away with the two OpenSSL DLLs. And yes, main() had to be changed to WinMain(). I have a lean non-console version of wget now! Best regards, Vesko
At 12:05 PM 10/3/2003, Hrvoje Niksic wrote: It's a feature. `-A zip' means `-A zip', not `-A zip,html'. Wget downloads the HTML files only because it absolutely has to, in order to recurse through them. After it finds the links in them, it deletes them. How about a switch to keep the .html file, similar to the -nr switch that keeps the .listing file for ftp downloads?
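[Until such a switch exists, one workaround with the options that are already there is to add the HTML suffixes to the accept list; in the versions I've tried, HTML files that match -A are kept rather than deleted after parsing. A sketch with a hypothetical URL:]

    wget -r -A "zip,htm,html" http://www.example.com/downloads/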
How can one handle the following, where the URL is a search script? The URL will load the base page into one's browser correctly, but when it is used as an argument for WGET, WGET tries to use it as an output filename, and the filename contains invalid characters for Windows. Wget 1.8.2 for Windows.

H:\WGET>WGET -N -r -l 2 -k -K -p -np -e robots=off http://thomas.loc.gov/cgi-bin/cpquery/z?cp108:hr126:
--07:53:06--  http://thomas.loc.gov/cgi-bin/cpquery/z?cp108:hr126:
           => `thomas.loc.gov/cgi-bin/cpquery/[EMAIL PROTECTED]:hr126:'
Resolving thomas.loc.gov... done.
Connecting to thomas.loc.gov[18.104.22.168]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
thomas.loc.gov/cgi-bin/cpquery/[EMAIL PROTECTED]:hr126:: Invalid argument
Cannot write to `thomas.loc.gov/cgi-bin/cpquery/[EMAIL PROTECTED]:hr126:' (Invalid argument).
FINISHED --07:53:09--
Downloaded: 0 bytes in 0 files
thomas.loc.gov/cgi-bin/cpquery/[EMAIL PROTECTED]:hr126:: Invalid argument
Converting thomas.loc.gov/cgi-bin/cpquery/[EMAIL PROTECTED]:hr126:... nothing to do.
Converted 1 files in 0.00 seconds.
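[For what it's worth, later Wget releases (1.9 and up, if memory serves) added a --restrict-file-names=windows option that maps characters such as ? and : to safe ones before writing the local filename, which is exactly the failure shown above. A sketch of the same invocation with it, assuming a build that has the option:]

    WGET -N -r -l 2 -k -K -p -np -e robots=off --restrict-file-names=windows "http://thomas.loc.gov/cgi-bin/cpquery/z?cp108:hr126:"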
http://space.tin.it/computer/hherold/ Go to "Mini HOWTO quick start" on that page and read the instructions there. Fred Holmes

At 12:14 AM 3/31/2003, lameon wrote:
I'm using Windows NT/2000/XP; where should I put the .wgetrc file? Thanks!
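[For reference, a minimal sketch of one common arrangement on Windows -- the path and settings below are only examples, not the one true layout. Point the WGETRC environment variable at a plain-text file of wgetrc commands (or set HOME and name the file .wgetrc there):]

    SET WGETRC=C:\wget\wgetrc.txt

[with C:\wget\wgetrc.txt containing ordinary wgetrc directives, e.g.:]

    timestamping = on
    robots = off
    tries = 3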
-e robots=off -i filespec

where filespec is an ASCII file containing the list of URLs to be downloaded.

At 12:46 AM 2/22/2003, Payal Rathod wrote:
Hi all, Can I tell wget to ignore robots.txt? If so, how do I do it? Also, if I have 10 different URLs to retrieve from, can I specify all of them in a file and ask wget to get them from the file and retrieve them without any manual intervention? How do I do it? Thanks a lot and bye. With warm regards, -Payal -- Visit GNU/Linux Success Stories www.geocities.com/rpayal99 Guest-Book Section Updated.
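[A concrete sketch of that advice -- urls.txt is just an example name, and the URLs are ones that appear elsewhere in these messages:]

    wget -e robots=off -i urls.txt

[where urls.txt contains one URL per line, e.g.:]

    http://www.dau.mil/pubs/glossary/11th%20Glossary%202003.pdf
    ftp://ftp.f-prot.is/pub/fp-def.zip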
From the help: -p, --page-requisites  get all images, etc. needed to display HTML page. I think you need to add the -p option as well. Fred Holmes

At 05:05 AM 2/16/2003, Oleg Gorchakov wrote:
Hello, I tried to copy the manual http://www.kgraph.narod.ru/lectures/lectures.htm to my local disk like wget -r -k -l 4 -nH http://www.kgraph.narod.ru/lectures/lectures.htm but 99.9% of the .gif files were not copied and their links were left as absolute links (for example src=http://www.kgraph.narod.ru/lectures/2_1/image002.gif v:shapes=_x_i1025 instead of src=./2_1/image002.gif v:shapes=_x_i1025). Maybe I was a bit wrong with WGET usage? With best wishes, Oleg Gorchakov
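[Concretely, the suggested fix is the same invocation with -p added -- untested against that particular site, so treat it as a sketch:]

    wget -r -k -l 4 -nH -p http://www.kgraph.narod.ru/lectures/lectures.htm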
You need -N (upper case). The switch is case sensitive. Glad to see that someone else is an f-prot user.

At 12:23 AM 1/22/2003, Steve Bratsberg wrote:
I am using Wget to update the dat file for f-prot disks that boot from DOS. I have a copy of the zipped dat file on the hard drive and I have wget set with -n to get the latest. When I d/l the latest, Wget will give the file a numerical extension -- how can I set up wget to clobber if it is a new file? Please cc response. Steve
At 02:54 AM 2/8/2002, Hrvoje Niksic wrote: Wget currently uses KB as abbreviation for kilobyte. In a Debian bug report someone suggested that kB should be used because it is more correct. The reporter however failed to cite the reference for this, and a search of the web has proven inconclusive. Well, certainly among physicists, the k for kilo = x1000 is lower case. Consult any style manual for writing articles in scholarly physics journals. Of course, computer folks do as they please. g Fred Holmes
At 03:09 AM 2/8/2002, Adrian Aichner wrote: I've seen common practice where multipliers greater than unity (K, M, G, T) are uppercase, and those smaller than unity (m, u, n, p, a) are lowercase.

I believe that in physics journals, k is used for kilo to distinguish it from K, which is used for temperature in degrees Kelvin (Celsius degrees above absolute zero). (That's the case; I just don't know if that's the reason.) The degree symbol is (was) generally used in typeset material, but the convention was developed in the days of the typewriter. But k is definitely used for kilo among physicists -- unless there has been some recent revisionism. I'm not totally up to date in this stuff. Fred Holmes
wget win32 binary. Downloading a single file works just fine, as follows:

wget -N http://www.karenware.com/progs/ptreplicator-setup.exe

but

wget -N http://www.karenware.com/progs/*.*

fails with a "not found", whether the filespec is * or *.*. The * syntax works just fine with ftp. Is there a syntax that will get all files with http? Thanks, Fred Holmes [EMAIL PROTECTED]
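[HTTP has no wildcard syntax, so the usual approximation is to recurse one level under the directory page and filter by suffix. A sketch, assuming the /progs/ page actually links to the files and guessing at the extension wanted:]

    wget -r -l1 -np -N -A exe http://www.karenware.com/progs/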
Use the following to download files from an ftp site.

wget -N -nr ftp://ftp.f-prot.is/pub/*

The -N restricts the download to new/newer files only. -nr is necessary to retain the .listing file; by default it is deleted. Substitute a specific filename for * if that's what you want. Use -i filespec to list the URLs in a list file. Use the above with -B to specify the base URL on the command line, and list just the filenames in the list file. Fred Holmes

At 02:36 AM 1/28/2002, Nagaraj Gupta wrote:
Hi, I just downloaded the Windows version of wget (i.e., wget 1.8.1) and I'm very new to this utility. Can you please tell me how to download files from the internet or an ftp server? When I try wget -r -s http://www.yahoo.com or wget ftp://ftp.acdsystems.com, these options download the link page info into index.html by default. Suppose I want to download a file from the ftp server, what should I do? Kindly help me. wget user
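[A sketch of the -B form described above -- files.txt is a hypothetical name, and the filenames are the f-prot ones used elsewhere in this thread:]

    wget -N -nr -B ftp://ftp.f-prot.is/pub/ -i files.txt

[where files.txt contains just the filenames:]

    fp-def.zip
    fp-def.asc
    macrdef2.zip
    nomacro.def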
-p downloads all support files necessary to view the page. It doesn't seem to be in the wget --help information, but it's in the complete manual. Fred Holmes

At 04:09 PM 1/27/2002, Didier Bretin wrote:
Hello, Can you help me with the different options of wget? I'm under Linux, with Netscape 6.2 (based on Mozilla). Sometimes I would like to have a copy of a page I see in my navigator, but Netscape saves only the html file. So I would like to use wget to download the page and all the images of this page. What option do I need to use to do this operation? Thanks for your help. Best regards. -- Didier BRETIN [EMAIL PROTECTED] http://www.bretin.net ICQ: 46032186
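[A sketch, with -k added so the links in the saved page point at the locally saved images -- the URL is only a placeholder:]

    wget -p -k http://www.example.com/some-page.html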
At 09:02 AM 1/14/2002, Hrvoje Niksic wrote: Fred Holmes [EMAIL PROTECTED] writes: Is there a syntax such that I can connect to the host once, transfer the four files, and then disconnect? Unfortunately, no, not yet.

Actually, I dug through the documentation some more and found I could use the -B (base) option to put the URL on the command line, and list the four files in the list file. This makes the action what I want, but unfortunately requires editing the command line (batch file). It would be nice if wget eventually recognized identical hosts and connected only once. I also discovered, much to my delight, that I could use as the list file a general text file that is a synopsis I retrieved from the Commerce Business Daily / Fed Biz Opps. It parsed the text file, reported an error on each of the many lines that was not a URL, and proceeded to download correctly each/all of the 22 files listed in the text file as separate URLs -- all without operator intervention. I can ignore the error messages, and I don't have to edit the text file to delete all but the URLs. That's great. I presume that I would have to do enough editing so that each of the URLs is on a line by itself, not embedded in text. My test example was already that way. But on the wish list . . . wget is a marvelous product. Fred Holmes [EMAIL PROTECTED]
I would like to ftp four files from a host. The files are in a large directory, and are different such that wild cards won't do it. If I write the following routine:

WGET -N -i Files.txt

where Files.txt is:

ftp://ftp.f-prot.is/pub/fp-def.zip
ftp://ftp.f-prot.is/pub/fp-def.asc
ftp://ftp.f-prot.is/pub/macrdef2.zip
ftp://ftp.f-prot.is/pub/nomacro.def

the process disconnects from the host after each transfer, and reconnects for the next transfer. Is there a syntax such that I can connect to the host once, transfer the four files, and then disconnect? If the host isn't busy it probably doesn't make any difference, but if the host is busy, one doesn't want to lose an established connection. Thanks, Fred Holmes [EMAIL PROTECTED]
WGET suggestion. The -H switch/option sets host-spanning. Please provide a way to specify a different limit on recursion levels for files retrieved from foreign hosts. For example, -r -l0 -H2 would allow unlimited recursion levels on the target host, but only 2 [additional] levels when a file is being retrieved from a foreign host. Second suggestion: the -i switch provides for a file listing the URLs to be downloaded. Please provide for a list file of URLs to be avoided when -H is enabled. Thanks for listening. And thanks for a marvelous product. Fred Holmes [EMAIL PROTECTED]
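[For the second suggestion, the closest existing machinery I'm aware of is domain-level rather than URL-level: -D to limit spanning to listed domains, and --exclude-domains to reject others. A sketch with made-up host names, so only an approximation of the requested avoid-list:]

    wget -r -l0 -H -D example.org --exclude-domains ads.example.org http://www.example.org/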
It would be nice to have some way to limit the total size of any job, and have it exit gracefully upon reaching that size, by completing the -k -K process upon termination, so that what one has downloaded is useful. A switch that would set the total size of all downloads --total-size=600MB would terminate the run when the total bytes downloaded reached 600 MB, and process the -k -K. What one had already downloaded would then be properly linked for viewing. Probably more difficult would be a way of terminating the run manually (Ctrl-break??), but then being able to run the -k -K process on the already-downloaded files. Fred Holmes
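[For the size half of this, Wget's existing -Q/--quota option comes close: in a recursive or -i run it stops starting new downloads once the byte total passes the limit, finishing the file in progress. Whether the -k/-K pass then runs over what was already fetched is worth verifying before relying on it, so treat this as a sketch with a placeholder URL:]

    wget -r -k -K -Q 600m http://www.example.com/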