Re: WGET bug...
HARPREET SAWHNEY wrote:
> Hi, I am getting a strange bug when I use wget to download a binary file from a URL versus when I manually download. The attached ZIP file contains two files: 05.upc --- manually downloaded; dum.upc --- downloaded through wget. wget adds a number of ascii characters to the head of the file and seems to delete a similar number from the tail. So the file sizes are the same, but the addition and deletion render the file useless. Could you please direct me on whether I should be using some specific option to avoid this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at the front of your session, and then giving you the file contents anyway. It's a bug in the PHP code that serves the file. You're getting this extra content because you are not logged in when you're fetching it. You need to have Wget send a cookie with the login-session information, and then the server will probably stop sending the corrupting information at the head of the file.

The site does not appear to use HTTP's authentication mechanisms, so the [EMAIL PROTECTED] bit in the URL doesn't do you any good. It uses forms-and-cookies authentication.

Hopefully, you're using a browser that stores its cookies in a text format, or that is capable of exporting to a text format. In that case, you can just ensure that you're logged in in your browser, and use the --load-cookies=cookies.txt option so that Wget uses the same session information. Otherwise, you'll need to use --save-cookies with Wget to simulate the login form post, which is tricky and requires some understanding of HTML forms.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
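A minimal sketch of the first approach, assuming the browser's cookies have been exported to a file named cookies.txt (the download URL here is a made-up stand-in):

    # Log in with your browser, export its cookies to cookies.txt,
    # then let Wget present the same session cookie:
    wget --load-cookies=cookies.txt 'http://example.com/download.php?file=05.upc'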
Re: WGET bug...
HARPREET SAWHNEY wrote:
> Hi, Thanks for the prompt response. I am using GNU Wget 1.10.2. I tried a few things on your suggestion but the problem remains. 1. I exported the cookies file in Internet Explorer and specified that in the Wget command line, but the same error occurs. 2. I have an open session on the site with my username and password. 3. I also tried running wget while I am downloading a file from the IE session on the site, but the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to log in to the website. This requires site-specific information from the user-login form page, though, so I can't help you without that. If you know how to read some HTML, then you can find the HTML form used for posting the username/password, and use

    wget --keep-session-cookies --save-cookies=cookies.txt \
        --post-data='USERNAME=FOO&PASSWORD=BAR' ACTION

where ACTION is the value of the form's action field, USERNAME and PASSWORD (and possibly further required values) are field names from the HTML form, and FOO and BAR are the username/password.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
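A fuller two-step sketch of the same idea (every URL and field name below is an assumption; take the real ones from the site's actual login form):

    # 1. Simulate the login form post and save the session cookie:
    wget --keep-session-cookies --save-cookies=cookies.txt \
        --post-data='username=myuser&password=mypass' \
        'http://example.com/login.php'

    # 2. Fetch the protected file, presenting the saved cookie:
    wget --load-cookies=cookies.txt 'http://example.com/download.php?file=05.upc'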
[fwd] Wget Bug: recursive get from ftp with a port in the url fails
---BeginMessage---
Hi, I am using wget 1.10.2 on Windows 2003, and have the same problem as Cantara. The file system is NTFS. I found my problem: I wrote the command in Scheduled Tasks like this:

    wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P d:\virus.update\kaspersky

After wget, and before -N, I had typed TWO spaces. After deleting one space, wget works well again. Hope this can help. :)

-- from: baalchina
---End Message---
Re: [fwd] Wget Bug: recursive get from ftp with a port in the url fails
Hrvoje Niksic forwarded this message:
> From: baalchina, Mon, 17 Sep 2007 19:56:20 +0800
> Hi, I am using wget 1.10.2 on Windows 2003, and have the same problem as Cantara. The file system is NTFS. I found my problem: I wrote the command in Scheduled Tasks like this: wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P d:\virus.update\kaspersky -- after wget, and before -N, I typed TWO spaces. After deleting one space, wget works well again. Hope this can help. :)

Hi baalchina,

Hrvoje forwarded your message to the Wget discussion mailing list, where such questions are really more appropriate, especially since Hrvoje is not maintaining Wget any longer, but has left that responsibility to others.

What you're describing does not appear to be a bug in Wget; it's the shell's (or task scheduler's, or whatever's) responsibility to split space-separated elements properly; the words are supposed to already be split apart (properly) by the time Wget sees them.

Also, you didn't really describe what was going wrong with Wget, or what failure message you were seeing (perhaps you'd need to specify a log file with -o log, or via redirection if the command interpreter supports it). However, if the problem is that Wget was somehow seeing the space, as a separate argument or as part of another one, then the bug lies with your task scheduler (or whatever is interpreting the command line).

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
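A hedged illustration of the logging suggestion, reusing the poster's own paths (the log file name is made up):

    wget -o D:\virus.update\wget.log -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P d:\virus.update\kaspersky

With -o, any error wget hits inside the scheduled task ends up in wget.log instead of vanishing with the task's console window.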
Re: wget bug?
On Mon, 9 Jul 2007 15:06:52 +1200 [EMAIL PROTECTED] wrote:
> Under win2000/win XP I get "No such file or directory" error messages when using the following command line: wget -s --save-headers "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc" with %1 = 212BI. Any ideas?

hi nikolaus, in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. try using %1% instead of %1.

--
Mauro Tortonesi [EMAIL PROTECTED]
Re: wget bug?
Mauro Tortonesi schrieb:
> hi nikolaus, in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. try using %1% instead of %1.

AFAIK it's OK to use %1, because it is a special case (the first batch-file argument). Also, the error would be a 404 or some wget error if the variable were substituted wrongly or not at all, wouldn't it? (Actually, even then you get a 200 response with that URL.)

I just tried using the command inside a batch file and came across another problem: you used a lowercase -s, which is not recognized by my wget version, but an uppercase -S is. I guess you should change that.

I would guess wget is not in your PATH. Try using c:\path\to\the directory\wget.exe instead of just wget. If this too does not help, add an explicit --restrict-file-names=windows to your options, so wget does not try to use the ? inside a filename (normally not needed). So a should-work-for-all-means version is:

    c:\path\wget.exe -S --save-headers --restrict-file-names=windows "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

Of course just one line, but my dumb mail editor wrapped it.

Greetings, Matthias
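Wrapping that in a hypothetical batch file makes the %1 substitution concrete (file name and wget path are assumptions):

    rem fetch.bat -- call as:  fetch.bat 212BI
    c:\path\wget.exe -S --save-headers --restrict-file-names=windows "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

Running "fetch.bat 212BI" expands %1 to 212BI before wget ever sees the command line.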
wget bug?
Under win2000/win XP I get "No such file or directory" error messages when using the following command line:

    wget -s --save-headers "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

with %1 = 212BI. Any ideas? Thank you.

Dr Nikolaus Hermanspahn
Advisor (Science)
National Radiation Laboratory
Ministry of Health
DDI: +64 3 366 5059
Fax: +64 3 366 1156
http://www.nrl.moh.govt.nz
mailto:[EMAIL PROTECTED]
RE: wget bug
Highlord Ares wrote:
> it tries to download web pages named similar to http://site.com?variable=yes&mode=awesome

Since & is a reserved character in many command shells, you need to quote the URL on the command line:

    wget "http://site.com?variable=yes&mode=awesome"

Tony
wget bug
When I run wget on certain sites, it tries to download web pages named similar to http://site.com?variable=yes&mode=awesome. However, wget isn't saving any of these files, no doubt because of some file naming issue? This problem exists in both the Windows and unix versions. Hope this helps.
RE: wget bug
This does not look like a valid URL to me - shouldn't there be a slash at the end of the domain name? Also, when talking about a bug (or anything else), it is always helpful if you specify the wget version (number).

> From: Highlord Ares, Sent: Thursday, May 24, 2007 11:41
> when I run wget on certain sites, it tries to download web pages named similar to http://site.com?variable=yes&mode=awesome. However, wget isn't saving any of these files, no doubt because of some file naming issue? this problem exists in both the Windows and unix versions. hope this helps
WGet Bug: Local URLs containing colons do not work
Hi, I am trying to download a Wiki category for off-line browsing, and am using a command line like this:

    wget http://wiki/Category:Fish -r -l 1 -k

Wiki categories contain colons in their filenames, for example: Category:Fish. If I request that wget convert absolute paths to relative links, then it will create a link like this:

    <a href="Category:Fish" title="Category:Fish">Fish</a>

Unfortunately, this is not a valid URL, because the browser interprets 'Category:' as the invalid protocol "Category", not as part of the local filename 'Category:Fish'.

You can get wget to replace the ':' with an escaped character using --restrict-file-names=windows, but unfortunately this does not fix the problem, because the browser will un-escape the URL and will still continue to look for a file with a colon in it.

I am not sure of the best way to address this bug, because I am not sure if it is possible to escape the ':' to prevent the browser from treating it as a delimiter. It might be best to allow the user to specify some other character, such as '_', to be used in place of the ':' in both filename and URL.

Regards, Peter Fletcher
Re: wget bug in finding files after disconnect
Paul Bickerstaff [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]:

> I'm using wget version GNU Wget 1.10.2 (Red Hat modified) on a fedora core5 x86_64 system (standard wget rpm). I'm also using version 1.10.2b on a WinXP laptop. Both display the same faulty behaviour, which I don't believe was present in earlier versions of wget that I've used. When the internet connection disconnects, wget automatically tries to redownload the file (starting from where it was disconnected). The problem is that it is consistently failing to find the file. The following output shows what is happening.
>
>     wget -c ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/nr.*.tar.gz
>     [...]
>     Retrying.
>     --14:13:54-- ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/nr.00.tar.gz
>       (try: 2) => `nr.00.tar.gz'
>     Connecting to bio-mirror.jp.apan.net|150.26.2.58|:21... connected.
>     Logging in as anonymous ... Logged in!
>     ==> SYST ... done.    ==> PWD ... done.
>     ==> TYPE I ... done.  ==> CWD not required.
>     ==> PASV ... done.    ==> REST 315859600 ... done.
>     ==> RETR nr.00.tar.gz ... No such file `nr.00.tar.gz'.
>     [...]
>
> I have checked and the files are there and have not moved or altered in any way. I believe that the problem is almost certainly associated with the logged item "CWD not required" after a reconnect. Cheers

I encountered the same situation and solved it this way: call wget with the -B (--base) option to set the base directory and with -i (--input-file) to point to a file containing the relative URLs you want to download. Not tested, but it should look like this:

    wget -c --base=ftp://bio-mirror.jp.apan.net/pub/biomirror/blast/ --input-file=urls.txt

with urls.txt containing nr.*.tar.gz. Hope it helps you. Georg
wget bug
Well, this really isn't a bug per se... but whenever you set -q for no output, it still makes a wget log file on the desktop.
Re: new wget bug when doing incremental backup of very large site
From dev:
> I checked and the .wgetrc file has continue=on. Is there any way to suppress the sending of byte-range requests? I will read through the email and see if I can gather some more information that may be needed.

Remove continue=on from .wgetrc? Consider:

    -N, --timestamping    don't re-retrieve files unless newer than local.

Steven M. Schweda [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
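A sketch of the suggested .wgetrc change (assuming the goal is "skip anything already downloaded" rather than "resume partial files"):

    # .wgetrc
    # continue = on      <- remove this line
    timestamping = on    # the equivalent of -N on every run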
new wget bug when doing incremental backup of very large site
I was running wget to test mirroring an internal development site, using large database dumps (binary format) as part of the content to provide me with a large number of binary files for the test. For the test I wanted to see if wget would run and download a quantity of 500K files with 100GB of total data transferred.

The test was going fine and wget ran flawlessly for 3 days, downloading almost the entire contents of the test site; I was at 85GB. wget would have run until the very end and would have passed the test, downloading all 100GB of the test files. Then a power outage occurred; my local test box was not on battery backup, so I had to restart wget and the test. wget did not refetch the binary backup files and gave (for each file that had already been retrieved) the following message:

    => `domain/database/dbdump_107899.gz'
    Connecting to domain|ip|:80... connected.
    HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
    The file is already fully retrieved; nothing to do.

wget continued to run for about eight hours, gave the above message on several thousand files, then crashed giving:

    wget: realloc: Failed to allocate 536870912 bytes; memory exhausted.

This was surprising because wget ran flawlessly on the initial download for several days, but on a refresh or incremental backup of the data, wget crashed after eight hours. I believe it has something to do with the code that is run when wget finds a local file with the same name and sends a range request. Maybe there is some data structure that keeps getting added to, so that it exhausts the memory on my test box, which has 2GB. There were no other programs running on the test box. This may be a bug.

To get around this for the purposes of my test, I would like to know if there is any way (any switch) to tell wget not to send any type of range request at all if the local filename exists, but to skip sending any request whatsoever if it finds a file with the same name. I do not want it to check whether the file is newer or complete; just skip it and go on to the next file.

I was running wget under cygwin on a Windows XP box. The wget command that I ran was the following:

    wget -m -l inf --convert-links --page-requisites http://domain

I had the following .wgetrc file ($HOME/.wgetrc):

    #backup_converted=on
    page_requisites=on
    continue=on
    dirstruct=on
    #mirror=on
    #noclobber=on
    #recursive=on
    wait=3
    http_user=username
    http_passwd=passwd
    #convert_links=on
    verbose=on
    user_agent=firefox
    dot_style=binary
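A hedged workaround sketch for the "skip files that already exist locally, no request at all" behaviour (note that -nc cannot be combined with -N or with -m, so the mirror shorthand is expanded here):

    wget -r -l inf -nc --page-requisites http://domain

With --no-clobber (-nc), wget sees the existing local file and moves on without sending any range or timestamp request; the continue=on line in .wgetrc would also have to go, since it is what triggers the REST/Range behaviour.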
Re: new wget bug when doing incremental backup of very large site
1. It would help to know the wget version (wget -V).

2. It might help to see some output when you add -d to the wget command line. (One existing file should be enough.) It's not immediately clear whose fault the 416 error is. It might also help to know which Web server is running on the server, and how big the file is which you're trying to re-fetch.

> This was surprising [...]

You're easily surprised.

> wget: realloc: Failed to allocate 536870912 bytes; memory exhausted.

500MB sounds to me like a lot.

> [...] it exhausts the memory on my test box which has 2GB.

A "memory exhausted" complaint here probably refers to virtual memory, not physical memory.

> [...] I do not want it to check to see if the file is newer, if the file is complete, just skip it and go on to the next file.

I haven't checked the code, but with continue=on, I'd expect wget to check the size and date together, and not download any real data if the size checks out and the local file date is later. The 416 error suggests that it's trying to do a partial (byte-range) download, and is failing because either it's sending a bad byte range, or the server is misinterpreting a good byte range. Adding -d should show what wget thinks it's sending. Knowing that and the actual file size might show a problem. If the -d output looks reasonable, the fault may lie with the server, and an actual URL may be needed to pursue the diagnosis from there. The memory allocation failure could be a bug, but finding it could be difficult.

Steven M. Schweda [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
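A minimal sketch of the debugging run being asked for (the URL is a stand-in for one of the already-downloaded dump files):

    wget -V
    wget -d -c -o wget-debug.log http://domain/database/dbdump_107899.gz

The Range header that wget sends shows up in wget-debug.log and can be compared against the actual size of the local file.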
[WGET BUG] - Cannot retrieve image from cacti
Hello,

We are using version 1.10.2 of wget under Ubuntu and Debian, and we have many scripts that get some images from a cacti site. These scripts ran perfectly with version 1.9 of wget, but they cannot get images with version 1.10.2. Here you can find an example of our scripts:

    sub GetCactiGraph() {
        my ($node, $alt, $time, $filename) = @_;
        my $url = "https://foo.bar/cacti/";
        my $b = WWW::Mechanize->new();
        $b->get($url);
        $b->field("login_username", "user");
        $b->field("login_password", "user");
        $b->click();
        if ($b->content() =~ /, gFld\(.*$node, (.+)\)\)/g) {
            $b->get($url . $1);
            if ($b->content() =~ /img src='(graph_image\.php\?local_graph_id=\d+).+' border='0' alt='\s*$alt\s*'/g) {
                my $period = ($time eq "day" ? "&rra_id=1" : "&rra_id=3");
                print "WGET: $url$1$period -O $filename\n";
                if (defined $filename) {
                    `wget -q $url$1$period -O $filename`;
                    return $filename;
                } else {
                    `wget --no-check-certificate -q $url$1$period -O $alt.png`;
                    return "$alt.png";
                }
            }
        }
    }

The file is created but it is empty.

Bye, Thomas
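A plausible fix sketch, not confirmed by the thread: wget 1.10 started verifying SSL certificates by default (--no-check-certificate was introduced for exactly this), and the branch above that runs when $filename is defined omits that flag, so against a self-signed cacti certificate it would fail quietly under -q and leave the -O file empty:

    # hedged change: pass the flag in both branches
    `wget --no-check-certificate -q $url$1$period -O $filename`;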
Re: Wget Bug: recursive get from ftp with a port in the url fails
Jesse Cantara [EMAIL PROTECTED] writes:
> A quick resolution to the problem is to use the -nH command line argument, so that wget doesn't attempt to create that particular directory. It appears as if the problem is with the creation of a directory with a ':' in the name, which I cannot do outside of wget either. I am not sure if that is specific to my filesystem, or to linux in general.

It's not specific to Linux, so it must be your file system. Are you perhaps running Wget on a FAT32-mounted partition? If so, try using --restrict-file-names=windows.

Thanks for the report.
Wget Bug: recursive get from ftp with a port in the url fails
I've encountered a bug when trying to do a recursive get from an ftp site with a non-standard port defined in the url, such as ftp.somesite.com:1234. An example of the command I am typing is:

    wget -r ftp://user:[EMAIL PROTECTED]:4321/Directory/*

where Directory contains multiple subdirectories, all of which I wish to get. The output I get from wget is:

    ==> SYST ... done.  ==> PWD ... done.  ==> TYPE I ... done.
    ==> CWD /Bis ... done.  ==> PASV ... done.  ==> LIST ... done.
    ftp.somehost.com:4321/Directory: No such file or directory
    ftp.somehost.com:4321/Directory/.listing: No such file or directory
    unlink: No such file or directory

And nothing is downloaded; wget stops executing there. A quick resolution to the problem is to use the -nH command line argument, so that wget doesn't attempt to create that particular directory. It appears as if the problem is with the creation of a directory with a ':' in the name, which I cannot do outside of wget either. I am not sure if that is specific to my filesystem, or to linux in general. I am using GNU Wget 1.10.2 in Linux version 2.6.14, Gentoo 3.3.6.

Apologies if this is already known, or if I have not provided enough information. I looked for a bug listing, and attempted to get as much information as I can, but I am not a computer scientist or a programmer. Thank you very much for the wonderful program, it has helped me out in many ways, and I hope this helps the developers.

-Jesse Cantara
wget bug: doesn't CWD after ftp failure
Hi folks,

I think I have found a bug in wget where it fails to change the working directory when retrying a failed ftp transaction. This is wget 1.10.2 on FreeBSD-6.0/amd64.

I was trying to use wget to get files from a broken ftp server which occasionally sends garbled responses, causing wget to get confused, eventually time out, and retry the transfer. (The failure mode which makes it most obvious is sending a response to PASV which lacks the initial numeric response code, so that wget can't recognize it.) This is fine. However, when wget reconnects, it mistakenly thinks it is already in the appropriate directory, and it doesn't change it, reporting "CWD not required". This results in it trying to fetch the file from the root directory instead of the correct path.

Unfortunately I can't give you access to the server in question. I can sanitize the output of a wget session if you want. However, I think the bug is obvious from inspection. At ftp.c:1197 in ftp_loop_internal() we have:

    err = getftp (u, &len, restval, con);

    if (con->csock != -1)
      con->st &= ~DONE_CWD;
    else
      con->st |= DONE_CWD;

This test seems clearly to be backwards. If con->csock is -1 (i.e. the connection has been closed) then we must clear the DONE_CWD flag. Otherwise CWD has been done and we can set the flag. Reversing the test fixes the problem. It also causes the CWD optimization to actually work when it's applicable, instead of only when it isn't :)

It might be worthwhile at other spots in the code to put in an assert() to ensure that we have (DO_CWD || !DO_LOGIN). Perhaps after those flags are set, e.g. ftp.c:1161 in ftp_loop_internal() and ftp.c:1409 in ftp_retrieve_list(). Also, the existence of both DONE_CWD and DO_CWD may cause confusion and could probably be cleaned up.

Thanks for working on wget! It's a great tool.

-- Nate Eldredge [EMAIL PROTECTED]
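A hedged sketch of the suggested assertion (the flag names are as given above; that they live in con->cmd is an assumption from context):

    /* After the DO_* flags are set up: logging in on a fresh connection
       only makes sense if we also plan to re-do the CWD. */
    assert ((con->cmd & DO_CWD) || !(con->cmd & DO_LOGIN));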
Re: wget BUG: ftp file retrieval
[EMAIL PROTECTED] (Steven M. Schweda) writes:
>> and adding it fixed many problems with FTP servers that log you in a non-/ working directory.
> Which of those problems would _not_ be fixed by my two-step CWD for a relative path? That is: [...]

That should work too. On Unix-like FTP servers, the two methods would be equivalent. Thanks for the suggestion. I realized your patch contained improvements for dealing with VMS FTP servers, but I somehow managed to miss this explanation.
Re: wget BUG: ftp file retrieval
> From: Hrvoje Niksic
> [...] On Unix-like FTP servers, the two methods would be equivalent.

Right. So I resisted temptation, and kept the two-step CWD method in my code for only a VMS FTP server. My hope was that someone would look at the method, say "That's a good idea", and change the "if" to let it be used everywhere. Of course, I'm well known to be delusional in these matters.

Steven M. Schweda    (+1) 651-699-9818
382 South Warwick Street    [EMAIL PROTECTED]
Saint Paul MN 55105-2547
wget BUG: ftp file retrieval
Hello,

Current wget seems to have the following bug in the ftp retrieval code. When called like:

    wget user:[EMAIL PROTECTED]/foo/bar/file.tgz

and foo or bar is a read/execute-protected directory while file.tgz is user-readable, wget fails to retrieve the file because it tries to CWD into the directory first. I think the correct behaviour should be not to CWD into the directory, but to issue a GET request with the full path instead (which will succeed).

Best regards, Arne Caspari
Re: wget BUG: ftp file retrieval
Arne Caspari [EMAIL PROTECTED] writes:
> When called like: wget user:[EMAIL PROTECTED]/foo/bar/file.tgz and foo or bar is a read/execute-protected directory while file.tgz is user-readable, wget fails to retrieve the file because it tries to CWD into the directory first. I think the correct behaviour should be not to CWD into the directory but to issue a GET request with the full path instead (which will succeed).

I believe that CWD is mandated by the FTP specification, but you're also right that Wget should try both variants. You can force Wget into getting the file without CWD using this kludge:

    wget ftp://user:[EMAIL PROTECTED]/%2Ffoo%2Fbar%2Ffile.tgz -O file.tgz
Re: wget BUG: ftp file retrieval
Hrvoje Niksic wrote:
> I believe that CWD is mandated by the FTP specification, but you're also right that Wget should try both variants.

i agree. perhaps when retrieving file A/B/F.X we should try to use:

    GET A/B/F.X

first, then:

    CWD A/B
    GET F.X

if the previous attempt failed, and:

    CWD A
    CWD B
    GET F.X

as a last resort. what do you think?

--
Aequam memento rebus in arduis servare mentem...
Mauro Tortonesi                          http://www.tortonesi.com
University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
Re: wget BUG: ftp file retrieval
Thank you all for your very fast response. As a further note: when this error occurs, wget bails out with the error message "No such directory foo/bar". I think it should instead be "Could not access foo/bar: Permission denied" or similar in such a situation.

/Arne
Re: wget BUG: ftp file retrieval
Mauro Tortonesi [EMAIL PROTECTED] writes:
> perhaps when retrieving file A/B/F.X we should try to use: GET A/B/F.X first, then: CWD A/B, GET F.X if the previous attempt failed, and: CWD A, CWD B, GET F.X as a last resort. what do you think?

That might work. Also don't prepend the necessary prepending of $CWD to those paths.
Re: wget BUG: ftp file retrieval
Hrvoje Niksic [EMAIL PROTECTED] writes:
> That might work. Also don't prepend the necessary prepending of $CWD to those paths.

Oops, I meant "don't forget to prepend".
Re: wget BUG: ftp file retrieval
> From: Hrvoje Niksic
> Also don't [forget to] prepend the necessary [...] $CWD to those paths.

Or, better yet, _DO_ forget to prepend the trouble-causing $CWD to those paths. As you might recall from my changes for VMS FTP servers (if you had ever looked at them), this scheme causes no end of trouble. A typical VMS FTP server reports the CWD in VMS form (for example, SYS$SYSDEVICE:[ANONYMOUS]). It may be willing to use a UNIX-like path in a CWD command (for example, CWD A/B), but it's _not_ willing to use a mix of them (for example, SYS$SYSDEVICE:[ANONYMOUS]/A/B).

At a minimum, a separate CWD should be used to restore the initial directory. After that, you can do what you wish. On my server at least (HP TCPIP V5.4), GET A/B/F.X will work, but the mixed mess is unlikely to work on any VMS FTP server.

Steven M. Schweda    (+1) 651-699-9818
382 South Warwick Street    [EMAIL PROTECTED]
Saint Paul MN 55105-2547
Re: wget BUG: ftp file retrieval
On Fri, 25 Nov 2005, Steven M. Schweda wrote:
> Or, better yet, _DO_ forget to prepend the trouble-causing $CWD to those paths.

I agree. What good would prepending do? It will most definitely add problems such as those Steven describes.

--
-=- Daniel Stenberg -=- http://daniel.haxx.se -=-
ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Re: wget BUG: ftp file retrieval
> From: Hrvoje Niksic
> Prepending is already there,

Yes, it certainly is, which is why I had to disable it in my code for VMS FTP servers.

> and adding it fixed many problems with FTP servers that log you in a non-/ working directory.

Which of those problems would _not_ be fixed by my two-step CWD for a relative path? That is:

    1. CWD to the string which the server reported in its initial PWD response.
    2. CWD to the relative path in the URL (A/B in our current example).

On a VMS server, the first path is probably pure VMS, so it works, and the second path is pure UNIX, so it also works (on all the servers I've tried, at least). As I remark in the (seldom-if-ever-read) comments in my src/ftp.c, I see no reason why this scheme would fail on any reasonable server. But I'm always open to a good argument, especially if it includes a demonstration of a good counter-example.

This (in my opinion, stinking-bad) prepending code is the worst part of what makes the current (not-mine) VMS FTP server code so awful. (Running a close second is the part which discards the device name from the initial PWD response, which led to a user complaint in this forum a while back, involving an inability to specify a different device in a URL.)

Steven M. Schweda    (+1) 651-699-9818
382 South Warwick Street    [EMAIL PROTECTED]
Saint Paul MN 55105-2547
wget bug
Begin forwarded message (a delivery-failure bounce from the qmail-send program at sunsite.dk; the original report it carried follows):

From: Michael C. Haller [EMAIL PROTECTED]
Subject: wget does not encode UTF-8 properly
Date: Tue, 27 Sep 2005 03:35:54 +0200

wget does not encode UTF-8 properly. wget compiled on Mac OS X Tiger 10.4.2 build 8C46:

    wget --version
    GNU Wget 1.10.1

    --03:24:23-- http://x.dyndns.org/~x/Musik1/Faun/Zauberspru%cc%88che/
        => `x.dyndns.org/~x/Musik1/Faun/Zauberspru%cc%88che/index.html'
    Resolving x.dyndns.org... 84.130.231.75
    Connecting to x.dyndns.org|84.130.231.75|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    x.dyndns.org/~x/Musik1/Faun/Zaubersprüche: Invalid argument
    x.dyndns.org/~x/Musik1/Faun/Zaubersprüche/index.html: No such file or directory
    Cannot write to `x.dyndns.org/~x/Musik1/Faun/Zaubersprüche/index.html' (No such file or directory).
    FINISHED --03:24:29-- Downloaded: 0 bytes in 0 files

    --03:24:29-- http://x.dyndns.org/~x/Musik1/Apocalyptica/Apocalyptica/
        => `x.dyndns.org/~x/Musik1/Apocalyptica/Apocalyptica/index.html'
    Resolving x.dyndns.org... 84.130.231.75
    Connecting to x.dyndns.org|84.130.231.75|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]

A second attempt at the Zauberspru%cc%88che/ URL (at 03:26) fails identically: the directory name, a "u" with a combining diaeresis percent-encoded as %cc%88, is turned into a local path that the filesystem rejects with "Invalid argument", so nothing is saved.
wget bug report
Sorry for the crosspost, but the wget Web site is a little confusing on the point of where to send bug reports/patches.

Just installed wget 1.10 on Friday. Over the weekend, my scripts failed with the following error (once for each wget run):

    Assertion failed: wget_cookie_jar != NULL, file http.c, line 1723
    Abort - core dumped

All of my command lines are similar to this:

    /home/programs/bin/wget -q --no-cache --no-cookies -O /home/programs/etc/alte_seiten/xsr.html 'http://www.enterasys.com/download/download.cgi?lib=XSR'

After taking a look at it, I implemented the following change to http.c and tried again. It works for me, but I don't know what other implications my change might have.

    --- http.c.orig Mon Jun 13 08:04:23 2005
    +++ http.c      Mon Jun 13 08:06:59 2005
    @@ -1715,6 +1715,7 @@
       hs->remote_time = resp_header_strdup (resp, "Last-Modified");

       /* Handle (possibly multiple instances of) the Set-Cookie header. */
    +  if (opt.cookies)
       {
         char *pth = NULL;
         int scpos;

Kind regards,

Andrew Jones
MVV Energie AG, Abteilung AI.C
Telefon: +49 621 290-3645, Fax: +49 621 290-2677
E-Mail: [EMAIL PROTECTED]  Internet: www.mvv.de
Re: Wget Bug
Arndt Humpert [EMAIL PROTECTED] writes:
> wget, win32 rel. crashes with huge files.

Thanks for the report. This problem has been fixed in the latest version, available at http://xoomer.virgilio.it/hherold/ .
Wget Bug
Hello,

wget, win32 rel. crashes with huge files.

regards, [EMAIL PROTECTED]

== Command Line

    wget -m ftp://ftp.freenet.de/pub/filepilot/windows/bildung/wikipedia/

Assert error while mirroring a big file.

== FTP listing

    P:\temp\wiki\new> ftp ftp.freenet.de
    Connected to ftp-0.freenet.de.
    220 ftp.freenet.de FTP server ready.
    User (ftp-0.freenet.de:(none)): anonymous
    331 Password required.
    Password:
    230 Login completed.
    ftp> cd pub
    250 Changed working directory to /pub.
    ftp> cd filepilot
    250 Changed working directory to /pub/filepilot.
    ftp> cd windows
    250 Changed working directory to /pub/filepilot/windows.
    ftp> cd bildung
    250 Changed working directory to /pub/filepilot/windows/bildung.
    ftp> cd wikipedia
    250 Changed working directory to /pub/filepilot/windows/bildung/wikipedia.
    ftp> dir
    200 PORT command ok.
    150 Opening data connection.
    -rw-r--r-- 1 filepilo ftp      61875 Apr 11 13:06 WikiCover.pdf
    -rw-r--r-- 1 filepilo ftp  344804797 Apr 11 13:20 dbd_76.dbz
    -rw-r--r-- 1 filepilo ftp     425128 Apr 08 13:34 dvdcover_wikipedia.zip
    -rw-r--r-- 1 filepilo ftp 2752401408 Apr 08 15:30 wp_1_2005.iso
    -rw-r--r-- 1 filepilo ftp   14407705 Apr 11 13:06 wpcdhtml.zip
    -rw-r--r-- 1 filepilo ftp   69805003 Apr 11 13:09 wpcdim.zip
    -rw-r--r-- 1 filepilo ftp  701104128 Apr 11 13:34 wpcdiso.iso
    -rw-r--r-- 1 filepilo ftp   10758083 Apr 11 13:07 wpcdmath.zip
    -rw-r--r-- 1 filepilo ftp  121069235 Apr 11 13:12 wpcdxml.zip
    226 Transfer complete.
    ftp: 632 bytes received in 0,03Seconds 19,75Kbytes/sec.
    ftp> bye
    221 Goodbye.

== Version Info

    P:\temp\wiki\new> wget -V
    GNU Wget 1.9.1
    Copyright (C) 2003 Free Software Foundation, Inc.

== Screen Output (Error)

    --11:00:40-- ftp://ftp.freenet.de/pub/filepilot/windows/bildung/wikipedia/wp_1_2005.iso
        => `ftp.freenet.de/pub/filepilot/windows/bildung/wikipedia/wp_1_2005.iso'
    ==> CWD not required.
    ==> PORT ... done.  ==> RETR wp_1_2005.iso ... done.
    Length: -1,542,565,888
    [ <=> ] -1,542,565,888  122.04K/s
    Assertion failed: bytes >= 0, file retr.c, line 292
    abnormal program termination
WGET Bug?
    C:\Grabtest\wget.exe -r --tries=3 http://www.xs4all.nl/~npo/ -o C:/Grabtest/Results/log

    --16:23:02-- http://www.xs4all.nl/%7Enpo/
        => `www.xs4all.nl/~npo/index.html'
    Resolving www.xs4all.nl... 194.109.6.92
    Connecting to www.xs4all.nl[194.109.6.92]:80... failed: No such file or directory.
    Retrying.

Is WGET always expecting an index.html file at the URL when grabbing data from the WWW? Most URLs we want to grab are not named index.html but have other names, like:

    http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
    http://www.ecb.de/stats/exchange/eurofxref/html/index.en.html
    http://www.apx.nl/marketresults.html

Is this a problem for WGET, by the way?

Kind regards,

Peter de Nijs
DELTA N.V., afdeling Portfolio Analyse
06-45 57 29 17
Wget bug
OS = Solaris 8
Platform = Sparc
Test command = /usr/local/bin/wget -r -t0 -m ftp://root:[EMAIL PROTECTED]/usr/openv/var

The directory to synchronize contains some subdirectories and files. Example:

    # ls -la /usr/openv/
    total 68462
    drwxr-xr-x 14 root bin    512 set  1 17:52 .
    drwxr-xr-x 18 root sys    512 dez 16 17:01 ..
    drwxr-xr-x  2 root bin    512 set  1 17:52 bin
    drwxr-xr-x  5 root bin    512 set  1 17:44 db
    drwxr-xr-x  5 root bin   1024 set  1 17:53 java
    drwxr-xr-x  4 root bin   1536 set  1 17:52 lib
    drwxr-xr-x  4 root bin    512 set  1 17:46 man
    drwxr-xr-x  3 root bin    512 set  1 17:46 msg
    drwxr-xr-x 11 root bin   1024 set  2 12:38 netbackup
    drwxr-xr-x  2 root other  512 set  1 14:23 patch
    drwxr-xr-x  2 root bin    512 set  1 17:47 share
    drwxr-xr-x  2 root bin    512 set  1 17:47 tmp
    drwxr-xr-x  5 root bin    512 set  2 09:48 var
    drwxr-xr-x  8 root bin    512 set  1 19:16 volmgr

    # ls -laR /usr/openv/var/
    .:
    total 18
    drwxr-xr-x  5 root bin    512 set  2 09:48 .
    drwxr-xr-x 14 root bin    512 set  1 17:52 ..
    drwxr-xr-x  3 root bin    512 set  1 17:52 auth
    -rw-r--r--  1 root root     9 set  2 09:48 authorize.txt
    -rw-r--r--  1 root other 2956 dez 18  2002 license.txt
    drwx------  2 root other  512 jan  5 20:56 vnetd
    drwxr-xr-x  3 root bin    512 set  1 17:52 vxss

    ./auth:
    total 42
    drwxr-xr-x  3 root bin    512 set  1 17:52 .
    drwxr-xr-x  5 root bin    512 set  2 09:48 ..
    -rw-r--r--  1 root bin    921 out  3  2002 methods.txt
    -rw-r--r--  1 root bin   1415 set  1 12:11 methods_allow.txt
    -rw-r--r--  1 root bin   1599 out  1  2002 methods_deny.txt
    -rw-r--r--  1 root bin   1459 out  1  2002 names_allow.txt
    -rw-r--r--  1 root bin   1701 out  1  2002 names_deny.txt
    -r--r--r--  1 root bin    965 set  1 17:52 template.methods.txt
    -r--r--r--  1 root bin   1387 set  1 17:52 template.methods_allow.txt
    -r--r--r--  1 root bin   1607 set  1 17:52 template.methods_deny.txt
    -r--r--r--  1 root bin   1467 set  1 17:52 template.names_allow.txt
    -r--r--r--  1 root bin   1709 set  1 17:52 template.names_deny.txt
    drwxr-xr-x  4 root other  512 set  1 12:08 vopie

    ./auth/vopie:
    total 8
    drwxr-xr-x  4 root other  512 set  1 12:08 .
    drwxr-xr-x  3 root bin    512 set  1 17:52 ..
    drwx------  3 root other  512 set  1 12:08 hashed
    drwx------  3 root other  512 set  1 12:08 unhashed

Log of the wget command:

    Downloaded: 184 bytes in 1 files
    --18:02:33-- ftp://root:[EMAIL PROTECTED]/usr/openv/var
        => `10.1.1.10/usr/openv/.listing'
    Connecting to 10.1.1.10:21... connected.
    Logging in as root ... Logged in!
    ==> SYST ... done.  ==> PWD ... done.
    ==> TYPE I ... done.  ==> CWD /usr/openv ... done.
    ==> PORT ... done.  ==> LIST ... done.
    [ <=> ] 903 --.--K/s
    18:02:34 (192.12 KB/s) - `10.1.1.10/usr/openv/.listing' saved [903]

    --18:02:34-- ftp://root:[EMAIL PROTECTED]/usr/openv/var
        => `10.1.1.10/usr/openv/var'
    ==> CWD not required.  ==> PORT ... done.
    ==> RETR var ... No such file `var'.
    FINISHED --18:02:34-- Downloaded: 903 bytes in 1 files

NOTE: The ftp command works fine.
Re: wget bug: spaces in directories mapped to %20
Quoting Tony O'Hagan [EMAIL PROTECTED]:
> Original path: abc def/xyz pqr.gif
> After wget mirroring: abc%20def/xyz pqr.gif (broken link)
> wget --version is GNU Wget 1.8.2

This was a well-known error in the 1.8 versions of wget, which is already corrected in the 1.9 versions.

Regards,
Jochen Roderburg
ZAIK/RRZK, University of Cologne
Robert-Koch-Str. 10, D-50931 Koeln, Germany
Tel.: +49-221/478-7024
E-Mail: [EMAIL PROTECTED]
wget bug: spaces in directories mapped to %20
Recently I used the following wget command under a hosted linux account:

    $ wget --mirror <url> -o mirror.log

The web site contained files and virtual directories with spaces in their names. URL encoding translated these spaces to %20. wget correctly URL-decoded the file names (creating file names containing spaces) but incorrectly failed to URL-decode the directory names (creating directory paths containing %20 instead of spaces). The resulting mirror therefore contained broken links. Some hyperlinks were embedded inside flash graphics files, so hyperlink renaming was not an option.

Personally, I would never put a space in a web-hosted file or directory name, but in this case I was migrating a web site that had been developed by someone else. I think that mirroring should work regardless in this case.

Example:

    Original path: abc def/xyz pqr.gif
    After wget mirroring: abc%20def/xyz pqr.gif (broken link)

wget --version is GNU Wget 1.8.2

Thanks for the invaluable wget.
Tony O'Hagan.
wget bug
It seems that wget uses a signed 32-bit value for the content-length in HTTP. I haven't looked at the code, but it appears that this is what is happening. The problem is that when a file larger than about 2GB is downloaded, wget reports negative numbers for its size and quits the download right after it starts. I would assume that somewhere there is a loop that looks something like:

    while( what I've downloaded < what I think the size is ) {
        // do some more downloading
    }

and after the first read from the stream, the loop fails because whatever you read is indeed bigger than a negative number, so it exits. Of course, this is all speculation on my part about what the code looks like, but nonetheless the bug does exist on both linux and cygwin.

Thanks, Matt

BTW: great job, really... on wget and all the GNU software in general... THANKS
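A minimal, self-contained demonstration of the suspected arithmetic (this is not wget's actual code; the variable names are made up):

    /* overflow.c -- a Content-Length above 2^31-1 stored in a signed
       32-bit integer wraps around to a negative value. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t actual_length = 2752401408u;           /* a ~2.6 GB file */
        int32_t content_length = (int32_t) actual_length;
        printf("%d\n", content_length);                 /* -1542565888 */
        return 0;
    }

The printed value matches the "Length: -1,542,565,888" seen in the 2.7 GB ISO report elsewhere in this digest, which supports the signed-32-bit theory.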
wget bug with large files
I got a crash in wget downloading a large iso file (2.4 GB):

    newdeal:/pub/isos# wget -c ftp://ftp.belnet.be/linux/fedora/linux/core/3/i386/iso/FC3-i386-DVD.iso
    --09:22:17-- ftp://ftp.belnet.be/linux/fedora/linux/core/3/i386/iso/FC3-i386-DVD.iso
        => `FC3-i386-DVD.iso'
    Resolving ftp.belnet.be... 193.190.198.20
    Connecting to ftp.belnet.be[193.190.198.20]:21... connected.
    Logging in as anonymous ... Logged in!
    ==> SYST ... done.  ==> PWD ... done.  ==> TYPE I ... done.
    ==> CWD /linux/fedora/linux/core/3/i386/iso ... done.
    ==> SIZE FC3-i386-DVD.iso ... done.  ==> PASV ... done.
    ==> REST 2079173504 ... done.
    ==> RETR FC3-i386-DVD.iso ... done.
    100%[+===>] 2,147,470,560 60.39K/s ETA 00:00
    wget: progress.c:704: create_image: Assertion `insz <= dlsz' failed.
    Aborted

Then I tried to resume the download:

    newdeal:/pub/isos# wget -c ftp://ftp.belnet.be/linux/fedora/linux/core/3/i386/iso/FC3-i386-DVD.iso
    ...
    ==> SIZE FC3-i386-DVD.iso ... done.  ==> PASV ... done.
    ==> REST -2147476576 ... REST failed, restarting from beginning.
    ==> RETR FC3-i386-DVD.iso ... done.
    [ <=> ] 551,648 63.87K/s

Here it deleted the old iso image (2.1 GB downloaded) and started from the beginning... shouldn't it save the new file with a .1 suffix?

Let me know if I can help you track down this bug. Thanks,

--
Roberto Sebastiano [EMAIL PROTECTED]
I want to report a wget bug
Hello!

I am very pleased to use wget to crawl pages; it is an excellent tool. Recently I found a bug in using wget, although I am not sure whether it's a bug or an incorrect usage. I just want to report it here.

When I use wget to mirror or recursively download a web site with the -O option, I mean to save the whole site's pages in one file. But when I type "./wget -m -O filename http://site", I can only save the index file of the site into the file "filename". Surprisingly, when I first type "./wget -m http://site", stop the crawling process after some pages have downloaded successfully (those pages are saved in a hierarchy matching the website itself), and then use the -O option again for the same web site, the mirror option takes effect.

I will be looking forward to hearing from you. Thanks,

jiaming [EMAIL PROTECTED] 2004-11-25
wget -- bug / feature request (not sure)
Hello,

Probably I am just too lazy, haven't spent enough time reading the man page, and wget can actually do exactly what I want. If so -- I apologize for taking your time. Otherwise: THANKS for your time! :-)

My problem is: redirects. I am trying to catch them by using, say, netcat... or writing some simple pieces of software -- sending an HTTP GET and catching the Location: in the response. What I've found out is that (obviously) wget is wa-ay more sophisticated and can do a much better job, especially in certain cases. I started using it by basically catching stderr from "wget [params my_urls]" and then parsing it -- looking for the ^Location: pattern. Works great.

The downside is: performance. You see, I don't need the actual content -- only the canonical URL. But wget just wgets it, no matter what. As long as (from my perspective) this is a case of "If Wget does not behave as documented, it's a bug" -- according to the man page -- I am taking the liberty to 'file a bug'. (The expected behavior I'm talking about is this: if I use --spider, I expect wget to do nothing after finding the server -- like sending GET to the server and getting HTML back.)

That's my bug -- and/or a feature I'd really like to have. An alternative would be: adding --some_flag=n, meaning "receive no more than n lines of html". Do you think that this could be a useful feature that other people would probably love too?

Thanks for your time and for a great tool,
Vlad.
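A sketch of what the poster is after with existing options (hedged: --spider behaviour has varied across wget versions, and in the versions where it issues a HEAD request no body is transferred at all; the URL is a made-up stand-in):

    wget --spider -S 'http://example.com/page-that-redirects' 2>&1 | grep 'Location:'

Even in a version where --spider still performs a full GET, this at least avoids writing the body to disk, though not transferring it.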
Re: wget bug with ftp/passive
On Wed, 21 Jan 2004 23:07:30 -0800, you wrote:
> Hello, I think I've come across a little bug in wget when using it to get a file via ftp. I did not specify the passive option, yet it appears to have been used anyway. Here's a short transcript:

Passive FTP can be specified in /etc/wgetrc or /usr/local/etc/wgetrc, and then it's impossible to turn it off. There is no --active-mode flag as far as I can tell. I submitted a patch to wget-patches under the title of "Patch to add --active-ftp and make --passive-ftp default", which does what it says. Your configuration is setting passive mode to default, but the stock wget defaults to active (active mode doesn't work too well behind some firewalls). --active-ftp is a very useful option in these cases.

Last I checked, the patch hasn't been committed. I can't find the wget-patches mail archives anywhere, either. So I'll paste it here, in hopes that it helps.

-Jeff Connelly

=cut here=

    Common subdirectories: doc.orig/ChangeLog-branches and doc/ChangeLog-branches
    diff -u doc.orig/wget.pod doc/wget.pod
    --- doc.orig/wget.pod   Wed Jul 21 20:17:29 2004
    +++ doc/wget.pod        Wed Jul 21 20:18:56 2004
    @@ -888,12 +888,17 @@
     system-specific. This is why it currently works only with Unix FTP
     servers (and the ones emulating Unix C<ls> output).

    +=item B<--active-ftp>
    +
    +Use the I<active> FTP retrieval scheme, in which the server
    +initiates the data connection. This is sometimes required to connect
    +to FTP servers that are behind firewalls.

     =item B<--passive-ftp>

     Use the I<passive> FTP retrieval scheme, in which the client
     initiates the data connection. This is sometimes required for FTP
    -to work behind firewalls.
    +to work behind firewalls, and as such is enabled by default.

     =item B<--retr-symlinks>

    Common subdirectories: src.orig/.libs and src/.libs
    Common subdirectories: src.orig/ChangeLog-branches and src/ChangeLog-branches
    diff -u src.orig/init.c src/init.c
    --- src.orig/init.c     Wed Jul 21 20:17:33 2004
    +++ src/init.c  Wed Jul 21 20:17:59 2004
    @@ -255,6 +255,7 @@
       opt.ftp_glob = 1;
       opt.htmlify = 1;
       opt.http_keep_alive = 1;
    +  opt.ftp_pasv = 1;
       opt.use_proxy = 1;
       tmp = getenv ("no_proxy");
       if (tmp)
    diff -u src.orig/main.c src/main.c
    --- src.orig/main.c     Wed Jul 21 20:17:33 2004
    +++ src/main.c  Wed Jul 21 20:17:59 2004
    @@ -217,7 +217,8 @@
     FTP options:\n\
       -nr, --dont-remove-listing   don\'t remove `.listing\' files.\n\
       -g,  --glob=on/off           turn file name globbing on or off.\n\
    -       --passive-ftp           use the \"passive\" transfer mode.\n\
    +       --passive-ftp           use the \"passive\" transfer mode (default).\n\
    +       --active-ftp            use the \"active\" transfer mode.\n\
            --retr-symlinks         when recursing, get linked-to files (not dirs).\n\
     \n"), stdout);
       fputs (_("\
    @@ -285,6 +286,7 @@
         { "no-parent", no_argument, NULL, 133 },
         { "non-verbose", no_argument, NULL, 146 },
         { "passive-ftp", no_argument, NULL, 139 },
    +    { "active-ftp", no_argument, NULL, 167 },
         { "page-requisites", no_argument, NULL, 'p' },
         { "quiet", no_argument, NULL, 'q' },
         { "random-wait", no_argument, NULL, 165 },
    @@ -397,6 +399,9 @@
         case 139:
           setval ("passiveftp", "on");
           break;
    +    case 167:
    +      setval ("passiveftp", "off");
    +      break;
         case 141:
           setval ("noclobber", "on");
           break;
wget bug: directory overwrite
Hello. Problem: when downloading everything under http://udn.epicgames.com/Technical/MyFirstHUD, wget overwrites the downloaded MyFirstHUD file with the MyFirstHUD directory (which comes later). GNU Wget 1.9.1, invoked as:

wget -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla $@

Solution: use of the -E option, as sketched below. Regards, Juhana
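A sketch of that workaround, assuming Wget 1.9.1's -E (--html-extension) flag: the page is then saved as MyFirstHUD.html, so the MyFirstHUD directory created later no longer collides with it:

wget -E -k --proxy=off -e robots=off -q -r -l 0 -np -U Mozilla \
  http://udn.epicgames.com/Technical/MyFirstHUD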
wget bug report
I sent this message to [EMAIL PROTECTED] as directed in the wget man page, but it bounced and said to try this email address. This bug report is for GNU Wget 1.8.2, tested on both RedHat Linux 7.3 and 9:

rpm -q wget
wget-1.8.2-9

When I use wget with -S to show the HTTP headers, and I use the --spider switch as well, it gives me a 501 error on some servers. The main example I have found was running it against a server running ntop (http://www.ntop.org/). You can find an RPM for it at http://rpm.pbone.net/index.php3/stat/4/idpl/586625/com/ntop-2.2-0.dag.rh90.i386.rpm.html and you can search with other parameters at rpm.pbone.net to get ntop for other versions of Linux. So here is the command and output:

wget -S --spider http://SERVER_WITH_NTOP:3000
HTTP request sent, awaiting response...
 1 HTTP/1.0 501 Not Implemented
 2 Date: Sat, 27 Mar 2004 07:08:24 GMT
 3 Cache-Control: no-cache
 4 Expires: 0
 5 Connection: close
 6 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 7 Content-Type: text/html
21:11:56 ERROR 501: Not Implemented.

I get a 501 error; echoing $? shows an exit status of 1. When I don't use the spider switch, I get the following:

wget -S http://SERVER_WITH_NTOP:3000
HTTP request sent, awaiting response...
 1 HTTP/1.0 200 OK
 2 Date: Sat, 27 Mar 2004 07:09:31 GMT
 3 Cache-Control: max-age=3600, must-revalidate, public
 4 Connection: close
 5 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 6 Content-Type: text/html
 7 Last-Modified: Mon, 17 Mar 2003 20:27:49 GMT
 8 Accept-Ranges: bytes
 9 Content-Length: 1214
100%[==] 1,214 1.16M/s ETA 00:00
21:13:04 (1.16 MB/s) - `index.html' saved [1214/1214]

The exit status was 0 and the index.html file was downloaded. If this is a bug, please fix it in your next release of wget. If it is not a bug, I would appreciate a brief explanation as to why.

Thank You
Corey Henderson
Chief Programmer
GlobalHost.com
wget bug in retrieving large files 2 gig
Hi. While downloading a file of about 3,234,550,172 bytes with "wget http://foo/foo.mpg" I get an error:

HTTP request sent, awaiting response... 200 OK
Length: unspecified [video/mpeg]
[ <=> ] -1,060,417,124 13.10M/s
wget: retr.c:292: calc_rate: Assertion `bytes >= 0' failed.
Aborted

The md5sum of the downloaded file and the original file is the same, so there should not be an error. The amount of "bytes downloaded" shown during the transfer is not correct either: it becomes negative over 2 gig, which looks like the byte counter wrapping around in a signed 32-bit integer. Greetings from the Netherlands, Eduard
Re: wget bug with ftp/passive
don [EMAIL PROTECTED] writes:

I did not specify the passive option, yet it appears to have been used anyway. Here's a short transcript:

[EMAIL PROTECTED] sim390]$ wget ftp://musicm.mcgill.ca/sim390/sim390dm.zip
--21:05:21-- ftp://musicm.mcgill.ca/sim390/sim390dm.zip
           => `sim390dm.zip'
Resolving musicm.mcgill.ca... done.
Connecting to musicm.mcgill.ca[132.206.120.4]:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.  ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /sim390 ... done.
==> PASV ... Cannot initiate PASV transfer.

Are you sure that something else hasn't done it for you? For example, a system-wide initialization file `/usr/local/etc/wgetrc' or `/etc/wgetrc'.
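A quick way to check, assuming the usual wgetrc locations:

grep -i '^ *passive' /etc/wgetrc /usr/local/etc/wgetrc ~/.wgetrc 2>/dev/null

Any "passive_ftp = on" line found there would explain the PASV attempt.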
Re: wget bug
Kairos [EMAIL PROTECTED] writes: $ cat wget.exe.stackdump [...] What were you doing with Wget when it crashed? Which version of Wget are you running? Was it compiled for Cygwin or natively for Windows?
wget bug
$ cat wget.exe.stackdump
Exception: STATUS_ACCESS_VIOLATION at eip=77F51BAA
eax= ebx= ecx=0700 edx=610CFE18 esi=610CFE08 edi=
ebp=0022F7C0 esp=0022F74C program=C:\nonspc\cygwin\bin\wget.exe
cs=001B ds=0023 es=0023 fs=0038 gs= ss=0023
Stack trace:
Frame     Function  Args
0022F7C0  77F51BAA  (000CFE08, 6107C8F1, 610CFE08, )
0022FBA8  77F7561D  (1004D9C0, , 0022FC18, 00423EF8)
0022FBB8  00424ED9  (1004D9C0, 0022FBF0, 0001, 0022FBF0)
0022FC18  00423EF8  (1004A340, 002A, 7865646E, 6D74682E)
0022FD38  0041583B  (1004A340, 0022FD7C, 0022FD80, 100662C8)
0022FD98  00420D93  (10066318, 0022FDEC, 0022FDF0, 100662C8)
0022FE18  0041EB7D  (10021A80, 0041E460, 610CFE40, 0041C2F4)
0022FEF0  0041C47B  (0004, 61600B64, 10020330, 0022FF24)
0022FF40  61005018  (610CFEE0, FFFE, 07E4, 610CFE04)
0022FF90  610052ED  (, , 0001, )
0022FFB0  00426D41  (0041B7D0, 037F0009, 0022FFF0, 77E814C7)
0022FFC0  0040103C  (0001, 001D, 7FFDF000, F6213CF0)
0022FFF0  77E814C7  (00401000, , 78746341, 0020)
End of stack trace
Wget Bug
Here is the debug output:

:/FTPD# wget ftp://ftp.dcn-asu.ru/pub/windows/update/winxp/xpsp2-1224.exe -d
DEBUG output created by Wget 1.8.1 on linux-gnu.
--13:25:55-- ftp://ftp.dcn-asu.ru/pub/windows/update/winxp/xpsp2-1224.exe
           => `xpsp2-1224.exe'
Resolving ftp.dcn-asu.ru... done.
Caching ftp.dcn-asu.ru => 212.192.20.40
Connecting to ftp.dcn-asu.ru[212.192.20.40]:21... connected.
Created socket 3.
Releasing 0x8073398 (new refcount 1).
Logging in as anonymous ... 220 news FTP server ready.
--> USER anonymous
331 Guest login ok, send your complete e-mail address as password.
--> PASS -wget@
530 Login incorrect.
Login incorrect.
Closing fd 3
Server reply is
530-
530-Sorry! Too many users are logged in.
530-Try later, please.
530-
530 Login incorrect.
Server reply matched ftp:retry-530, retrying

But wget won't even try to retry :( Can you fix that?
Re: Wget Bug
The problem is that the server replies with "Login incorrect", which normally means that authorization has failed and that further retries would be pointless. Short of having a natural-language parser built in, Wget cannot know that the authorization is in fact correct, but that the server happens to be busy. Maybe Wget should have an option to retry even in the case of (what looks like) a "login incorrect" FTP response.
Re: Wget Bug
Kempston [EMAIL PROTECTED] writes:

Yeah, I understand that, but lftp handles it fine even without specifying any additional option ;)

But then lftp is hammering servers when a real unauthorized entry occurs, no?

I'm sure you can work something out

Well, I'm satisfied with what Wget does now. :-)
Re: dificulty with Debian wget bug 137989 patch
jayme [EMAIL PROTECTED] writes: [...]

Before anything else, note that the patch originally written for 1.8.2 will need changes for 1.9. The changes are not hard to make, but they are still needed. The patch didn't make it into the canonical sources because it assumes `long long', which is not available on many platforms that Wget supports. The issue will likely be addressed in 1.10. Having said that:

I tried the patch from Debian bug report 137989 and it didn't work. Can anybody explain: 1 - why I have to make two directories for the patch to work: one wget-1.8.2.orig and one wget-1.8.2?

You don't. Just enter Wget's source directory and type `patch -p1 < patchfile'. `-p1' makes sure that the top-level directories, such as wget-1.8.2.orig and wget-1.8.2, are stripped when finding files to patch.

2 - why after compilation wget still can't download a file > 2GB?

I suspect you've tried to apply the patch to Wget 1.9-beta, which doesn't work, as explained above.
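Spelled out, assuming the patch has been saved as 137989.patch (a hypothetical file name) alongside the unpacked source:

tar xzf wget-1.8.2.tar.gz
cd wget-1.8.2
patch -p1 < ../137989.patch
./configure && make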
dificulty with Debian wget bug 137989 patch
I tried the patch from Debian bug report 137989 and it didn't work. Can anybody explain: 1 - why I have to make two directories for the patch to work: one wget-1.8.2.orig and one wget-1.8.2? 2 - why after compilation wget still can't download a file > 2GB? Note: I cut the patch for Debian use (the first diff). Thank you, Jayme [EMAIL PROTECTED]
wget bug
It's probably a bug: when downloading wget --mirror ftp://somehost.org/somepath/3acv14~anivcd.mpg, wget saves the file as-is, but when downloading wget ftp://somehost.org/somepath/3*, wget saves the files as 3acv14%7Eanivcd.mpg -- The human knowledge belongs to the world
Re: wget bug
Hi Jack :)

* Jack Pavlovsky [EMAIL PROTECTED] dixit: It's probably a bug: when downloading wget --mirror ftp://somehost.org/somepath/3acv14~anivcd.mpg, wget saves it as-is, but when downloading wget ftp://somehost.org/somepath/3*, wget saves the files as 3acv14%7Eanivcd.mpg

Yes, it *was* a bug. The latest prerelease has it fixed. I don't know if the tarball has the latest patches; ask Hrvoje. But if you are not in a hurry, just wait for 1.9 to be released.

The human knowledge belongs to the world

True ;))

Raúl Núñez de Arenas Coronado -- Linux Registered User 88736 http://www.pleyades.net http://raul.pleyades.net/
Re: wget bug
Jack Pavlovsky [EMAIL PROTECTED] writes:

It's probably a bug: when downloading wget --mirror ftp://somehost.org/somepath/3acv14~anivcd.mpg, wget saves it as-is, but when downloading wget ftp://somehost.org/somepath/3*, wget saves the files as 3acv14%7Eanivcd.mpg

Thanks for the report. The problem here is that Wget tries to be helpful by encoding unsafe characters in file names as %XX, the way it is done in URLs. Your first example works because of an oversight (!) that actually made Wget behave as you expected. The good news is that this helpfulness has been rethought for the next release and is no longer there, at least not for ordinary characters like ~. Try getting the latest CVS sources; they should work better in this regard. (http://wget.sunsite.dk/ explains how to download the source from CVS.)
wget bug
Dear Sir: I tried to use "wget" to download data from an ftp site but got an error message, as follows:

> wget ftp://ftp.ngdc.noaa.gov/pub/incoming/RGON/anc_1m.OCT

Screen shows:
--09:02:40-- ftp://ftp.ngdc.noaa.gov/pub/incoming/RGON/anc_1m.OCT
           => `anc_1m.OCT'
Resolving ftp.ngdc.noaa.gov... done.
Connecting to ftp.ngdc.noaa.gov[140.172.180.164]:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.  ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/incoming/RGON ... done.
==> PORT ... done.  ==> RETR anc_1m.OCT ...
Error in server response, closing control connection. Retrying.

But when I use ftp (ftp ftp.ngdc.noaa.gov), I can get the data. My computer runs Linux, kernel 2.4.18-10smp #smp i686 unknown; the wget version is GNU wget 1.8.1. I have a script that uses "wget" to fetch data files automatically every month. When my computer ran Linux version 6.2, "wget" worked well; since I updated to version 7.4, "wget" fails as above. Thank you for your help.

Jing Ping Ye
Email: [EMAIL PROTECTED]
Phone: 303 497 3713
National Geophysical Data Center
CIRES, University of Colorado, Boulder, CO 80309
wget bug (?): --page-requisites should supercede robots.txt
Using wget 1.8.2:

$ wget --page-requisites http://news.com.com

...fails to retrieve most of the files that are required to properly render the HTML document, because they are forbidden by http://news.com.com/robots.txt. I think that use of --page-requisites implies that wget is being used as a "save this entire web page as..." utility for later human viewing, rather than as a text-indexing spider that wants to analyze the content but not the presentation. So I believe that wget should ignore robots.txt when --page-requisites is specified. If you agree then I'll try to write a patch and send it to you this week... please let me know if you agree or disagree. Thanks!

The gory bits: wget -d --page-requisites http://news.com.com says:

appending "http://news.com.com/i/hdrs/ne/y_fd.gif" to urlpos.

etc., but then later says:

Deciding whether to enqueue "http://news.com.com/i/hdrs/ne/y_fd.gif".
Rejecting path i/hdrs/ne/y_fd.gif because of rule `i/'.
Not following http://news.com.com/i/hdrs/ne/y_fd.gif because robots.txt forbids it.
Decided NOT to load it.
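Until such a patch lands, the usual workaround is the robots wgetrc variable (the same -e robots=off switch that appears in other reports in this archive):

wget -e robots=off --page-requisites http://news.com.com/

This disables the robots.txt check for the run, so the page requisites rejected above should be fetched.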
Wget Bug: Re: not downloading everything with --mirror
Funk Gabor wrote:

HTTP does not provide a dirlist command, so wget parses HTML to find other files it should download. Note: HTML, not XML. I suspect that is the problem.

If wget wouldn't download the rest, I'd say that too. But first the directory gets created, the XML is downloaded (and in some other directory some *.gif too), so wget senses the directory. If I issue wget -m site/dir then all of the rest comes down (index.html?D=A and others too), so wget is able to get everything, just not in one pass. So there is no technical limitation preventing wget from doing the mirror in one step. It is either a missing feature (shall I say a bug, since wget can't do the mirror it could have) or I was unable to find the switch which makes it happen at once.

Hmm, now I see. The vast majority of websites are configured to deny directory listings. That is probably why wget doesn't bother to try, except for the directory specified as the root of the download. I don't think there is any option to do this for all directories, because it's not really needed. The _real_ bug is that wget is failing to parse what look like valid <img ... src="..."> tags. Perhaps someone more familiar with wget's HTML parsing code could investigate? The command is:

wget -r -l0 www.jeannette.hu/saj.htm

and the ignored files are a number of image files.

Max.
Wget bug: 32 bit int for bytes downloaded.
It seems wget uses a 32-bit integer for the bytes downloaded:

[...]
FINISHED --17:11:26--
Downloaded: 1,047,520,341 bytes in 5830 files

cave /home/suse8.0# du -s
5230588 .
cave /home/suse8.0#

du reports 5,230,588 KB (about 5.3 GB), so the printed total appears to have wrapped around 2^32 (1,047,520,341 + 4,294,967,296 is roughly 5.3 GB). As it's a once-per-download variable, I'd say it's not that performance critical... Roger.
WGET BUG
Hi, I have a problem and would really like you to help me. I'm using wget to download a list of file URLs via an HTTP proxy. When the proxy server goes offline, wget doesn't retry downloading the files. Can you fix that, or can you tell me how I can fix it?
WGET BUG
Like that:

Connecting to 195.108.41.140:3128... failed: Connection refused.
--01:19:23-- ftp://kempston:*password*@194.151.106.227:15003/Dragon
           => `dragon.001'
Connecting to 195.108.41.140:3128... failed: Connection refused.
--01:19:23-- ftp://kempston:*password*@194.151.106.227:15003/Dragon
           => `dragon.002'
Connecting to 195.108.41.140:3128... failed: Connection refused.
--01:19:23-- ftp://kempston:*password*@194.151.106.227:15003/Dragon
           => `dragon.003'
Connecting to 195.108.41.140:3128... failed: Connection refused.
--01:19:23-- ftp://kempston:*password*@194.151.106.227:15003/Dragon
           => `dragon.004'
Connecting to 195.108.41.140:3128... failed: Connection refused.
FINISHED --01:19:23--
Downloaded: 150,000,000 bytes in 10 files

- Original Message - From: Kempston To: [EMAIL PROTECTED] Sent: Monday, July 08, 2002 12:50 AM Subject: WGET BUG
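A crude client-side workaround, assuming a POSIX shell and a urls.txt file (hypothetical name) holding the URL list: keep re-running wget until it exits successfully, resuming partial files with -c:

until wget -c -i urls.txt; do
    echo "download failed, retrying in 60s..." >&2
    sleep 60
done

-c continues partially downloaded files and -i reads the URLs from the file, so each pass picks up where the last one stopped.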
Re: wget bug (overflow)
I'm afraid that downloading files larger than 2G is not supported by Wget at the moment.
wget bug (overflow)
fbsd1 --- http/wget, eshop.tar (3.3G) ---> fbsd2. The command was:

# wget http://kamenica/eshop.tar

At the second gigabyte I got the following:

2097050K .. .. .. .. .. 431.03 KB/s
2097100K .. .. .. .. .. 8.14 MB/s
2097150K .. .. .. .. .. 3.76 MB/s
-2097104K .. .. .. .. .. 12.21 MB/s
-2097054K .. .. .. .. .. 8.14 MB/s
...

So I did nothing, seeing that everything continued normally. But at the end I got:

-684104K .. .. .. .. .. 1.74 MB/s
-684054K 0.00 B/s
assertion "bytes >= 0" failed: file retr.c, line 254
Abort trap (core dumped)

# wget -V
GNU Wget 1.8.1
# uname -a
FreeBSD vihren.etrade.xx 4.5-STABLE FreeBSD 4.5-STABLE #0: Sat Feb 23 16:54:34 EET 2002 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/VIHREN i386

I'm not sending you the wget.core because the problem is obvious, in my view. I can repeat the exercise if you want that wget.core or some more debugging info. The file is 3,594,496,000 bytes and was copied successfully:

kamenica:~# md5 eshop.tar
MD5 (eshop.tar) = f1709dcad40073b8c8624a8e100d7697
vihren:~# md5 eshop.tar
MD5 (eshop.tar) = f1709dcad40073b8c8624a8e100d7697
Re: wget bug?!
On Monday 18 February 2002 17:52, you wrote: [...]

That would be great. The problem is that I'm using it to retrieve files mostly from servers that have too many users. No, I don't want to hammer the server, but I do want to keep on trying at reasonable intervals until I get the file. I think the feature would be usable in other scenarios as well. You now have --waitretry and --wait; in my personal opinion the best would perhaps be to add --waitint(er)(val), or perhaps just --int(er)(val). Anyways, thanks for the reply. Kind regards, Ferry van Steen
Re: wget bug?!
[The message I'm replying to was sent to [EMAIL PROTECTED]. I'm continuing the thread on [EMAIL PROTECTED], as there is no bug and I'm turning it into a discussion about features.]

On 18 Feb 2002 at 15:14, TD - Sales International Holland B.V. wrote:

I've tried -w 30, --waitretry=30, --wait=30 (I think this last one is for multiple files and the time between them, though). None of these seem to make wget want to wait 30 secs before trying again. Like this I'm hammering the server.

The --waitretry option will wait 1 second for the first retry, then 2 seconds, 3 seconds, etc., up to the value specified. So you may consider the first few retry attempts to be hammering the server, but it will gradually back off. It sounds like you want an option to specify the initial retry interval (currently fixed at 1 second), but Wget currently has no such option, nor an option to change the amount it increments by for each retry attempt (also currently fixed at 1 second). If such features were to be added, perhaps they could work something like this:

--waitretry=n     - same as --waitretry=n,1,1
--waitretry=n,m   - same as --waitretry=n,m,1
--waitretry=n,m,i - wait m seconds for the first retry, incrementing by i seconds for subsequent retries, up to a maximum of n seconds

The disadvantage of doing it that way is that no one will remember in which order the numbers should appear, so an alternative is to leave --waitretry alone and supplement it with --waitretryfirst and --waitretryincr options.
Re: [Wget]: Bug submission
[ Please mail bug reports to [EMAIL PROTECTED], not to me directly. ]

Nuno Ponte [EMAIL PROTECTED] writes:

I get a segmentation fault when invoking:

wget -r http://java.sun.com/docs/books/performance/1st_edition/html/JPTOC.fm.html

My Wget version is 1.7-3, the one which is bundled with RedHat 7.2. I attached my .wgetrc.

Wget 1.7 is fairly old -- it was followed by a bugfix 1.7.1 release, and then by 1.8 and 1.8.1. Please try upgrading to the latest version, 1.8.1, and see if the bug repeats. I couldn't repeat it with 1.8.1.
wget bug
Hi, when I try to send a page to a Nextel mobile using the following command from a Unix box:

wget "http://www.nextel.com/cgi-bin/sendPage.cgi?to01=4157160856%26message=hellothere%26action=send"

wget returns the following output, but the page never reaches the phone:

--15:59:16-- http://www.nextel.com:80/cgi-bin/sendPage.cgi?to01=4157160856&message=hellothere&action=send
           => `sendPage.cgi?to01=4157160856&message=hellothere&action=send'
Location: http://messaging.nextel.com/cgi/mPageExt.dll?buildIndAddressPage&entry=1 [following]
--15:59:16-- http://messaging.nextel.com:80/cgi/mPageExt.dll?buildIndAddressPage&entry=1
           => `mPageExt.dll?buildIndAddressPage&entry=1.14'
Length: unspecified [text/html]
15:59:16 (75.02 KB/s) - `mPageExt.dll?buildIndAddressPage&entry=1.14' saved [9986]

But when I send a page from the Nextel.com web site, it reaches my cell phone. I thought you would be able to help me out; your valuable help would be highly appreciated. Thanks, Muthu
wget bug
Dear sir. When I put the line

http://find.infoart.ru/cgi-bin/yhs.pl?hidden=http%3A%2F%2F194.67.26.82&word=FreeBSD

into my browser (NN3), it works correctly. When I give this line to wget, wget changes it: the hidden argument becomes http://194.67.26.82word and the word argument is empty. Where am I wrong?
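One thing worth ruling out first, assuming a Bourne-style shell: if the URL is not quoted on the command line, the shell itself treats & as a command separator, so everything after it never reaches wget at all. Quoting the whole URL avoids that:

wget "http://find.infoart.ru/cgi-bin/yhs.pl?hidden=http%3A%2F%2F194.67.26.82&word=FreeBSD"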
Re: maybe wget bug
Hack Kampbjørn [EMAIL PROTECTED] writes:

You have hit one of Wget's features: it is overzealous in converting URLs into canonical form. As you have discovered, Wget first converts all encoded characters back to their real values and then encodes those that are unsafe for sending in URLs.

It's a bug. The correct solution has been proposed by Anon Sricharoenchai, and I've implemented the function, but it will take some time to integrate it into Wget.
maybe wget bug
Hello, I am using wget to invoke a CGI script, passing it several variables. For example:

wget -O myfile.txt "http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue&SHAPE=circle"

where myscript.cgi, say, makes an image based on the parameters "COLOR" and "SHAPE". The problem I am having is when I need to pass a key/value pair where the value contains the "&" character. Such as:

wget -O myfile.txt "http://user:[EMAIL PROTECTED]/myscript.cgi?COLOR=blue red&SHAPE=circle"

I have tried encoding the "&" as %26, but that does not seem to work (spaces as %20 work fine). The error log for the web server shows that the URL requested does not say %26, but rather "&". It does not appear to me that wget is sending the %26 as %26; perhaps it is "fixing" it to "&". I am using GNU wget v1.5.3 with Red Hat 7.0. Thanks! -- David Christopher Asher
wget bug - after closing control connection
Hello, I've found a (less important) bug in wget. I was downloading a file from an FTP server and the control connection of the FTP service was closed by the server. After that, wget started to print incorrect progress information (beyond 100%). The log follows:

# wget -nd ftp://ftp.suse.com/pub/suse/i386/update/7.0/n1/mod_php.rpm
--12:30:48-- ftp://ftp.suse.com:21/pub/suse/i386/update/7.0/n1/mod_php.rpm
           => `mod_php.rpm'
Connecting to ftp.suse.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub/suse/i386/update/7.0/n1 ... done.
==> PORT ... done.  ==> RETR mod_php.rpm ... done.
Length: 1,599,213 (unauthoritative)

   0K -> .. .. .. .. .. [  3%]
  50K -> .. .. .. .. .. [  6%]
 100K -> .. .. .. .. .. [  9%]
 150K -> .. .. .. .. .. [ 12%]
 200K -> .. .. .. .. .. [ 16%]
 250K -> .. .. .. .. .. [ 19%]
 300K -> .. .. .. .. .. [ 22%]
 350K -> .. .. .. .. .. [ 25%]
 400K -> .. .. .. .. .. [ 28%]
 450K -> .. .. .. .. .. [ 32%]
 500K -> .. .. .. .. .. [ 35%]
 550K -> .. .. .        [ 36%]

12:41:36 (916.90 B/s) - Control connection closed. Retrying.

--12:50:38-- ftp://ftp.suse.com:21/pub/suse/i386/update/7.0/n1/mod_php.rpm (try: 3)
           => `mod_php.rpm'
Connecting to ftp.suse.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub/suse/i386/update/7.0/n1 ... done.
==> PORT ... done.  ==> REST 626688 ... done.
==> RETR mod_php.rpm ... done.
Length: 972,525 [345,837 to go] (unauthoritative)

[ skipping 600K ]
 600K -> ,, ,, .. .. .. [ 68%]
 650K -> .. .. .. .. .. [ 72%]

12:57:59 (187.36 B/s) - Control connection closed. Retrying.

--12:57:59-- ftp://ftp.suse.com:21/pub/suse/i386/update/7.0/n1/mod_php.rpm (try: 4)
           => `mod_php.rpm'
Connecting to ftp.suse.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub/suse/i386/update/7.0/n1 ... done.
==> PORT ... done.  ==> REST 708608 ... done.
==> RETR mod_php.rpm ... done.
Length: 890,605 [181,997 to go] (unauthoritative)

[ skipping 650K ]
 650K -> ,, ,, ,, ,, ,, [ 80%]
 700K -> .. .. .. .. .. [ 86%]
 750K -> .. .. .. .. .. [ 91%]
 800K -> .. .. .. .. .. [ 97%]
 850K -> .. .. .. .. .. [103%]
 900K -> .. .. .. .. .. [109%]
 950K -> .. .. .. .. .. [114%]
1000K -> .. .. .. .. .. [120%]
1050K -> .. .. .. .. .. [126%]
1100K -> .. .. .. .. .. [132%]
1150K -> .. .. .. .. .. [137%]

Cezary Sobaniec
Institute of Computing Science
Poznan University of Technology
[EMAIL PROTECTED]
tel. (+48 61) 665-28-09
Re: wget bug - after closing control connection
Which version of wget do you use? Are you aware that wget 1.6 has been released and that 1.7 is in development (and that they contain a workaround for the "Lying FTP server syndrome" you are seeing)?

Csaba Ráduly, Software Engineer
Sophos Anti-Virus
email: [EMAIL PROTECTED]  http://www.sophos.com
US support: +1 888 SOPHOS 9
UK support: +44 1235 559933