RE: Wget patches for .files
Mauro Tortonesi wrote: this is a very interesting point, but the patch you mentioned above uses the LIST -a FTP command, which AFAIK is not supported by all FTP servers. As I recall, that's why the patch was not accepted. However, it would be useful if there were some command-line option to affect the LIST parameters. Perhaps something like: wget ftp://ftp.somesite.com --ftp-list="-a" Tony
Re: Wget should lock the file
Behdad Esfahbod [EMAIL PROTECTED] writes: I happened to unintentionally run the same command twice: wget -b -c http://some/file.tar.gz and hours later I figured out that the 1GB I had downloaded was useless, since two wget processes had been downloading the same data twice and appending to the same file. :( So, wget should lock the file for writing, which it seems it doesn't. Thanks for the report. I believe this problem is fixed in Wget 1.10, where the second Wget process would write to file.tar.gz.1, using O_EXCL to make sure that two processes never clobber the same file.
Re: wget 1.9.1 worked for 4.2G wrapped file
Linda Walsh [EMAIL PROTECTED] writes: I noticed after my post in the archives that this bug is fixed in 1.10. Now if I can just get the server-ops to fix their CVS server, that'd be great -- I've checked out CVS projects from other sites and not had inbound TCP attempts to some 'auth' service. ;-/ :-) Note that use of CVS (or of svn, for that matter) is not required -- simply get wget-1.10.tar.gz from the nearest GNU mirror and compile that.
Re: wget multiple downloads
On Wednesday 03 August 2005 08:14 am, dan1 wrote: Hello. I have been using wget for a long time now. I like it very much. However, I have two enhancement requests that I think are important and very useful: 1. There should be a 'download acceleration' mode that triggers several simultaneous downloads of the same file. This speeds up downloads a lot from sites where the bandwidth is limited per connection (not on purpose) because of the routers in between; e.g. the 'accel' program does this job very well, and I am using it because wget lacks this feature. Last month Hrvoje and I discussed whether to implement this feature, and we agreed that it would be overkill for a (supposedly) simple command-line tool like wget. 2. wget should be completely HTTP/1.1 compliant. Right now it only uses the HTTP/1.0-compatible subset of the protocol, but HTTP/1.1 is important to follow. I was once annoyed because of that, though I don't remember why now. wget 2.0 will be completely HTTP/1.1 compliant. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10, issues with IPv6 on AIX 5.1
Thanks for the report. The problem seems to come from Wget's use of AI_ADDRCONFIG hint to getaddrinfo. Wget 1.10.1 will not use that hint.
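The failure mode above (getaddrinfo rejecting the AI_ADDRCONFIG hint with "Invalid flags in hints") can be reproduced and probed without touching the network by resolving a numeric address. This is a diagnostic sketch, not anything from Wget itself; on a resolver like the AIX 5.1 one discussed here, the AI_ADDRCONFIG probe would presumably report "rejected".

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if the local getaddrinfo() accepts the given hint flags.
   AI_NUMERICHOST plus a numeric node keeps the lookup offline, so a
   failure here points at the flag handling, not at DNS. */
static int flag_ok(int flags)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags | AI_NUMERICHOST;
    int err = getaddrinfo("127.0.0.1", NULL, &hints, &res);
    if (err == 0)
        freeaddrinfo(res);
    return err == 0;
}

int main(void)
{
    printf("no extra flags: %s\n", flag_ok(0) ? "ok" : "rejected");
#ifdef AI_ADDRCONFIG
    printf("AI_ADDRCONFIG:  %s\n",
           flag_ok(AI_ADDRCONFIG) ? "ok" : "rejected");
#endif
    return 0;
}
```

Running this on the affected platform would have isolated which of Wget's hint flags the system resolver refuses.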
RE: wget a file with long path on Windows XP
PoWah Wong wrote: The login page is: http://safari.informit.com/?FPI=&uicode= How to figure out the login command? These two commands do not work: wget --save-cookies cookies.txt http://safari.informit.com/?FPI= [snip] wget --save-cookies cookies.txt http://safari.informit.com/?FPI=&uicode=/login.php? [snip] When trying to recreate a form in wget, you have to send the data the server is expecting to receive to the location the server is expecting to receive it. You have to look at the login page for the login form and recreate it. In your browser, view the source of http://safari.informit.com/?FPI=&uicode= and you will find the form that appears below. Note that I stripped out formatting information for the table that contains the form and reformatted what was left to make it readable.

<form action="JVXSL.asp" method="post">
<input type="hidden" name="s" value="1">
<input type="hidden" name="o" value="1">
<input type="hidden" name="b" value="1">
<input type="hidden" name="t" value="1">
<input type="hidden" name="f" value="1">
<input type="hidden" name="c" value="1">
<input type="hidden" name="u" value="1">
<input type="hidden" name="r" value="">
<input type="hidden" name="l" value="1">
<input type="hidden" name="g" value="">
<input type="hidden" name="n" value="1">
<input type="hidden" name="d" value="1">
<input type="hidden" name="a" value="0">
<input tabindex="1" name="usr" id="usr" type="text" value="" size="12">
<input name="pwd" id="pwd" tabindex="1" type="password" value="" size="12">
<input type="checkbox" tabindex="1" name="savepwd" id="savepwd" value="1">
<input type="image" name="Login" src="images/btn_login.gif" alt="Login" width="40" height="16" border="0" tabindex="1" align="absmiddle">
</form>

Note that the server expects the data to be posted to JVXSL.asp and that there are a bunch of fields that must be supplied in order for the server to process the login request. In addition, the two fields you supply are called usr and pwd.
So your first wget command line will look something like this: wget --save-cookies cookies.txt "http://safari.informit.com/JVXSL.asp" --post-data="s=1&o=1&b=1&t=1&f=1&c=1&u=1&r=&l=1&g=&n=1&d=1&a=0&usr=wong_powa [EMAIL PROTECTED]&pwd=123&savepwd=1" Hope that helps! Tony
RE: wget a file with long path on Windows XP
I can save cookies, but wget still fetches a blank web page. The web page URL is copied from the URL displayed in the web browser. These are the logs. C:\Program Files\wget>wget --save-cookies cookies.txt "http://safari.informit.com/JVXSL.asp" --post-data="s=1&o=1&b=1&t=1&f=1&c=1&u=1&r=&l=1&g=&n=1&d=1&a=0&[EMAIL PROTECTED]&pwd=123&savepwd=1" --21:41:38-- http://safari.informit.com/JVXSL.asp => `JVXSL.asp' Resolving safari.informit.com... 193.194.158.208 Connecting to safari.informit.com|193.194.158.208|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 12,149 (12K) [text/html] 100%[====>] 12,149 48.76K/s 21:41:39 (48.63 KB/s) - `JVXSL.asp' saved [12149/12149] First wget: C:\Program Files\wget>wget --load-cookies cookies.txt "http://safari.informit.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6/ciscockbk-CHP-1" --21:43:31-- http://safari.informit.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6/ciscockbk-CHP-1 => `[EMAIL PROTECTED]&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6%2Fciscockbk-CHP-1' Resolving safari.informit.com... 193.194.158.208 Connecting to safari.informit.com|193.194.158.208|:80... connected. HTTP request sent, awaiting response...
200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 21:43:32 (0.00 B/s) - `[EMAIL PROTECTED]&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6%2Fciscockbk-CHP-1' saved [0/0] Second wget: C:\Program Files\wget>wget --load-cookies cookies.txt --keep-session-cookies "http://safari.informit.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6/ciscockbk-CHP-1" --21:45:31-- http://safari.informit.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6/ciscockbk-CHP-1 => `[EMAIL PROTECTED]&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6%2Fciscockbk-CHP-1' Resolving safari.informit.com... 193.194.158.208 Connecting to safari.informit.com|193.194.158.208|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 21:45:31 (0.00 B/s) - `[EMAIL PROTECTED]&mode=section&sortKey=rank&sortOrder=desc&view=&g=&catid=itbooks.network.ciscoios&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&xmlid=0-596-00367-6%2Fciscockbk-CHP-1' saved [0/0] I am not subscribed; please cc me in replies to my post. Thanks. --- Tony Lewis [EMAIL PROTECTED] wrote: [snip]
Re: wget a file with long path on Windows XP
This sounds like a difficult page to download because they may be using cookies or session variables. I'm not sure of the best way to proceed, but I would look at the wget documentation about cookies. I think you may have to save the cookies that are generated by the login page and use --load-cookies to get the page you are after. By the way, if you are only after a single page, why not just save it using the browser? Frank PoWah Wong wrote: The website is actually www.informit.com. It requires logging in at https://secure.safaribooksonline.com/promo.asp?code=ITT03&portal=informit&a=0 After logging in, the website becomes similar to booksonline.com, which I edited slightly. It is my public library's electronic access, which also requires logging in. --- Frank McCown [EMAIL PROTECTED] wrote: [snip]
Re: wget a file with long path on Windows XP
Putting quotes around the url got rid of your Invalid parameter errors. I just tried accessing the url you are trying to wget and received an http 500 response. I also tried accessing http://proquest.booksonline.com/ and never got a response. According to your output, wget got back a 0 length response. I would check your web server and make sure it is working properly. Frank PoWah Wong wrote: [snip]
Re: wget a file with long path on Windows XP
I put quotes around the url, but it still does not work. C:\book> "C:\Program Files\wget\wget.exe" "http://proquest.booksonline.com/?x=1&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2/ch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0" --22:45:26-- http://proquest.booksonline.com/?x=1&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2/ch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0 => `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' Resolving proquest.booksonline.com... 193.194.158.201 Connecting to proquest.booksonline.com|193.194.158.201|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 22:45:27 (0.00 B/s) - `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' saved [0/0] C:\book> "C:\Program Files\wget\wget.exe" "http://proquest.booksonline.com/?x=1&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2/ch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0" --22:46:59-- http://proquest.booksonline.com/?x=1&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2/ch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0 => `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' Resolving proquest.booksonline.com... 193.194.158.201 Connecting to proquest.booksonline.com|193.194.158.201|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 22:46:59 (0.00 B/s) - `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' saved [0/0] C:\book> "C:\Program Files\wget\wget.exe" "http://proquest.booksonline.com/?x=1%26mode=section%26sortKey=title%26sortOrder=asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1%26f=1%26t=1%26c=1%26u=1%26r=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0" --22:47:45-- http://proquest.booksonline.com/?x=1%26mode=section%26sortKey=title%26sortOrder=asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1%26f=1%26t=1%26c=1%26u=1%26r=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0 => `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' Resolving proquest.booksonline.com... 193.194.158.201 Connecting to proquest.booksonline.com|193.194.158.201|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 22:47:46 (0.00 B/s) - `[EMAIL PROTECTED]&mode=section&sortKey=title&sortOrder=asc&view=&xmlid=0-321-16076-2%2Fch03lev1sec1&g=&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0' saved [0/0] --- Frank McCown [EMAIL PROTECTED] wrote: I think you need to put quotes around the url. PoWah Wong wrote: The file I want to get is "http://proquest.booksonline.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=book&xmlid=0-321-16076-2/ch02&g=&srchText=object+oriented&code=&h=&m=&l=1&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0" I opened an MSDOS console on Windows XP. I tried: C:\Program Files\wget\wget.exe http://proquest.booksonline.com/JVXSL.asp?x=1&mode=section&sortKey=rank&sortOrder=desc&view=book&xmlid=0-321-16076-2/ch02&g=&srchText=object+oriented&code=&h=&m=&l=1&catid=&s=1&b=1&f=1&t=1&c=1&u=1&r=&o=1&n=1&d=1&p=1&a=0&page=0 --05:33:34-- http://proquest.booksonline.com/JVXSL.asp?x=1 => `[EMAIL PROTECTED]' Resolving proquest.booksonline.com... 193.194.158.201 Connecting to proquest.booksonline.com|193.194.158.201|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 0 [text/html] [ <=> ] 0 --.--K/s 05:33:34 (0.00 B/s) - `[EMAIL PROTECTED]' saved [0/0] Invalid parameter - =section 'sortKey' is not recognized as an internal or external command, operable program or batch file. 'sortOrder' is not recognized as an internal or external command, operable program or batch file. 'view' is not recognized as an internal or external command, operable program or batch file. 'xmlid' is not recognized as an internal or external command, operable program or batch file. 'g' is not recognized as an internal or external command, operable program or batch file. 'srchText' is not recognized as an internal or external command, operable program or batch file. 'code' is not recognized as an internal or external
RE: wget 1.10.1 beta 1
Windows MSVC test binary at http://xoomer.virgilio.it/hherold/ Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax -----Original Message----- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 06, 2005 11:07 PM To: wget@sunsite.dk Subject: wget 1.10.1 beta 1 dear friends, i have just released the first beta of wget 1.10.1: ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10.1-beta1.tar.gz ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10.1-beta1.tar.bz2 you are encouraged to download the tarballs, test if the code works properly and report any bug you find.
Re: wget and ASCII mode
from Hrvoje Niksic: [...] Unfortunately EOL conversions break automatic downloads resumption (REST in FTP), Could be true. manual resumption (wget -c), Could be true. (I never use wget -c.) break timestamping, How so? and probably would break checksums if we added them. You don't have them, and anyone who would be surprised by this should be directed to the note in the documentation which would explain why. Most Wget's users seem to want byte-by-byte copies, because I don't remember a single bug report about the lack of ASCII conversions. You mean other than the one from the fellow who started this thread? The one thing that is surely wrong about my approach is the ';type=a' option, which should either be removed or come with a big fat warning that it *doesn't* implement the required conversion to native EOL convention and that it's provided for the sake of people who need text transfers and are willing to invoke dos2unix/unix2dos (or their OS equivalent) themselves. Interesting. I'd have made ;type=a work right (which I claim to have done), and then perhaps included a run-time error or documentation warning if it were mixed with incompatible options (which I haven't done). Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street [EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: wget and ASCII mode
[EMAIL PROTECTED] (Steven M. Schweda) writes: from Hrvoje Niksic: [...] Unfortunately EOL conversions break automatic downloads resumption (REST in FTP), Could be true. manual resumption (wget -c), Could be true. (I never use wget -c.) It's a consequence of EOL conversion affecting the file size. break timestamping, How so? By changing the file size, which will then differ from what the server reports and will cause the file to always be re-downloaded due to the size mismatch. Most Wget's users seem to want byte-by-byte copies, because I don't remember a single bug report about the lack of ASCII conversions. You mean other than the one from the fellow who started this thread? Yes, sorry.
Re: wget and ASCII mode
[EMAIL PROTECTED] (Steven M. Schweda) writes: It does seem a bit odd that no one has noticed this fundamental problem until now, but then I missed it, too. Long ago I intentionally made Wget use binary mode by default and not muck with line endings, because I believed exact data transfer was important to get right first. Unfortunately EOL conversions break automatic download resumption (REST in FTP), break manual resumption (wget -c), break timestamping, and probably would break checksums if we added them. Most Wget users seem to want byte-by-byte copies; I don't remember a single bug report about the lack of ASCII conversions. The one thing that is surely wrong about my approach is the ';type=a' option, which should either be removed or come with a big fat warning that it *doesn't* implement the required conversion to the native EOL convention and that it's provided for the sake of people who need text transfers and are willing to invoke dos2unix/unix2dos (or their OS equivalent) themselves.
Re: Wget and Secure Pages
John Haymaker [EMAIL PROTECTED] writes: I am trying to download all pages in my site except secure pages that require login. Problem: when wget encounters a secure page requiring the user to log in, it hangs there for up to an hour. Then miraculously, it moves on. By secure pages do you mean https: pages? Normally Wget has a timeout mechanism that prevents it from hanging that long (the default timeout is 15 minutes, but it can be shortened to 10 seconds or whatever works for you), but the mechanism sometimes doesn't work with OpenSSL.
Re: wget and ASCII mode
[...] (The new code does make one potentially risky assumption, but it's explained in the comments.) The latest code in my patches and in my new 1.9.1d kit (for VMS, primarily, but not exclusively) removes the potentially risky assumption (CR and LF in the same buffer), so it should be swell. I've left it for someone else to activate the conditional code which would restore CR-LF line endings on systems where that's preferred. It does seem a bit odd that no one has noticed this fundamental problem until now, but then I missed it, too. Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street [EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: WGET return status codes
Thanks. From: Mauro Tortonesi [EMAIL PROTECTED] Organization: University of Ferrara To: [EMAIL PROTECTED] Subject: Re: WGET return status codes Date sent: Sat, 18 Jun 2005 15:33:26 -0500 Copies to: [EMAIL PROTECTED] [snip] -- Dr. Zinovy M. Malkin e-mail: [EMAIL PROTECTED] Head, Lab of Space Geodesy and Earth Rotation Tel: +7-(812)-275-1024 Fax: +7-(812)-275-1119 Institute of Applied Astronomy RAS http://www.zmalkin.com/ nab. Kutuzova, 10 http://www.ipa.nw.ru/PAGE/DEPFUND/GEO/zm/ St. Petersburg 191187 Russia --
Re: WGET return status codes
On Tuesday 14 June 2005 07:06 am, Zinovy Malkin wrote: Dear all, I'm not sure the address I'm sending this message to is appropriate, sorry. Could anybody advise me where I can find the list of the wget return status codes? At the moment wget status codes are not completely standardized, so you probably don't want to rely on the return status codes to understand why a download failed. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng. http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 problems under AIX
Hi Hrvoje, Thanks for the detailed report! Thanks for your detailed answer ;-) Jens Schleusener [EMAIL PROTECTED] writes: 1) Only with the configure option --disable-nls and the C compiler gcc 4.0.0 does the wget binary build successfully. I'd be interested in seeing the error log without --disable-nls and/or with the system compiler. I will send those logs to your personal mail address. Although gcc outputs some compiler warnings like convert.c: In function 'convert_all_links': convert.c:95: warning: incompatible implicit declaration of built-in function 'alloca' The alloca-declaring magic in config-post.h (taken from the Autoconf manual) apparently doesn't take into account that GCC wants alloca declared too. But simply calling the new wget like wget http://www.example.com/ I always got errors like --12:36:51-- http://www.example.com/ => `index.html' Resolving www.example.com... failed: Invalid flags in hints. This is really bad. Apparently your version of getaddrinfo is broken or Wget is using it incorrectly. Can you intuit which flags cause the problem? Depending on the circumstances, Wget uses AI_ADDRCONFIG, AI_PASSIVE, and/or AI_NUMERICHOST. Yes, all three seem to be defined, probably via /usr/include/netdb.h. Here is an extract of that file:

/* Flag definitions for addrinfo hints in protocol-independent name/addr/service service. RFC2133 */
/* Also flag definitions for getipnodebyname RFC 2553 */
#define AI_CANONNAME   0x01  /* canonical name to be included in return */
#define AI_PASSIVE     0x02  /* prepare return for call to bind() */
#define AI_NUMERICHOST 0x04  /* RFC 2553, nodename is a numeric host address string */
#define AI_ADDRCONFIG  0x08  /* RFC 2553, source address family configured */
#define AI_V4MAPPED    0x10  /* RFC 2553, accept v4 mapped addresses */
#define AI_ALL         0x20  /* RFC 2553, accept all addresses */
#define AI_DEFAULT     (AI_V4MAPPED | AI_ADDRCONFIG)  /* RFC 2553 */

But I have no idea where the error message Invalid flags in hints comes from.
Directly from wget (probably not) or from the system resolver routines? After some testing I found that using the additional configure option --disable-ipv6 solves that problem. Because it disables IPv6 (and therefore the use of getaddrinfo) altogether. Ok, that was not clear to me. 2) Using the additional configure option --with-ssl=/usr/local/contrib fails, although the openssl (0.9.7g) header files are installed under /usr/local/contrib/include/openssl/ and libssl.a under /usr/local/contrib/lib/. This is not a standard layout, so the configure script is having problems with it. The supported layouts are one of: * No flags are needed: the includes are found without -Iincludedir, and the library gets linked in without the need for -Llibdir. * The library is installed in $root, which means the includes are in $root/include and the libraries in $root/lib. OpenSSL's own default for $root is /usr/local/ssl, which Wget checks for. To resolve situations like this, Wget should probably support specifying additional include and library directories separately. I believe you can work around this by specifying: ./configure CPPFLAGS="-I/usr/local/contrib/include -L/usr/local/contrib/lib" Can you check if that works for you? That doesn't work (typo?); better seems to be ./configure CPPFLAGS=-I/usr/local/contrib/include LDFLAGS=-L/usr/local/contrib/lib but that also doesn't solve the described problem. Also, the configure option --with-ssl[=SSL-ROOT] (in my case --with-ssl=/usr/local/contrib) should probably do that job. After long trial-and-error testing I have the impression that the configure script has an error. If I change, e.g. at line 25771, { ac_try='test -s conftest.$ac_objext' into { ac_try='test -s .libs/conftest.$ac_objext' the generated test object file will now be found. Therefore my openssl installation is also found and compiled successfully into wget.
3) Using the native IBM C compiler (CC=cc) instead of GNU gcc I got the compile error cc -qlanglvl=ansi -I. -I. -I/opt/include -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/contrib/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/contrib/share/locale\" -O -c main.c main.c, line 147.16: 1506-275 (S) Unexpected text ',' encountered. Simply changing line 147 of src/main.c from OPT__PARENT, to OPT__PARENT let the compile error vanish (sorry, I am not a C expert). [...] It's a newer C feature that leaked into Wget -- sorry about that. I'll try to get these fixed in Wget 1.10.1. Your patch works. Greetings Jens -- Dr. Jens Schleusener, T-Systems Solutions for Research GmbH Tel: +49 551 709-2493 Bunsenstr. 10 Fax: +49 551 709-2169 D-37073 Goettingen [EMAIL PROTECTED] http://www.t-systems.com/
Re: wget 1.10 and ssl
Gabor Z. Papp [EMAIL PROTECTED] writes: * Hrvoje Niksic [EMAIL PROTECTED]: | new configure script coming with wget 1.10 does not honour | --with-ssl=/path/to/ssl because at linking conftest only | -I/path/to/ssl/include used, and no -L/path/to/ssl/lib | | That is not supposed to happen. Can you post configure output and/or | the relevant part of config.log? [...] Here you find everything: http://gzp.hu/tmp/wget-1.10/ According to config.log, it seems your SSL includes are not in /pkg/include after all:

configure:25735: looking for SSL libraries in /pkg
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c >&5
gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c -fPIC -DPIC -o .libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1
configure: failed program was:
|
| #include <openssl/ssl.h>
| #include <openssl/x509.h>
| #include <openssl/err.h>
| #include <openssl/rand.h>
| #include <openssl/des.h>
| #include <openssl/md4.h>
| #include <openssl/md5.h>
|
configure:25787: result: not found
configure:26031: error: failed to find OpenSSL libraries

Configure will try to link (and use -L$root/lib) only after includes are shown to be found.
Re: wget 1.10 problems under AIX
Jens Schleusener [EMAIL PROTECTED] writes: --12:36:51-- http://www.example.com/ => `index.html' Resolving www.example.com... failed: Invalid flags in hints. This is really bad. Apparently your version of getaddrinfo is broken or Wget is using it incorrectly. Can you intuit which flags cause the problem? Depending on the circumstances, Wget uses AI_ADDRCONFIG, AI_PASSIVE, and/or AI_NUMERICHOST. Yes, all three seem to be defined, probably via /usr/include/netdb.h. Then I am guessing that AIX's getaddrinfo doesn't like AF_UNSPEC family + AI_ADDRCONFIG hint. If you use `wget -4 http://www.example.com/', does it then work? But I have no idea where the error message Invalid flags in hints comes from. Directly from wget (probably not) or from system resolver routines? From the system resolver, which Wget invokes via getaddrinfo. That doesn't work (typo?); better seems to be ./configure CPPFLAGS="-I/usr/local/contrib/include" LDFLAGS="-L/usr/local/contrib/lib" That's what I meant, sorry. But that is pretty much what --with-ssl=/usr/local/include does. (I misread your original message, thinking that the OpenSSL includes were in an entirely different location). respectively in my case --with-ssl=/usr/local/contrib should probably do that job. Yes. I'd like to see config.log, or the relevant parts thereof, which should contain errors. After long trial-and-error testing I have the impression that the configure-script has an error. If I change, e.g. at line 25771, { ac_try='test -s conftest.$ac_objext' into { ac_try='test -s .libs/conftest.$ac_objext' the generated test object file will now be found. But why don't I (and other non-AIX testers) have that problem? Maybe Libtool is doing something strange on AIX?
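The guess above (AF_UNSPEC family combined with the AI_ADDRCONFIG hint tripping AIX's resolver) can be illustrated from Python, whose socket module exposes the same getaddrinfo hints. This is an illustration only; wget is C and calls getaddrinfo directly, and the `resolve` helper here is hypothetical:

```python
import socket

def resolve(host, family=socket.AF_UNSPEC, flags=0):
    """Return the addresses for HOST, mimicking the hints wget passes
    to getaddrinfo (address family plus flags)."""
    return [info[4][0] for info in
            socket.getaddrinfo(host, None, family, socket.SOCK_STREAM, 0, flags)]

# AF_UNSPEC plus AI_ADDRCONFIG is the combination that AIX 5.1's
# getaddrinfo apparently rejects with "Invalid flags in hints".
broken_hints = (socket.AF_UNSPEC, socket.AI_ADDRCONFIG)

# `wget -4` sidesteps the problem by forcing AF_INET (IPv4 only),
# which makes the AI_ADDRCONFIG hint unnecessary.
ipv4_only = resolve("localhost", family=socket.AF_INET)
print(ipv4_only)
```

Whether the flag combination actually fails depends on the platform's resolver, which is exactly the portability problem being discussed.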
Re: wget 1.10 and ssl
Gabor Z. Papp [EMAIL PROTECTED] writes: * Hrvoje Niksic [EMAIL PROTECTED]: | According to config.log, it seems your SSL includes are not in | /pkg/include after all: Sure, they are in /pkg/include/openssl. You're right. The Autoconf-generated test is wrong, and I'm trying to figure out why. configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c 5 gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c -fPIC -DPIC -o .libs/conftest.o *** Warning: inferring the mode of operation is deprecated. *** Future versions of Libtool will require --mode=MODE be specified. configure:25762: $? = 0 configure:25766: test -z || test ! -s conftest.err configure:25769: $? = 0 configure:25772: test -s conftest.o configure:25775: $? = 1 Of course there is no conftest.o when the file is specifically requested in .libs/conftest.o! However, that's not how it works for me: configure:25465: /bin/sh ./libtool /opt/gcc4/bin/gcc -o conftest -O2 -Wall -Wno-implicit conftest.c -ldl -lrt 5 mkdir .libs /opt/gcc4/bin/gcc -o conftest -O2 -Wall -Wno-implicit conftest.c -ldl -lrt *** Warning: inferring the mode of operation is deprecated. *** Future versions of Libtool will require --mode=MODE be specified. configure:25471: $? = 0 configure:25474: test -z || test ! -s conftest.err configure:25477: $? = 0 configure:25480: test -s conftest configure:25483: $? = 0 configure:25496: result: yes I am somewhat baffled by this problem.
Re: wget 1.10 problems under AIX
Hi Hrvoje, Jens Schleusener [EMAIL PROTECTED] writes: --12:36:51-- http://www.example.com/ => `index.html' Resolving www.example.com... failed: Invalid flags in hints. This is really bad. Apparently your version of getaddrinfo is broken or Wget is using it incorrectly. Can you intuit which flags cause the problem? Depending on the circumstances, Wget uses AI_ADDRCONFIG, AI_PASSIVE, and/or AI_NUMERICHOST. Yes, all three seem to be defined, probably via /usr/include/netdb.h. Then I am guessing that AIX's getaddrinfo doesn't like AF_UNSPEC family + AI_ADDRCONFIG hint. If you use `wget -4 http://www.example.com/', does it then work? Works. But I have no idea where the error message Invalid flags in hints comes from. Directly from wget (probably not) or from system resolver routines? From the system resolver, which Wget invokes via getaddrinfo. That doesn't work (typo?); better seems to be ./configure CPPFLAGS="-I/usr/local/contrib/include" LDFLAGS="-L/usr/local/contrib/lib" That's what I meant, sorry. But that is pretty much what --with-ssl=/usr/local/include does. (I misread your original message, thinking that the OpenSSL includes were in an entirely different location). respectively in my case --with-ssl=/usr/local/contrib should probably do that job. Yes. I'd like to see config.log, or the relevant parts thereof, which should contain errors. Here is the config.log extract:

configure:25735: looking for SSL libraries in /usr/local/contrib
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c >&5
gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC -o .libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1
configure: failed program was:
|
| #include <openssl/ssl.h>
| #include <openssl/x509.h>
| #include <openssl/err.h>
| #include <openssl/rand.h>
| #include <openssl/des.h>
| #include <openssl/md4.h>
| #include <openssl/md5.h>
|
configure:25787: result: not found
configure:26031: error: failed to find OpenSSL libraries

The reason for the above error is as already written - at least in my case using the self-compiled libtool version 1.5 - that the configure script tests for the non-existing conftest.o instead of for the generated and existing .libs/conftest.o. The above line

configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c >&5
gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC -o .libs/conftest.o

looks a little bit strange to me (as a configure layman) (gcc twice?). Here is a corresponding extract of config.log (from the same system) while configuring lynx2.8.6dev.13 where I have no such problems (but the configure script and the conftest.c file look different):

configure:8128: checking for openssl include directory
configure:8145: gcc -c -I/usr/local/contrib/include -I/usr/local/contrib/include -D_ACS_COMPAT_CODE -D_POSIX_C_SOURCE=199506L conftest.c >&5
configure:8148: $? = 0
configure:8151: test -s conftest.o
configure:8154: $? = 0
configure:8163: result: yes

After long trial-and-error testing I have the impression that the configure-script has an error. If I change, e.g. at line 25771, { ac_try='test -s conftest.$ac_objext' into { ac_try='test -s .libs/conftest.$ac_objext' the generated test object file will now be found. But why don't I (and other non-AIX testers) have that problem? Maybe Libtool is doing something strange on AIX? I will try to re-compile the currently used libtool version 1.5 under AIX 5.1 (maybe it was built under AIX 4.3) and compile and use the newest libtool (version 1.5.18). Greetings Jens -- Dr. Jens Schleusener, T-Systems Solutions for Research GmbH Tel: +49 551 709-2493 Bunsenstr. 10 Fax: +49 551 709-2169 D-37073 Goettingen [EMAIL PROTECTED] http://www.t-systems.com/
Re: wget 1.10 problems under AIX
Jens Schleusener [EMAIL PROTECTED] writes: The reason for the above error is as already written - at least in my case using the self-compiled libtool version 1.5 I don't think the libtool version used on the system makes any difference (except for a developer at the point of libtoolizing his program), since Wget uses the libtool code from the release tarball. - that the configure script tests for the non-existing conftest.o instead of for the generated and existing .libs/conftest.o. You are right, but I don't understand why it doesn't happen for me. The above line configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c >&5 gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC -o .libs/conftest.o looks for me (as a configure layman) a little bit strange (gcc twice?). GCC is not invoked twice. The first line is configure telling you the command it is about to run. The second line is libtool telling you exactly how it is about to run GCC. For me there is no `-o .libs/conftest.o', even though I use the same libtool invocation on my system. I will try to re-compile the currently used libtool version 1.5 under AIX 5.1 (maybe it was built under AIX 4.3) and compile and use the newest libtool (version 1.5.18). Unfortunately I don't think it's going to change anything, as explained above. I don't think the people who merely *build* software are even supposed to have to have libtool installed in the first place.
Re: wget 1.10 problems under AIX
Hi, The above line configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c >&5 gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC -o .libs/conftest.o looks for me (as a configure layman) a little bit strange (gcc twice?). GCC is not invoked twice. The first line is configure telling you the command it is about to run. The second line is libtool telling you exactly how it is about to run GCC. For me there is no `-o .libs/conftest.o', even though I use the same libtool invocation on my system. Sorry if I bother you, but here I see a difference between AIX 5.1 and, e.g., my SuSE 9.3 system:

AIX 5.1 (SSL-ROOT=/usr/local/contrib):
==
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c >&5
gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC -o .libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1

SuSE 9.3 (SSL-ROOT=/usr):
==
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c >&5
gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c -fPIC -DPIC -o .libs/conftest.o
gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c -o conftest.o > /dev/null 2>&1
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 0
configure:25778: result: found

That output is produced by the configure line 25756: if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 The real compiling is done by the next line 25756: (eval $ac_compile) 2>conftest.er1 The content of $ac_compile seems to be (SuSE) /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c >&5 but that produces under SuSE (Linux) obviously two gcc processes and two object files (.libs/conftest.o AND conftest.o), so that the object-file existence test (searching only for conftest.o) is successful. But under AIX only one object file (.libs/conftest.o) is generated, which the object-file existence test doesn't find. Greetings Jens -- Dr. Jens Schleusener, T-Systems Solutions for Research GmbH Tel: +49 551 709-2493 Bunsenstr. 10 Fax: +49 551 709-2169 D-37073 Goettingen [EMAIL PROTECTED] http://www.t-systems.com/
Re: wget 1.10 problems under AIX
Thanks for the detailed report! Jens Schleusener [EMAIL PROTECTED] writes: 1) Only using the configure-option --disable-nls and the C compiler gcc 4.0.0 the wget-binary builds successfully I'd be interested in seeing the error log without --disable-nls and/or with the system compiler. although gcc outputs some compiler warnings like convert.c: In function 'convert_all_links': convert.c:95: warning: incompatible implicit declaration of built-in function 'alloca' The alloca-declaring magic in config-post.h (taken from the Autoconf manual) apparently doesn't take into account that GCC wants alloca declared too. But simply calling the new wget like wget http://www.example.com/ I always got errors like --12:36:51-- http://www.example.com/ => `index.html' Resolving www.example.com... failed: Invalid flags in hints. This is really bad. Apparently your version of getaddrinfo is broken or Wget is using it incorrectly. Can you intuit which flags cause the problem? Depending on the circumstances, Wget uses AI_ADDRCONFIG, AI_PASSIVE, and/or AI_NUMERICHOST. After some testing I found that using the additional configure-option --disable-ipv6 solves that problem. Because it disables IPv6 (and therefore the use of getaddrinfo) altogether. 2) Using the additional configure-option --with-ssl=/usr/local/contrib fails although the openssl (0.9.7g) header files are installed under /usr/local/contrib/include/openssl/ and the libssl.a under /usr/local/contrib/lib/. This is not a standard layout, so the configure script is having problems with it. The supported layouts are one of: * No flags are needed, the includes are found without -Iincludedir, and the library gets linked in without the need for -Llibdir. * The library is installed in $root, which means that includes are in $root/include and the libraries in $root/lib. OpenSSL's own default for $root is /usr/local/ssl, which Wget checks for. To resolve situations like this, Wget should probably support specifying additional include and library directories separately. I believe you can work around this by specifying: ./configure CPPFLAGS="-I/usr/local/contrib/include -L/usr/local/contrib/lib" Can you check if that works for you? 3) Using the native IBM C compiler (CC=cc) instead of GNU gcc I got the compile error cc -qlanglvl=ansi -I. -I. -I/opt/include -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/contrib/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/contrib/share/locale\" -O -c main.c main.c, line 147.16: 1506-275 (S) Unexpected text ',' encountered. Simply changing line 147 of src/main.c from OPT__PARENT, to OPT__PARENT let the compile error vanish (sorry, I am not a C expert). [...] It's a newer C feature that leaked into Wget -- sorry about that. I'll try to get these fixed in Wget 1.10.1.
Re: wget 1.10 problems under AIX
This patch should take care of the problems with compiling Wget 1.10 with the native IBM cc.

2005-06-15  Hrvoje Niksic  [EMAIL PROTECTED]

	* host.h (ip_address): Remove the trailing comma from the type
	enum in the no-IPv6 case.

	* main.c (struct cmdline_option): Remove the trailing comma from
	the enum. Reported by Jens Schleusener.

Index: src/host.h
===
RCS file: /pack/anoncvs/wget/src/host.h,v
retrieving revision 1.27
diff -u -r1.27 host.h
--- src/host.h	2005/03/04 19:21:01	1.27
+++ src/host.h	2005/06/15 20:06:53
@@ -49,9 +49,9 @@
 typedef struct {
   /* Address type. */
   enum {
-    IPV4_ADDRESS,
+    IPV4_ADDRESS
 #ifdef ENABLE_IPV6
-    IPV6_ADDRESS
+    , IPV6_ADDRESS
 #endif /* ENABLE_IPV6 */
   } type;

Index: src/main.c
===
RCS file: /pack/anoncvs/wget/src/main.c,v
retrieving revision 1.137
diff -u -r1.137 main.c
--- src/main.c	2005/05/06 15:50:50	1.137
+++ src/main.c	2005/06/15 20:06:54
@@ -144,7 +144,7 @@
     OPT__DONT_REMOVE_LISTING,
     OPT__EXECUTE,
     OPT__NO,
-    OPT__PARENT,
+    OPT__PARENT
   } type;
   const void *data;	/* for standard options */
   int argtype;		/* for non-standard options */
Re: wget segfault on malformed working directory
Nagy Ferenc László [EMAIL PROTECTED] writes: If the ftp server returns invalid data (for example '221 Bye.') in response to PWD, wget segfaults because in ftp_pwd (ftp-basic.c) request will be NULL after the line 'request = strtok (NULL, "\"");', and this NULL will be passed to xstrdup. Thanks for the report; this patch should fix the problem:

2005-06-15  Hrvoje Niksic  [EMAIL PROTECTED]

	* ftp-basic.c (ftp_pwd): Handle malformed PWD response.

Index: src/ftp-basic.c
===
RCS file: /pack/anoncvs/wget/src/ftp-basic.c,v
retrieving revision 1.47
diff -u -r1.47 ftp-basic.c
--- src/ftp-basic.c	2005/05/16 22:08:57	1.47
+++ src/ftp-basic.c	2005/06/15 20:10:43
@@ -1081,6 +1081,7 @@
     return err;
   if (*respline == '5')
     {
+    err:
       xfree (respline);
       return FTPSRVERR;
     }
@@ -1089,6 +1090,10 @@
      and everything following it. */
   strtok (respline, "\"");
   request = strtok (NULL, "\"");
+  if (!request)
+    /* Treat the malformed response as an error, which the caller has
+       to handle gracefully anyway. */
+    goto err;
   /* Has the `pwd' been already allocated? Free! */
   xfree_null (*pwd);
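For readers following the logic outside of C, here is a hypothetical Python rendering of what ftp_pwd does with the server's reply, including the guard the patch adds. The function name and return convention are assumptions for illustration, not wget code:

```python
def parse_pwd_response(respline):
    """Extract the working directory from an FTP PWD reply such as
    '257 "/home/user" is current directory.'  A malformed reply
    (e.g. '221 Bye.') yields None instead of crashing -- the moral
    equivalent of the NULL check added to ftp_pwd above."""
    if respline.startswith('5'):
        return None                # 5xx: server-side error
    parts = respline.split('"')    # directory sits between the first quotes
    if len(parts) < 2:             # no quoted directory present: malformed
        return None
    return parts[1]

print(parse_pwd_response('257 "/home/user" is current directory.'))
print(parse_pwd_response('221 Bye.'))
```

The original bug is exactly the missing `len(parts) < 2` branch: C's strtok returns NULL in that case, and that NULL was passed straight to xstrdup.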
RE: wget 1.10 released
Windows MSVC binary at http://xoomer.virgilio.it/hherold/ Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax -Original Message- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] Sent: Friday, June 10, 2005 9:12 AM To: wget@sunsite.dk; [EMAIL PROTECTED] Subject: wget 1.10 released hi to everybody, i have just uploaded the wget 1.10 tarball on ftp.gnu.org: ftp://ftp.gnu.org/gnu/wget/wget-1.10.tar.gz you can find the GPG signature of the tarball at these URLs: ftp://ftp.gnu.org/gnu/wget/wget-1.10.tar.gz.sig and the GPG key i have used for the signature at this URL: http://www.tortonesi.com/GNU-GPG-Key.txt the key fingerprint is: pub 1024D/7B2FD4B0 2005-06-02 Mauro Tortonesi (GNU Wget Maintainer) [EMAIL PROTECTED] Key fingerprint = 1E90 AEA8 D511 58F0 94E5 B106 7220 24E9 7B2F D4B0 the MD5 checksums of the tarball (and signature) are: caddc199d2cb31969e32b19fd365b0c5 wget-1.10.tar.gz 7dff7d39129051897ab6268b713766bf wget-1.10.tar.gz.sig the long-awaited 1.10 release is a significant improvement over the last 1.9.1 release, introducing a few important features like long file support and NTLM authentication, lots of improvements (especially in IPv6 and SSL code) and many bugfixes. last but not least, a brief personal comment. this is my first release as wget maintainer, and i am very excited about it. however i would like to say that, even if he stepped down from the maintainer position, the main author of wget is still hrvoje niksic, who really did an awesome work on wget 1.10. hrvoje is one of the best developers i have ever worked with and i would like to thank him for all the effort he put into this release of wget, especially since the last few months have been rather difficult for him. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. 
of Eng.http://www.ing.unife.it Institute for Human Machine Cognition http://www.ihmc.us GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
RE: wget and ASCII mode
Thank you. I appreciate this. Will keep you posted on how it turns out. Regards, Kiran -Original Message- From: Steven M. Schweda [mailto:[EMAIL PROTECTED] Sent: Saturday, June 04, 2005 8:39 AM To: WGET@sunsite.dk Cc: Kiran Atlluri Subject: Re: wget and ASCII mode From: Kiran Atlluri [...] I am trying to retrieve a '.csv' file on a unix system using wget (ftp mode). When I retrieve a file using normal FTP and specify ASCII mode, I successfully get the file and there are no '^M' at the end of the lines in this file. But when I use wget all the lines in the file have this '^M' at the end. [...] This happens because write_data() (in src/retr.c) does nothing to adjust the FTP-standard CR-LF line endings according to the local standard (in this case, LF-only), which a proper FTP client should do. A fix for this was included among my recent (well, not _very_ recent now) VMS-related patch submissions, but it would probably be a mistake to hold your breath waiting for those changes to be incorporated into the main code stream. If you're desperate to see what I did to fix this, you could visit: http://antinode.org/ftp/wget/patch1/ ftp://antinode.org/wget/patch1/ A quick search for the (new) enum value rb_ftp_ascii suggests that the relevant changes are in ftp.c, retr.c, and retr.h. Feel free to get in touch if you have any questions about what you find there. (The new code does make one potentially risky assumption, but it's explained in the comments.) Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street[EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: wget 1.10 release candidate 1
Zitat von Oliver Schulze L. [EMAIL PROTECTED]: Hi Mauro, do you know if the regex patch from Tobias was applied to this release? Thanks Oliver The last words on this topic that I remember were here: http://www.mail-archive.com/wget@sunsite.dk/msg07436.html Regards, J.Roderburg
Re: wget 1.10 release candidate 1
Thanks Jochen, I'm downloading both now Oliver Jochen Roderburg wrote: Zitat von "Oliver Schulze L." [EMAIL PROTECTED]: Hi Mauro, do you know if the regex patch from Tobias was applied to this release? Thanks Oliver The last words on this topic that I remember were here: http://www.mail-archive.com/wget@sunsite.dk/msg07436.html Regards, J.Roderburg -- Oliver Schulze L. [EMAIL PROTECTED]
Re: wget and ASCII mode
From: Kiran Atlluri [...] I am trying to retrieve a '.csv' file on a unix system using wget (ftp mode). When I retrieve a file using normal FTP and specify ASCII mode, I successfully get the file and there are no '^M' at the end of the lines in this file. But when I use wget all the lines in the file have this '^M' at the end. [...] This happens because write_data() (in src/retr.c) does nothing to adjust the FTP-standard CR-LF line endings according to the local standard (in this case, LF-only), which a proper FTP client should do. A fix for this was included among my recent (well, not _very_ recent now) VMS-related patch submissions, but it would probably be a mistake to hold your breath waiting for those changes to be incorporated into the main code stream. If you're desperate to see what I did to fix this, you could visit: http://antinode.org/ftp/wget/patch1/ ftp://antinode.org/wget/patch1/ A quick search for the (new) enum value rb_ftp_ascii suggests that the relevant changes are in ftp.c, retr.c, and retr.h. Feel free to get in touch if you have any questions about what you find there. (The new code does make one potentially risky assumption, but it's explained in the comments.) Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street[EMAIL PROTECTED] Saint Paul MN 55105-2547
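The CR-LF to LF translation Steven describes can be sketched as follows. This is a hedged illustration of the general technique, not wget's or the patch's actual code; the corner case it handles is a CR that lands on a buffer (chunk) boundary and must be held back until the next chunk arrives:

```python
def crlf_to_lf(chunks):
    """Translate FTP ASCII-mode CR-LF line endings to Unix LF across an
    iterable of byte chunks.  A CR at the end of one chunk may pair with
    an LF at the start of the next, so one byte of state is carried over
    between chunks."""
    pending_cr = False
    for chunk in chunks:
        if pending_cr:
            chunk = b"\r" + chunk       # re-attach the held-back CR
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            chunk = chunk[:-1]          # hold the CR until we see what follows
        yield chunk.replace(b"\r\n", b"\n")
    if pending_cr:
        yield b"\r"                     # lone trailing CR: pass through as-is

data = b"".join(crlf_to_lf([b"line1\r", b"\nline2\r\n"]))
print(data)
```

Note that the CR-LF pair split across the two input chunks still becomes a single LF; a naive per-chunk replace would miss it.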
Re: wget 1.10 release candidate 1
Hi, Neither rc1 nor alpha2 has the pcre patch included. I think that pcre is a very useful patch, and it should be added to CVS and not enabled by default in the ./configure script. So, if you want to use pcre, just ./configure --with-pcre and everybody is happy. Just my 2c Oliver Jochen Roderburg wrote: Zitat von "Oliver Schulze L." [EMAIL PROTECTED]: Hi Mauro, do you know if the regex patch from Tobias was applied to this release? Thanks Oliver The last words on this topic that I remember were here: http://www.mail-archive.com/wget@sunsite.dk/msg07436.html Regards, J.Roderburg -- Oliver Schulze L. [EMAIL PROTECTED]
Re: wget 1.10 release candidate 1
Zitat von Oliver Schulze L. [EMAIL PROTECTED]: Neither rc1 nor alpha2 has the pcre patch included. I think that pcre is a very useful patch, and it should be added to CVS and not enabled by default in the ./configure script. So, if you want to use pcre, just ./configure --with-pcre and everybody is happy. Hmmm, you mean everybody who has pcre is happy? Did you not read the message that I pointed you to ;-) ?? It said that the developers do not want to include a regex patch in wget until they find a solution that is portable enough to all systems that wget is supposed to run on. And no, I'm not involved in this, just wanted to remind that this has been discussed already a few times on the list ;-) J.Roderburg
Re: wget 1.10 release candidate 1
Hi Jochen, yes, I read it. That's why I suggested using an option to ./configure in order to enable it. And, it should be disabled by default. It's a nice option for all, because, if you don't have pcre, you won't receive any warning and it won't hurt anybody. HTH Oliver Jochen Roderburg wrote: Zitat von "Oliver Schulze L." [EMAIL PROTECTED]: Neither rc1 nor alpha2 has the pcre patch included. I think that pcre is a very useful patch, and it should be added to CVS and not enabled by default in the ./configure script. So, if you want to use pcre, just ./configure --with-pcre and everybody is happy. Hmmm, you mean everybody who has "pcre" is happy? Did you not read the message that I pointed you to ;-) ?? It said that the developers do not want to include a regex patch in wget until they find a solution that is portable enough to all systems that wget is supposed to run on. And no, I'm not involved in this, just wanted to remind that this has been discussed already a few times on the list ;-) J.Roderburg -- Oliver Schulze L. [EMAIL PROTECTED]
Re: wget 1.10 release candidate 1
Hi Mauro, do you know if the regex patch from Tobias was applied to this release? Thanks Oliver Mauro Tortonesi wrote: dear friends, i have just released the first release candidate of wget 1.10: ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.gz ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.bz2 you are encouraged to download the tarballs, test if the code works properly and report any bug you find. if no major bug report will be submitted in the next two days, i am planning to release wget 1.10 next thursday.
Re: wget 1.10 release candidate 1
i have just released the first release candidate of wget 1.10: ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.gz ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.bz2 you are encouraged to download the tarballs, test if the code works properly and report any bug you find. The VMS changes seem to be missing. But you probably knew that. Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street[EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: wget Question/Suggestion
Mark Anderson [EMAIL PROTECTED] writes: Is there an option, or could you add one if there isn't, to specify that I want wget to write the downloaded html file, or whatever, to stdout so I can pipe it into some filters in a script? Yes, use `-O -'.
Re: wget-1.9.1 Tries to Connect to localhost
Jim Peterson [EMAIL PROTECTED] writes: Using Fedora Core 3, when I wget "http://www.studylight.org/", it prints out:

--02:52:30-- http://www.studylight.org/ => `index.html'
Resolving www.studylight.org... 63.164.18.58
Connecting to www.studylight.org[63.164.18.58]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://localhost/ [following]
--02:52:30-- http://localhost/ => `index.html'
Resolving localhost... 127.0.0.1
Connecting to localhost[127.0.0.1]:80... failed: Connection refused.

Why is it trying to connect to localhost? Because it redirects you to localhost, possibly as a (feeble) attempt to prevent the site from being leeched with Wget. Use `wget -U Mozilla' and the problem goes away. My browser can load the page, but if I manually telnet www.studylight.org 80 and type GET /, I get a page that tends to indicate a peculiar web server setting that returns the Apache test page. That is a symptom of the site using name-based virtual hosting. You must remember to also specify the Host header.

$ telnet www.studylight.org 80
Trying 63.164.18.58...
Connected to newadmin.studylight.org.
Escape character is '^]'.
GET / HTTP/1.0
Host: www.studylight.org

HTTP/1.1 200 OK
Date: Wed, 18 May 2005 08:23:12 GMT
Server: Apache/1.3.33 (Unix) (Gentoo/Linux) mod_perl/1.27
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
...

Is this simply an indication of a poorly administered server, or is it a bug in wget? It's overprotectiveness of the server administrator. If you do the same telnet thing, but introduce yourself as Wget, you get the bogus redirection to localhost:

$ telnet www.studylight.org 80
Trying 63.164.18.58...
Connected to newadmin.studylight.org.
Escape character is '^]'.
GET / HTTP/1.0
Host: www.studylight.org
User-Agent: Wget/1.9.1

HTTP/1.1 302 Found
Date: Wed, 18 May 2005 08:24:10 GMT
Server: Apache/1.3.33 (Unix) (Gentoo/Linux) mod_perl/1.27
Location: http://localhost/
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://localhost/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at studylight.org Port 80</ADDRESS>
</BODY></HTML>
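The two telnet sessions above differ only in their request headers. The same raw requests can be assembled programmatically without touching the network; `build_request` is a hypothetical helper, with the host and agent string taken from the transcript:

```python
def build_request(host, path="/", user_agent=None):
    """Assemble a raw HTTP/1.0 request like the ones typed into telnet
    above.  The Host header is what lets a name-based virtual host pick
    the right site; the User-Agent header is what this particular
    server keys its bogus localhost redirect on."""
    lines = ["GET %s HTTP/1.0" % path, "Host: %s" % host]
    if user_agent:
        lines.append("User-Agent: %s" % user_agent)
    # Header lines end in CRLF; a blank line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

browser_like = build_request("www.studylight.org")
wget_like = build_request("www.studylight.org", user_agent="Wget/1.9.1")
print(wget_like)
```

Sending `browser_like` over a socket reproduces the 200 response; `wget_like` reproduces the 302 to localhost, which is what `wget -U Mozilla` works around.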
Re: wget-1.9.1 Tries to Connect to localhost
On Tuesday 17 May 2005 01:56 am, Jim Peterson wrote: Using Fedora Core 3, when I wget "http://www.studylight.org/", it prints out: --02:52:30-- http://www.studylight.org/ => `index.html' Resolving www.studylight.org... 63.164.18.58 Connecting to www.studylight.org[63.164.18.58]:80... connected. HTTP request sent, awaiting response... 302 Found Location: http://localhost/ [following] --02:52:30-- http://localhost/ => `index.html' Resolving localhost... 127.0.0.1 Connecting to localhost[127.0.0.1]:80... failed: Connection refused. Why is it trying to connect to localhost? My browser can load the page, but if I manually telnet www.studylight.org 80 and type GET /, I get a page that tends to indicate a peculiar web server setting that returns the Apache test page. Is this simply an indication of a poorly administered server, or is it a bug in wget? it seems to be a problem with the server:

DEBUG output created by Wget 1.10-beta1+cvs-dev on linux-gnu.
--21:44:29-- http://www.studylight.org/ => `index.html'
Resolving www.studylight.org... 63.164.18.58
Caching www.studylight.org => 63.164.18.58
Connecting to www.studylight.org|63.164.18.58|:80... connected.
Created socket 4.
Releasing 0x080835f8 (new refcount 1).
---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10-beta1+cvs-dev
Accept: */*
Host: www.studylight.org
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Date: Wed, 18 May 2005 02:44:29 GMT
Server: Apache/1.3.33 (Unix) (Gentoo/Linux) mod_perl/1.27
Location: http://localhost/
^
Connection: close
Content-Type: text/html; charset=iso-8859-1
---response end---
302 Found
Location: http://localhost/ [following]
Closed fd 4
--21:44:30-- http://localhost/ => `index.html'
Resolving localhost... 127.0.0.1
Caching localhost => 127.0.0.1
Connecting to localhost|127.0.0.1|:80... Closed fd 4 failed: Connection refused.
Releasing 0x08081370 (new refcount 1).

-- Aequam memento rebus in arduis servare mentem... 
Mauro Tortonesi                           http://www.tortonesi.com
University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
Institute of Human & Machine Cognition    http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool   http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group                  http://www.ferrara.linux.it
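The misbehaving redirect in the trace above is easy to screen for programmatically. A minimal sketch (Python used purely for illustration; the function name and the host list are my own assumptions, not anything wget itself does):

```python
from urllib.parse import urlparse

def redirect_is_local(location: str) -> bool:
    """True when a Location header sends the client back to its own
    machine -- the server misconfiguration seen in the trace above."""
    host = urlparse(location).hostname
    return host in ("localhost", "127.0.0.1", "::1")
```

For instance, the server's `Location: http://localhost/` would be flagged, while a redirect to any real host would not.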
RE: wget 1.10 beta 1
Windows MSVC6 binary for testing purposes here:

http://xoomer.virgilio.it/hherold/

Heiko
-- 
-- PREVINET S.p.A.     www.previnet.it
-- Heiko Herold        [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-----Original Message-----
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 11, 2005 8:41 PM
To: wget@sunsite.dk
Subject: wget 1.10 beta 1

dear friends,

i have just released the first beta version of wget 1.10:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-beta1.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-beta1.tar.bz2

you are encouraged to download the tarballs, test if the code works properly and report any bug you find. i am still doing tests on this code, but it seems to work fine, so i think we'll be able to release wget 1.10 in 7-10 days.
Re: wget doesn't get all page requisites...
Joerg Ottermann [EMAIL PROTECTED] writes:

  i try to archive some pages using wget, but it seems that i have some problems when TE:chunked is used.

The server must not use Transfer-Encoding: chunked in response to an HTTP/1.0 request. Are you sure that is the problem?
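For readers unfamiliar with what TE:chunked actually looks like on the wire, here is a minimal sketch of a chunked-body decoder (Python, illustration only; wget's real implementation is in C and also handles trailers and malformed input):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 Transfer-Encoding: chunked body.

    Each chunk is '<hex size>CRLF<data>CRLF'; a zero-size chunk
    terminates the body.  Trailer headers are ignored here."""
    out = b""
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)  # strip chunk extensions
        if size == 0:
            return out
        data_start = eol + 2
        out += raw[data_start:data_start + size]
        pos = data_start + size + 2  # skip the CRLF after the chunk data
```

For example, `b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"` decodes to `b"Wikipedia"`. An HTTP/1.0-only client that receives such a body without decoding it ends up archiving the chunk framing, which may be the corruption Joerg is seeing.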
Re: wget with ? and in urls
Vitaly Lomov [EMAIL PROTECTED] writes: Hello I am trying to get a site http://www.cro.ie/index.asp with the following flags -r -l2 or -kr -l2 or -Er -l2 or -Ekr -l2 In all cases, the linked files are saved with '@' instead of '?' in the name, but in the index.asp the link still refers to names with '?' Maybe you're not letting Wget finish the mirroring. The links are converted only after everything has been downloaded. I've now tried `wget -Ekrl2 http://www.cro.ie/index.asp --restrict-file-names=windows' (the last argument being to emulate modification of ? to @ done under Windows) and it converted the links correctly. The only links not converted were the ones generated in JavaScript, but only two of those were in index.asp -- the rest were converted correctly.
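The ?-to-@ renaming Vitaly observes comes from --restrict-file-names=windows. A rough sketch of that kind of mapping (my own simplification for illustration, not wget's exact escaping rules):

```python
# Characters that are unsafe in Windows file names.  In this sketch
# '?' gets the readable stand-in '@' (as the thread above describes
# for saved query URLs); other unsafe characters are percent-escaped.
_WINDOWS_UNSAFE = set('\\|/<>:"*')

def windows_restrict(name: str) -> str:
    out = []
    for ch in name:
        if ch == "?":
            out.append("@")
        elif ch in _WINDOWS_UNSAFE or ord(ch) < 32:
            out.append("%%%02X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

So a link such as index.asp?co=1 is saved as index.asp@co=1, and link conversion must then rewrite the HTML references accordingly, which only happens after the whole download finishes.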
Re: wget with ? and in urls
Vitaly Lomov [EMAIL PROTECTED] writes:

  Maybe you're not letting Wget finish the mirroring. The links are converted only after everything has been downloaded. I've now tried `wget -Ekrl2 http://www.cro.ie/index.asp --restrict-file-names=windows' (the last argument being to emulate modification of ? to @ done under Windows) and it converted the links correctly.

  Actually, it never finishes for me. I have waited for an hour now, still waits for response. I don't know how you could do it in 30min.

I now tried it again, and it took about 10 minutes. My DSL connection was the bottleneck.

  I just copied your command line, ran it and it blocks. Then I put in the timeout -T9, still waits:

    --17:30:02--  http://www.cro.ie/search/template_generic.asp?ID=8&Level1=3&Level2=0
               => `www.cro.ie/search/template_generic.asp@ID=8&Level1=3&Level2=0'
    Reusing existing connection to www.cro.ie:80.
    HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
    Length: unspecified

        [ <=>                               ] 0             --.--K/s

I got this for that part of the download:

    --20:54:00--  http://www.cro.ie/search/template_generic.asp?ID=8&Level1=3&Level2=0
               => `www.cro.ie/search/template_generic.asp@ID=8&Level1=3&Level2=0'
    Connecting to www.cro.ie|62.17.220.228|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    20:54:02 ERROR 404: Not Found.

The "assuming HTTP/0.9" you see is potentially dangerous because it might indicate that a previous download left things in a strange state. Does the download work if you use --no-http-keep-alive?

  Do you run yours on non-Windows? maybe that's the difference. Will a debug printout of this help you?

I haven't tried it on Windows yet because I thought the problem was related to link conversion and therefore occurred on all platforms.
Re: Wget converts links correctly *only* for the first time
Andrzej [EMAIL PROTECTED] writes: Will the patches be included in the stable 1.10? Probably. 1.10 is in feature freeze, but this really is a bug fix. I'd like to check with others if that change is deemed safe for mirroring of other sites. Clicking on that link redirects to that page: https://lists.man.lodz.pl/mailman/listinfo and from all the links which are on that page the files are unnecessarily downloaded (I do not want that page and the subpages). So how can I block it? Could you use -X /mailman/listinfo ?
Re: Wget converts links correctly *only* for the first time
Clicking on that link redirects to that page: https://lists.man.lodz.pl/mailman/listinfo and from all the links which are on that page the files are unnecessarily downloaded (I do not want that page and the subpages). So how can I block it? Could you use -X /mailman/listinfo ? I tried now, and it did not help. Still instead of just index.html and index.html.orig and subdirectories of the http://lists.man.lodz.pl/pipermail/mineraly/ there are many many other files downloaded from the https://lists.man.lodz.pl/mailman/listinfo page: admin.html admin.html.orig chemfan.html chemfan.html.orig create.html create.html.orig gnu-head-tiny.jpg info info.1.html info.1.html.orig listinfo.html listinfo.html.orig lodz-l.html lodz-l.html.orig mailman.jpg mineraly.html mineraly.html.orig mineralyftp mm-icon.png odlew-pl.html odlew-pl.html.orig os2.html os2.html.orig pecet.html pecet.html.orig pol34-info pol34-info.1.html pol34-info.1.html.orig polip.html polip.html.orig PythonPowered.png test.html test.html.orig a.
Re: Wget converts links correctly *only* for the first time
Andrzej [EMAIL PROTECTED] writes: Clicking on that link redirects to that page: https://lists.man.lodz.pl/mailman/listinfo and from all the links which are on that page the files are unnecessarily downloaded (I do not want that page and the subpages). So how can I block it? Could you use -X /mailman/listinfo ? I tried now, and it did not help. Still instead of just index.html and index.html.orig and subdirectories of the http://lists.man.lodz.pl/pipermail/mineraly/ I believe 1.9.1 had a bug in this area when -m (which implies -l0) was used. Could you try specifying -l50 along with the other options, and after -m?
Re: Wget converts links correctly *only* for the first time
I believe 1.9.1 had a bug in this area when -m (which implies -l0) was used. Could you try specifying -l50 along with the other options, and after -m? It still downloaded everything. a.
Re: Wget converts links correctly *only* for the first time.
Yup. So I assume that the problem you see is not that of wget mirroring, but a combination of saving to a custom dir (with --cut-dirs and the like) and conversion of the links. Obviously, the link to http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html, which would be correct for a standard `wget -m URL', was carried over, while the custom link to http://mineraly.feedle.com/Ftp/UpLoad/index.html was not created. My test with wget 1.5 was just a simple `wget15 -m -np URL' and it worked. So maybe the convert/rename problem/bug was solved with 1.9.1. This would also explain the missing gif file, I think.

And the above quoted link is also incorrect after the second run of wget; it is now again: http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html :(

a.
Re: Wget converts links correctly *only* for the first time
Andrzej [EMAIL PROTECTED] writes:

  It's not the end of troubles though! It works correctly *only* for the first time! When I (or cron) run the same mirroring commands again over already mirrored files to renew the mirror, then the correctly converted link of the gif file (on the main mirror web page): http://mineraly.feedle.com/Gify/ChemFan.gif is exchanged for the incorrect one: http://znik.wbc.lublin.pl/Mineraly/Gify/ChemFan.gif

The problem is that Wget is re-converting the files it decided it didn't want to download due to timestamping. For example:

1st time:
  URL:  http://znik.wbc.lublin.pl/Mineraly/
  link: <img src="http://znik.wbc.lublin.pl/ChemFan/Gify/ChemFan.gif">

Since the image is downloaded to Gify/ChemFan.gif, this is converted to:

  <img src="Gify/ChemFan.gif">

2nd time:
  URL:  http://znik.wbc.lublin.pl/Mineraly/ (using local copy of that URL)
  link: <img src="Gify/ChemFan.gif">

Since no such image is downloaded, Wget converts the link back to an absolute one. Merging "http://znik.wbc.lublin.pl/Mineraly/" with "Gify/ChemFan.gif" results in the totally bogus "http://znik.wbc.lublin.pl/Mineraly/Gify/ChemFan.gif" that you're seeing.

That explains the mechanics of the bug, but not what to do about it. There are two solutions:

1. If an HTML file is not re-downloaded because of time-stamping, it should not be re-converted because (since the file hasn't changed) there is no reason to do so. I'm trying to think of a scenario where this would break things, but I can't come up with any.

2. If --backup-converted is in use (which it is in your case), link conversion could read the pristine .orig file and write it to the resulting HTML. This is a bit more complex, but might help if solution #1 turns out to break some scenarios.

Here is a patch that implements #1. (It applies to the CVS source, but it's easy enough to manually apply it to the source of 1.9.1.) With that patch the mirror seems correct in the 2nd run. Please let me know if it works for you.
Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.173
diff -u -r1.173 http.c
--- src/http.c  2005/04/28 13:56:31  1.173
+++ src/http.c  2005/05/02 14:58:53
@@ -2318,6 +2318,11 @@
                      local_filename);
           free_hstat (&hstat);
           xfree_null (dummy);
+          /* The file is the same; assume that the links have
+             already been converted.  Otherwise we run the
+             risk of converting links twice, which is
+             wrong.  */
+          *dt |= DT_DISABLE_CONVERSION;
           return RETROK;
         }
       else if (tml >= tmr)
Index: src/retr.c
===================================================================
RCS file: /pack/anoncvs/wget/src/retr.c,v
retrieving revision 1.95
diff -u -r1.95 retr.c
--- src/retr.c  2005/04/16 20:12:43  1.95
+++ src/retr.c  2005/05/02 14:58:55
@@ -761,7 +761,7 @@
       register_download (u->url, local_file);
       if (redirection_count > 0 && 0 != strcmp (origurl, u->url))
         register_redirection (origurl, u->url);
-      if (*dt & TEXTHTML)
+      if ((*dt & TEXTHTML) && !(*dt & DT_DISABLE_CONVERSION))
         register_html (u->url, local_file);
     }
 }
Index: src/wget.h
===================================================================
RCS file: /pack/anoncvs/wget/src/wget.h,v
retrieving revision 1.57
diff -u -r1.57 wget.h
--- src/wget.h  2005/04/27 21:08:40  1.57
+++ src/wget.h  2005/05/02 14:58:55
@@ -233,7 +233,8 @@
   HEAD_ONLY            = 0x0004,  /* only send the HEAD request */
   SEND_NOCACHE         = 0x0008,  /* send Pragma: no-cache directive */
   ACCEPTRANGES         = 0x0010,  /* Accept-ranges header was found */
-  ADDED_HTML_EXTENSION = 0x0020   /* added .html extension due to -E */
+  ADDED_HTML_EXTENSION = 0x0020,  /* added .html extension due to -E */
+  DT_DISABLE_CONVERSION = 0x0040  /* disable link conversion */
 };
 
 /* Universal error type -- used almost everywhere.  Error reporting of
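The bogus merge described in the analysis above is plain relative-URL resolution, which can be demonstrated with Python's standard library (illustration only; wget performs the equivalent merge in its own C code):

```python
from urllib.parse import urljoin

# The page URL and the already-converted relative link from the report.
base = "http://znik.wbc.lublin.pl/Mineraly/"
link = "Gify/ChemFan.gif"

# Resolving the relative link against the page it sits in produces the
# nonexistent absolute URL that the second run wrote back into the HTML.
print(urljoin(base, link))
# http://znik.wbc.lublin.pl/Mineraly/Gify/ChemFan.gif
```

This is why skipping re-conversion for unchanged files (solution #1) avoids the problem: the relative link is only ever produced once, from the true original URL.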
Re: Wget converts links correctly *only* for the first time
With that patch the mirror seems correct in the 2nd run. Please let me know if it works for you. *After* I deleted the files with the wrong URLs, the patched wget 1.9.1 retrieved the files correctly, and after second run did not change the URLs for the wrong ones. So it worked on the pg.gda.pl. On the feedle.com I downloaded, patched and installed ver. 1.10alpha2 of wget. Double mirroring worked here, too. Thanks again for the patches. Will the patches be included in the stable 1.10? I have one more little problem: On that source page: http://lists.man.lodz.pl/pipermail/mineraly/ there is a link at the bottom: https://lists.man.lodz.pl/ Clicking on that link redirects to that page: https://lists.man.lodz.pl/mailman/listinfo and from all the links which are on that page the files are unnecessarily downloaded (I do not want that page and the subpages). So how can I block it? Is the -R option used only for extensions or also for filenames? Should I use -G option? However, I want to download everything (exept the last link) from that page: http://lists.man.lodz.pl/pipermail/mineraly/ so I cannot block all the domain http://lists.man.lodz.pl/ but clicking on that link redirects to: https://lists.man.lodz.pl/mailman/listinfo so would the -G or -R work in such situation? a.
RE: wget 1.10 alpha 3
Windows (MSVC) test binary available at http://xoomer.virgilio.it/hherold/

Notes:

windows/wget.dep needs an attached patch (change gen_sslfunc to openssl.c, change gen_sslfunc.h to ssl.h).

src/Makefile.in doesn't contain dependencies for http-ntlm$o (windows/wget.dep either).

INSTALL should possibly mention the --disable-ntlm configure option.

I still advocate a warning (placed in windows/Readme or configure.bat) for old msvc compilers, like in the attached patch.

Heiko
-- 
-- PREVINET S.p.A.     www.previnet.it
-- Heiko Herold        [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-----Original Message-----
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 28, 2005 8:56 AM
To: wget@sunsite.dk; [EMAIL PROTECTED]
Subject: wget 1.10 alpha 3

dear friends,

i have just released the third alpha version of wget 1.10:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha3.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha3.tar.bz2

as always, you are encouraged to download the tarballs, test if the code works properly and report any bug you find.

20050428.wget-dep.diff
Description: Binary data

20050420.winreadme.diff
Description: Binary data
Re: wget 1.10 alpha 3
Herold Heiko [EMAIL PROTECTED] writes: windows/wget.dep needs an attached patch (change gen_sslfunc to openssl.c, change gen_sslfunc.h to ssl.h). Applied, thanks. src/Makefile.in doesn't contain dependencies for http-ntlm$o (windows/wget.dep either). I don't have the dependency-generating script handy anymore. However, the dependency to the corresponding C file is automatic, and it's a good idea to `make clean' when you change a header file anyway. INSTALL should possibly mention the --disable-ntlm configure option. Done. I still advocate a warning (placed in windows/Readme or configure.bat) for old msvc compilers, like in the attached patch. Applied now.
RE: wget 1.10 alpha 3
Cannot compile if ./configure --without-ssl:

===cut on===
gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c init.c
init.c:214: structure has no member named `random_file'
init.c:214: initializer element is not constant
init.c:214: (near initialization for `commands[78].place')
*** Error code 1

Stop in /usr/home/yar/src/wget-1.10-alpha3/src.
*** Error code 1

Stop in /usr/home/yar/src/wget-1.10-alpha3.
===cut off===

FreeBSD 4.11-RELEASE.
Re: wget 1.10 alpha 3
Thanks for the report; this problem is fixed in CVS. The workaround is to wrap the appropriate init.c line in #ifdef HAVE_SSL.
Re: Wget not resending cookies on Location: in headers
[EMAIL PROTECTED] writes:

  Is there a publicly accessible site that exhibits this problem?

  I've set up a small example which illustrates the problem. Files can be found at http://dev.mesca.net/wget/ (using demo:test as login).

Thanks for setting up this test case. It has uncovered at least two bugs in the cookie code.

  $ wget --http-user=demo --http-passwd=test --cookies=on --save-cookies=cookie.txt http://dev.mesca.net/wget/setcookie.php

The obvious problem is that this command lacks --keep-session-cookies, and the cookie it gets is session-based. But there are other problems as well: if you examine the cookie.txt produced by (the amended version of) the first command, you'll notice that the cookie's path is "wget/setcookie.php". For one, the setcookie.php part should have been stripped (Mozilla does this, I've just checked). Second, the path should always begin with a slash. Either of these problems would guarantee that no other URL would ever match this cookie.

I've now fixed both bugs in CVS, along with a third, unrelated bug. Please let me know if the latest CVS works for you. (It works for me on the example you set up.)

Several notes on usage: --cookies is the default, so you don't need --cookies=on to send and receive them. Second, it's somewhat shorter to specify the user name and password in the URL. Finally, don't forget --keep-session-cookies when saving the cookies.
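The path-defaulting rule at issue can be sketched as follows (Python for illustration; this follows the behaviour later codified in RFC 6265 §5.1.4, simplified, and is not wget's actual code):

```python
def default_cookie_path(request_path: str) -> str:
    """Default cookie path when Set-Cookie carries no Path attribute:
    everything up to, but not including, the right-most '/'.  This is
    the rule that strips the 'setcookie.php' part and guarantees the
    leading slash discussed above."""
    if not request_path.startswith("/") or request_path.count("/") == 1:
        return "/"
    return request_path[:request_path.rfind("/")]
```

With the test case above, a cookie set from /wget/setcookie.php defaults to path /wget, rather than the unusable "wget/setcookie.php" that the buggy code stored.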
Re: Wget not resending cookies on Location: in headers
The obvious problem is that this command lacks --keep-session-cookies, and the cookie it gets is session-based. I tried to reproduce the bug in the more generic way. But there are other problems as well: if you examine the cookie.txt produced by (the amended version of) the first command, you'll notice that the cookie's path is wget/setcookie.php. For one, the setcookie.php part should have been stripped (Mozilla does this, I've just checked). Second, the path should always begin with a slash. Either of these problems would guarantee that no other URL would ever match this cookie. I've now fixed both bugs in the CVS, along with a third, unrelated bug. Please let me know if the latest CVS works for you. (It works for me on the example you set up.) Thanks a lot for your corrections. It's now working like a charm. It's also working with session cookies. Regards, Pierre
Re: Wget Bug
Arndt Humpert [EMAIL PROTECTED] writes: wget, win32 rel. crashes with huge files. Thanks for the report. This problem has been fixed in the latest version, available at http://xoomer.virgilio.it/hherold/ .
Wget sucks! [was: Re: wget in a loop?]
Thus it seems that it should not matter what the sequence of the options is. If it does, I suggest that the developers of wget place appropriate info in the manual.

Yes, you're right. Anyway I have often found that it's sometimes quite tricky setting up your command line to get exactly what you want. The way I do it always works fine for me.

Could the developers confirm whether the sequence of options matters or not?

The log shows that you haven't downloaded all the graphics from the main page, and also you haven't downloaded that link: http://lists.feedle.net/pipermail/minerals/

Well, I didn't verify it with the homepage itself. I initially tried without -e --robots=off and got a message blocking further downloading. With this option I could achieve further access for downloading. I have only tried the one link from above.

I doubt it. I tried it without the option and I did not have all the graphics. The -p option doesn't work as it should.

I could try to use the -D option, but then probably everything would be downloaded from lists.feedle.net despite the -np option used, wouldn't it? I don't know exactly how these two options interact with each other.

Ever tried the -m option?

Of course I tried, haven't you noticed in my previous posts?

Very often when mirroring I use this line: wget -P work:1/ -r -l 2 -H -nc -p "http://www.xxx.xx"

This is not really proper mirroring, merely downloading. This would have the side effect of downloading other links recursively and from other hosts if there are any. You see...

But of course you can define a list of allowed dirs and excluded dirs. I never tried this though.

What's the point of mirroring if I have to define allowed and excluded directories every time? I want to run the mirror automatically, periodically from cron, and therefore the options should be as general as possible, so that no matter what changes are done on the site I would still have the site properly mirrored without amending the options all the time.
But of course some definitions of directories and sites might be necessary from time to time; still, as I showed in my correspondence here, it is not possible to define everything in such a way that mirroring would work properly for all the web elements and web pages on a particular site.

After all you maybe shouldn't forget the -k option so you can browse these sites offline.

I use it.

My conclusion is (and I am really sorry to say that, 'cause I liked wget until now): Wget sucks (for mirroring at least)! It is useful only for very simple tasks; when one wants to use it for site mirroring it is almost useless, and the job cannot be done fully properly with Wget, as can be seen in my previous e-mails.

Summary:

1. The -p option doesn't do what it should be doing. It doesn't download all graphics regardless of where the graphics come from.
2. The -P option used with link-converting options doesn't allow the links to be properly converted (at least in the current stable wget).
3. The -D and -I options do not include paths (directories) in URLs.
4. The -np option should IMHO react to the paths given after the -D and -I options.
5. Just everything should be done to enable proper mirroring of web sites. The multitude of options in Wget is just an illusion. In real life Wget cannot cope with site mirroring. It is not possible to set Wget's options in such a way that sites with some foreign elements (graphics) or web pages scattered over several servers (links to different domains) are mirrored correctly. And even if a site did not have the above problems, the problem with proper conversion of the links would still exist.

Does anyone know any software for a linux/unix shell which would cope with the task of proper mirroring?

a.
Re: Wget sucks! [was: Re: wget in a loop?]
Andrzej [EMAIL PROTECTED] writes: Thus it seems that it should not matter what is the sequence of the options. If it does I suggest that the developers of wget place appriopriate info in the manual. Yes, you right. Anyway I found out often that it's sometimes quite tricky setting up your command line to get exactly what you want. The way I do it always works fine for me. Could developers confirm whether sequence of options matters or not? The order of options does not matter.
Re: Wget sucks! [was: Re: wget in a loop?]
Andrzej [EMAIL PROTECTED] writes: Multitude options in Wget is just an ilusion. In real life Wget cannot cope with sites mirroring. I agree with your criticism, if not with your tone. We are working on improving Wget, and I believe that the problems you have seen will be fixed in the versions to come. (I plan to look into some of them for the 1.11 release.) And even if the site would not have the above problems then still the problem with proper convertion of the links exist. That problem has been corrected, and it can be worked around by not using -P.
Re: Wget sucks! [was: Re: wget in a loop?]
I agree with your criticism, if not with your tone. We are working on improving Wget, and I believe that the problems you have seen will be fixed in the versions to come. (I plan to look into some of them for the 1.11 release.)

OK. Thanks. Good to hear that. Looking forward impatiently to the new version. :)

That problem has been corrected, and it can be worked around by not using -P.

Yes, indeed. Thanks.

In order to download all of that website: http://znik.wbc.lublin.pl/ChemFan/ which, unfortunately, partly is also under this address: http://lists.man.lodz.pl/pipermail/chemfan/ I had to manually modify the content of that web page: http://znik.wbc.lublin.pl/ChemFan/Archiwum/index.html (which contains the above link) and use these tricks to make a mirror of it all:

cd $HOME/web/chemfan.pl && \
wget -m -nv -k -K -E -nH --cut-dirs=1 -np -t 1000 -D wbc.lublin.pl -o $HOME/logiwget/logchemfan.pl -p http://znik.wbc.lublin.pl/ChemFan/ && \
cd $HOME/web/chemfan.pl/arch && \
wget -m -nv -k -K -E -nH -np --cut-dirs=2 -t 1000 -D lists.man.lodz.pl --follow-ftp -o $HOME/logiwget/logchemfanarchive.pl -p http://lists.man.lodz.pl/pipermail/chemfan/ && \
cp $HOME/web/domirrora/Archiwum/index.html $HOME/web/chemfan.pl/Archiwum/index.html

==

If you know how to make it simpler let me know. Do you think all that is really necessary here? Of course for other sites other recipes might need to be developed in order to mirror them correctly, so unfortunately it is not universal at all.

And unfortunately not everything went fine yet when using the above script. On the page http://lists.man.lodz.pl/pipermail/chemfan/ there is a link: ftp://ftp.man.lodz.pl/pub/doc/LISTY-DYSKUSYJNE/CHEMFAN and it seems that to mirror this too I'll have to run yet another wget session, and then manually modify and copy the page: http://chemfan.pl.feedle.com/arch/index.html

a.
Re: Wget not resending cookies on Location: in headers
Is there a publicly accessible site that exhibits this problem?

I've set up a small example which illustrates the problem. Files can be found at http://dev.mesca.net/wget/ (using demo:test as login). Three files:

setcookie.php:
--------------
<? setcookie("wget", "I love it!"); ?>

getcookie.php:
--------------
<? header('Location: getcookie-redirect.php'); ?>

get-cookie-redirect.php:
------------------------
<?
if (isset($_COOKIE['wget'])) {
    echo "Ok, I can read the cookie: [wget] " . $_COOKIE['wget'];
} else {
    echo "Cookie is not set.";
}
?>

We first set the cookie by wgetting setcookie.php. Then, we're trying to read the cookie by querying getcookie.php, which redirects to get-cookie-redirect.php: wget can't read it.

$ wget --http-user=demo --http-passwd=test --cookies=on --save-cookies=cookie.txt http://dev.mesca.net/wget/setcookie.php
$ wget --http-user=demo --http-passwd=test --cookies=on --load-cookies=cookie.txt http://dev.mesca.net/wget/getcookie.php

Note: tests were made using the latest version from CVS (1.10-alpha2+cvs-dev).

On 26 Apr 2005, at 00:09, Hrvoje Niksic wrote:

  - The server responds with a "Location: http://host.com/member.php" in headers. Here is the point: member.php requires cookies defined by index.php and checkuser.php. However these cookies are not resent by Wget.

  That sounds like a bug. Wget is supposed to resend the cookies. Could you provide any kind of debug information? The contents of the cookies is not important, but the path parameter and the expiry date are.

According to my tests, the problem is still reproducible whatever Path and Expiry date contain.

Regards,
Pierre
Re: wget in a loop?
Thanks Patrick for a reply,

  AFAICS your command line is somehow completely mixed up. Usually I call wget and first give it the path where it should save all files, followed by more options and at last the url from where to get them (usually in quotation marks to be sure).

According to man wget:

=====
SYNOPSIS
       wget [option]... [URL]...
=====

Thus it seems that it should not matter what the sequence of the options is. If it does, I suggest that the developers of wget place appropriate info in the manual.

wget -P ram:chemfan/minerals/ -m -o ram:logminerals -nv -e --robots=off -k -K -E -nH -np -t 1000 -p http://minerals.feedle.com/

The log file is attached as proof. The log shows that you haven't downloaded all the graphics from the main page, and also you haven't downloaded that link: http://lists.feedle.net/pipermail/minerals/

I want to mirror everything including all graphics from that page: http://minerals.feedle.com/ and including recursively this link: http://lists.feedle.net/pipermail/minerals/ and this http://minerals.feedle.com/logo.html (this one is no problem) but not those links:

http://lists.feedle.net/mailman/listinfo/minerals
http://www.man.lodz.pl/MINERALY/

which should remain in the mirror copies as they are. I could try to use the -D option, but then probably everything would be downloaded from lists.feedle.net despite the -np option used, wouldn't it?

a.
Re: wget 1.10 alpha 2
On Wed, 20 Apr 2005, Hrvoje Niksic wrote:

  Herold Heiko [EMAIL PROTECTED] writes:

  I am greatly surprised. Do you really believe that Windows users outside an academic environment are proficient in using the compiler? I have never seen a home Windows installation that even contained a compiler, the only exception being ones that belonged to professional C or C++ developers.

This is what Cygwin is all about. Once you open up the Cygwin bash shell, all you have to do with most source code is configure; make; make install. I am not a programmer and have been compiling programs for several years. As long as the program compiles cleanly, there shouldn't be a problem under Windows. I don't have any idea of how many Windows users would try to patch the code if it didn't compile out of the box.

  The very idea that a Windows user might grab source code and compile a package is strange. I don't remember ever seeing a Windows program distributed in source form.

See, for example, htmldoc which converts html into a pdf file. The free version is only distributed as source code. Or see consoletelnet, distributed both as source and binary.

Doug
-- 
Doug Kaufman
Internet: [EMAIL PROTECTED]
Re: wget 1.10 alpha 2
Doug Kaufman [EMAIL PROTECTED] writes: On Wed, 20 Apr 2005, Hrvoje Niksic wrote: Herold Heiko [EMAIL PROTECTED] writes: I am greatly surprised. Do you really believe that Windows users outside an academic environment are proficient in using the compiler? I have never seen a home Windows installation that even contained a compiler, the only exception being ones that belonged to professional C or C++ developers. This is what Cygwin is all about. Once you open up the Cygwin bash shell, all you have to do with most source code is configure; make; make install. Oh, I know that and I *love* Cygwin and use it all the time (while in Windows)! But that is beside the point because this problem doesn't occur under Cygwin in the first place -- Cygwin compilation is as clean as it gets. My point was that a typical Windows (not Cygwin) user doesn't know about the compilation process, nor can he be bothered to learn. That's a great shame, but it's something that's not likely to change. Making the code uglier for the sake of ordinary Windows users willing to compile it brings literally no gain. The above shouldn't be construed as not wanting to support Windows at all. There are Windows users, on this list and elsewhere, who are perfectly able and willing to compile Wget from source. But those users are also able to read the documentation, to turn off optimization for offending functions, not to mention to upgrade their compiler, or get a free one that is much less buggy (the Borland compiler comes to mind, but there are also Mingw, Cygwin, Watcom, etc.)
Re: wget 1.10 alpha 2
Mauro Tortonesi [EMAIL PROTECTED] writes: i totally agree with hrvoje here. in the worst case, we can add an entry in the FAQ explaining how to compile wget with those buggy versions of microsoft cc. Umm. What FAQ? :-)
RE: wget 1.10 alpha 2
(sorry for the late answer, three days of 16+ hours/day migration aren't fun, a UPS battery exploding inside the UPS almost in my face even less)

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]

  Herold Heiko [EMAIL PROTECTED] writes:

    do have a compiler but aren't really developers (yet) (for example first year CS students with old lab computer compilers).

  From my impressions of the Windows world, non-developers won't touch source code anyway -- they will simply use the binary.

I feel I must dissent. Even today I'm not exactly a developer, I certainly wasn't when I first placed my greedy hands on wget sources (in order to add a couple of chars to URL_UNSAFE... back in '98 I think). I just knew where I could use a compiler and followed instructions. I'd just like wget to still be compilable in an old setup by (growing) newbies, for the learning value. Maybe something like a small note in the windows/Readme instructions would be ok, as in the enclosed patch?

  The really important thing is to make sure that the source works for the person likely to create the binaries, in this case you. Ideally he should have access to the latest compiler, so we don't have to cater to brokenness of obsolete compiler versions. This is not about

I must confess I'm torn between the two options. Your point is very valid; on the other hand, while it is still possible I'd like to continue using an old setup, exactly because there are still plenty of those around and I'd like to catch these problems. Unfortunately I don't have the time to test everything on two setups, so I think I'll continue with the old one as long as it's easily feasible.

  Also note that there is a technical problem with your patch (if my reading of it is correct): it unconditionally turns on debugging, disregarding the command-line options. Is it possible to save the old optimization options, turn off debugging, and restore the old options? (Borland C seems to support some sort of #pragma push to achieve that effect.)
It seems not, msdn mentions push only for #pragma warning, not for #pragma optimize :( Change the Makefile to compile the offending files without optimization, or with a lesser optimization level. Ideally this would be done by configure.bat if it detects the broken compiler version. I tried but didn't find a portable (w9x-w2x) way to do that, since in w9x we can't easily redirect the standard error used by cl.exe. Possibly this could be worked around by running the test from a simple perl script; on the other hand, today perl is required (on released packages) only in order to build the documentation, not for the binary, so adding another dependency would be a pity. You mean that you cannot use later versions of C++ to produce Win95/Win98/NT4 binaries? I'd be very surprised if that were the case! Absolutely not; what I meant is, later versions can't be installed on older windows operating systems. I think Visual Studio 6 is the last MS compiler which runs on even NT4. Personally I feel wget should try to still support that not-so-old compiler platform if possible, Sure, but in this case some of the burden falls on the user of the obsolete platform: he has to turn off optimization to avoid a bug in his compiler. That is not entirely unacceptable. I concur; after all, if a note is dropped in the windows/Readme, either they will read it, or they will stall due to OpenSSL dependencies (on by default) anyway. Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax 20050420.winreadme.diff Description: Binary data
Re: wget 1.10 alpha 2
Herold Heiko [EMAIL PROTECTED] writes: From my impressions of the Windows world, non-developers won't touch source code anyway -- they will simply use the binary. I feel I must dissent. I am greatly surprised. Do you really believe that Windows users outside an academic environment are proficient in using the compiler? I have never seen a home Windows installation that even contained a compiler, the only exception being ones that belonged to professional C or C++ developers. The very idea that a Windows user might grab source code and compile a package is strange. I don't remember ever seeing a Windows program distributed in source form. Even today I'm not exactly a developer, I certainly wasn't when I first placed my greedy hands on wget sources (in order to add a couple of chars to URL_UNSAFE... back in 98 i think). I just knew where I could use a compiler and followed instructions. I'd just like wget still being compilable in an old setup by (growing) newbies, for the learning value. Maybe something like a small note in the windows/Readme instructions would be ok, as by the enclosed patch ? That would be fine with me.
Re: wget 1.10 alpha 2
On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: i totally agree with hrvoje here. in the worst case, we can add an entry in the FAQ explaining how to compile wget with those buggy versions of microsoft cc. Umm. What FAQ? :-) the official FAQ: http://www.gnu.org/software/wget/faq.html -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 alpha 2
On Wednesday 20 April 2005 05:55 am, Herold Heiko wrote: (sorry for the late answer, three days of 16+ hours/day migration aren't fun, UPS battery exploding inside the UPS almost in my face even less) -Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Herold Heiko [EMAIL PROTECTED] writes: do have a compiler but aren't really developers (yet) (for example first year CS students with old lab computer compilers). From my impressions of the Windows world, non-developers won't touch source code anyway -- they will simply use the binary. I feel I must dissent. Even today I'm not exactly a developer, I certainly wasn't when I first placed my greedy hands on wget sources (in order to add a couple of chars to URL_UNSAFE... back in 98 i think). I just knew where I could use a compiler and followed instructions. I'd just like wget still being compilable in an old setup by (growing) newbies, for the learning value. Maybe something like a small note in the windows/Readme instructions would be ok, as by the enclosed patch ? publishing a separate patch on the website and including it in the tarball along with a note in windows/Readme is ok for me. but including an ugly workaround in the main sources just to support some older versions of microsoft c is definitely not. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.9.1 -- 2 GB limit -- negative filesize
hi alexander, this is a known problem which is already fixed in cvs. perhaps you may want to try using wget 1.10-alpha2: ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.gz ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.bz2 -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 alpha 2
Mauro Tortonesi [EMAIL PROTECTED] writes: On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: i totally agree with hrvoje here. in the worst case, we can add an entry in the FAQ explaining how to compile wget with those buggy versions of microsoft cc. Umm. What FAQ? :-) the official FAQ: http://www.gnu.org/software/wget/faq.html This is the first time that I see it. It's actually pretty good, I like it.
Re: wget 1.10 alpha 2
On Wednesday 20 April 2005 02:42 pm, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: i totally agree with hrvoje here. in the worst case, we can add an entry in the FAQ explaining how to compile wget with those buggy versions of microsoft cc. Umm. What FAQ? :-) the official FAQ: http://www.gnu.org/software/wget/faq.html This is the first time that I see it. It's actually pretty good, I like it. yes, i like it very much too. it will need an update after the release of 1.10, though. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 alpha 2
On Friday 15 April 2005 07:24 am, Hrvoje Niksic wrote: Herold Heiko [EMAIL PROTECTED] writes: However there are still lots of people using Windows NT 4 or even win95/win98, with old compilers, where the compilation won't work without the patch. Even if we place a comment in the source file or the windows/Readme many of those will be discouraged, say those who do have a compiler but aren't really developers (yet) (for example first year CS students with old lab computer compilers). From my impressions of the Windows world, non-developers won't touch source code anyway -- they will simply use the binary. The really important thing is to make sure that the source works for the person likely to create the binaries, in this case you. Ideally he should have access to the latest compiler, so we don't have to cater to brokenness of obsolete compiler versions. This is not about Microsoft bashing, either: at least one point Wget triggered a GCC bug; I never installed the (ugly) workaround because later versions of GCC fixed the bug. Also note that there is a technical problem with your patch (if my reading of it is correct): it unconditionally turns on debugging, disregarding the command-line options. Is it possible to save the old optimization options, turn off debugging, and restore the old options? (Borland C seems to support some sort of #pragma push to achieve that effect.) There are other possibilities, too: * Change the Makefile to compile the offending files without optimization, or with a lesser optimization level. Ideally this would be done by configure.bat if it detects the broken compiler version. * Change the Makefile to simply not use optimization by default. This is suboptimal, but would not be a big problem for Wget in practice -- the person creating the binaries would use optimization in his build, which means most people would still have access to an optimized Wget. 
i don't really like these two options and i don't think they're necessary when there is a freely downloadable microsoft compiler which works perfectly for us. Not yet, but I will certainly. Nevertheless, I think the point is the "continue to support existing installations if possible" issue; after all, VC6 is not free either, and at least one newer commercial VC version has been reported to compile without problems. Those, however, certainly don't support Win95, and probably not Win98/ME and/or NT4 either (didn't yet check though). You mean that you cannot use later versions of C++ to produce Win95/Win98/NT4 binaries? I'd be very surprised if that were the case! yes, this would be very weird. Personally I feel wget should try to still support that not-so-old compiler platform if possible, Sure, but in this case some of the burden falls on the user of the obsolete platform: he has to turn off optimization to avoid a bug in his compiler. That is not entirely unacceptable. i totally agree with hrvoje here. in the worst case, we can add an entry in the FAQ explaining how to compile wget with those buggy versions of microsoft cc. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng. http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
hi wgetters! a while ago, i wrote: [1] wget spans hosts when it shouldn't: it looks like this behaviour is by design, but it should be documented. [2] wget seems to choke on directories that start with a dot. i guess it thinks they are references to external pages and does not download links containing such directory names. it turned out that the site in question is excluding robots, so wget behaves correctly. sorry for the false bug report and for overlooking the obvious :) [3] wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this feature request has not been commented on yet. do you think it might be useful? best regards, jörn -- Jörn Nettingsmeier, EDV-Administrator Institut für Politikwissenschaft Universität Duisburg-Essen, Standort Duisburg Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Jörn Nettingsmeier [EMAIL PROTECTED] writes: [3] wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this feature request has not been commented on yet. do you think it might be useful? I think it's very useful, but so far no one has volunteered to work on it.
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Hrvoje Niksic wrote: Jörn Nettingsmeier [EMAIL PROTECTED] writes: [3] wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this feature request has not been commented on yet. do you think it might be useful? I think it's very useful, but so far no one has volunteered to work on it. maybe a student in our project is interested in implementing it; if not, i'll look into it next week. -- Jörn Nettingsmeier, EDV-Administrator Institut für Politikwissenschaft Universität Duisburg-Essen, Standort Duisburg Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Jörn Nettingsmeier [EMAIL PROTECTED] writes: wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this feature request has not been commented on yet. do you think it might be useful? I think it's very useful, but so far no one has volunteered to work on it. maybe a student in our project is interested in implementing it; if not, i'll look into it next week. It shouldn't be too hard. You would need to implement a CSS parser, and a corresponding get_urls_css function that extracted the URLs from the CSS source. (I believe both would be much, much simpler than the corresponding HTML counterparts.) Finally, modify the code in recur.c to call get_urls_css for CSS files, the same way it calls get_urls_html for HTML files. convert_links might need additional work for CSS, but it should also be straightforward.
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Hrvoje Niksic wrote: Jörn Nettingsmeier [EMAIL PROTECTED] writes: wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites. this feature request has not been commented on yet. do you think it might be useful? I think it's very useful, but so far no one has volunteered to work on it. maybe a student in our project is interested in implementing it; if not, i'll look into it next week. It shouldn't be too hard. You would need to implement a CSS parser, and a corresponding get_urls_css function that extracted the URLs from the CSS source. (I believe both would be much, much simpler than the corresponding HTML counterparts.) Finally, modify the code in recur.c to call get_urls_css for CSS files, the same way it calls get_urls_html for HTML files. convert_links might need additional work for CSS, but it should also be straightforward. the same parser code might also work for urls in javascript. as it is now, mouse-over effects with overlay images don't work, because the second file is not retrieved. if we can come up with a good heuristic to guess urls, it should work in both cases. -- Jörn Nettingsmeier, EDV-Administrator Institut für Politikwissenschaft Universität Duisburg-Essen, Standort Duisburg Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736
Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...
Jörn Nettingsmeier [EMAIL PROTECTED] writes: the same parser code might also work for urls in javascript. as it is now, mouse-over effects with overlay images don't work, because the second file is not retrieved. if we can come up with a good heuristic to guess urls, it should work in both cases. I'm not sure that a CSS parser would really be useful for JavaScript. Supporting JavaScript URLs in HTML and elsewhere would require some more heuristics, which are IMHO orthogonal to CSS support.
RE: wget 1.10 alpha 2
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] the patch you've posted is really such an ugly workaround (shame on microsoft and their freaking compilers) Exactly the same opinion here. Please don't misunderstand me, personally for most of my work on windows I use cygnus (including wget) anyway. However there are still lots of people using Windows NT 4 or even win95/win98, with old compilers, where the compilation won't work without the patch. Even if we place a comment in the source file or the windows/Readme many of those will be discouraged, say those who do have a compiler but aren't really developers (yet) (for example first year CS students with old lab computer compilers). I suppose we could leave that stuff present but commented out, and print a warning when configure.bat --msvc is called. Maybe we could even make that warning conditional (run cl.exe, use the dos/windows find.exe in order to check the output, if <= 12.00 echo warning) but that would be even more hacky. have you tried the microsoft visual c++ toolkit 2003? maybe it works. you can download it for free at the following URL: http://msdn.microsoft.com/visualc/vctoolkit2003/ Not yet, but I will certainly. Nevertheless, I think the point is the "continue to support existing installations if possible" issue; after all, VC6 is not free either, and at least one newer commercial VC version has been reported to compile without problems. Those, however, certainly don't support Win95, and probably not Win98/ME and/or NT4 either (didn't yet check though). Personally I feel wget should try to still support that not-so-old compiler platform if possible, even if there are other options, either the direct successor (current VC) or not (free alternatives like cygnus, mingw and borland compilers), in order to keep the development process easily accessible to old installations, in order to have more choices for everybody. Heiko -- -- PREVINET S.p.A. 
www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax
Re: wget 1.10 alpha 2
Herold Heiko [EMAIL PROTECTED] writes: However there are still lots of people using Windows NT 4 or even win95/win98, with old compilers, where the compilation won't work without the patch. Even if we place a comment in the source file or the windows/Readme many of those will be discouraged, say those who do have a compiler but aren't really developers (yet) (for example first year CS students with old lab computer compilers). From my impressions of the Windows world, non-developers won't touch source code anyway -- they will simply use the binary. The really important thing is to make sure that the source works for the person likely to create the binaries, in this case you. Ideally he should have access to the latest compiler, so we don't have to cater to brokenness of obsolete compiler versions. This is not about Microsoft bashing, either: at least one point Wget triggered a GCC bug; I never installed the (ugly) workaround because later versions of GCC fixed the bug. Also note that there is a technical problem with your patch (if my reading of it is correct): it unconditionally turns on debugging, disregarding the command-line options. Is it possible to save the old optimization options, turn off debugging, and restore the old options? (Borland C seems to support some sort of #pragma push to achieve that effect.) There are other possibilities, too: * Change the Makefile to compile the offending files without optimization, or with a lesser optimization level. Ideally this would be done by configure.bat if it detects the broken compiler version. * Change the Makefile to simply not use optimization by default. This is suboptimal, but would not be a big problem for Wget in practice -- the person creating the binaries would use optimization in his build, which means most people would still have access to an optimized Wget. Not yet, but I will certainly. 
Nevertheless, I think the point is the "continue to support existing installations if possible" issue; after all, VC6 is not free either, and at least one newer commercial VC version has been reported to compile without problems. Those, however, certainly don't support Win95, and probably not Win98/ME and/or NT4 either (didn't yet check though). You mean that you cannot use later versions of C++ to produce Win95/Win98/NT4 binaries? I'd be very surprised if that were the case! Personally I feel wget should try to still support that not-so-old compiler platform if possible, Sure, but in this case some of the burden falls on the user of the obsolete platform: he has to turn off optimization to avoid a bug in his compiler. That is not entirely unacceptable.
Re: wget 1.10 alpha 1
Hi, Does anybody know if the security vulnerabilities CAN-2004-1487 and CAN-2004-1488 will be fixed in the new version ? There seems to be at least some truth in the reports (ignore the insulting tone of the reports). http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1487 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1488 Karsten
Re: wget 1.10 alpha 1
Karsten Hopp [EMAIL PROTECTED] writes: Does anybody know if the security vulnerabilities CAN-2004-1487 and CAN-2004-1488 will be fixed in the new version ? Yes on both counts. There seems to be at least some truth in the reports (ignore the insulting tone of the reports). http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1487 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1488 I've read them. The first one is fairly improbable because it requires special DNS setup for .. to resolve to an IP address. The second one poses a real problem, which I simply never considered. I'm not sure if either issue is critical enough to warrant a 1.9.2 release. The proximity of 1.10, which fixes both problems, makes it unnecessary.
Re: wget 1.10 alpha 2
Hrvoje Niksic [EMAIL PROTECTED] writes: [EMAIL PROTECTED] writes: If possible, it seems preferable to me to use the platform's C library regex support rather than make wget dependent on another library... Note that some platforms don't have library support for regexps, so we'd have to bundle anyway. Oh, and POSIX regexps don't support -- and never will -- non-greedy quantifiers, which are perhaps the single most useful addition of Perl 5 regexps. Incidentally, regex.c bundled with GNU Emacs supports them, along with non-capturing (shy) groups, another very useful feature.
Re: wget 1.10 alpha 2
On Wednesday 13 April 2005 07:39 am, Herold Heiko wrote: With MS Visual Studio 6, wget still needs the attached patch in order to compile (disable optimization for part of http.c and retr.c if cl.exe version <= 12). Windows msvc test binary at http://xoomer.virgilio.it/hherold/ hi herold, the patch you've posted is really such an ugly workaround (shame on microsoft and their freaking compilers) that i am not very willing to merge it into our cvs. have you tried the microsoft visual c++ toolkit 2003? maybe it works. you can download it for free at the following URL: http://msdn.microsoft.com/visualc/vctoolkit2003/ -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng. http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linux http://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 alpha 1
[EMAIL PROTECTED] (Steven M. Schweda) writes: #define VERSION_STRING "1.10-alpha1_sms1" Was there any reason to do this with a source module instead of a simple macro in a simple header file? At some point that approach made it easy to read or change the version, as the script dist-wget does. But I'm sure there are other ways to do it, too. Was there any reason to use '#include <config.h>' instead of '#include "config.h"'? Yes. The idea is that you can build in a separate directory and have the compiler find the build directory's config.h instead of a config.h previously configured in the source directory. Quoting the Autoconf manual: Use `#include <config.h>' instead of `#include "config.h"', and pass the C compiler a `-I.' option (or `-I..'; whichever directory contains `config.h'). That way, even if the source directory is configured itself (perhaps to make a distribution), other build directories can also be configured without finding the `config.h' from the source directory.
RE: wget 1.10 alpha 2
With MS Visual Studio 6, wget still needs the attached patch in order to compile (disable optimization for part of http.c and retr.c if cl.exe version <= 12). Windows msvc test binary at http://xoomer.virgilio.it/hherold/ Heiko -- -- PREVINET S.p.A. www.previnet.it -- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED] -- +39-041-5907073 ph -- +39-041-5907472 fax -Original Message- From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 13, 2005 12:36 AM To: wget@sunsite.dk; [EMAIL PROTECTED] Cc: Johannes Hoff; Leonid Petrov; Doug Kaufman; Tobias Tiederle; Jim Wright; garycao; Steven M.Schweda Subject: wget 1.10 alpha 2 dear friends, i have just released the second alpha version of wget 1.10: [snip] 20050413.diff Description: Binary data
Re: wget 1.10 alpha 2
[EMAIL PROTECTED] writes: If possible, it seems preferable to me to use the platform's C library regex support rather than make wget dependent on another library... Note that some platforms don't have library support for regexps, so we'd have to bundle anyway.
Re: wget 1.10 alpha 1
From: Mauro Tortonesi [EMAIL PROTECTED] [...] i think that if you want your patches to be merged in our CVS, you should follow the official patch submission procedure (that is, posting your patches to the wget-patches AT sunsite DOT dk mailing list. each post should include a brief comment about what the patch does, and especially why it does so). this would save a lot of time to me and hrvoje and would definitely speed up the merging process. [...] Perhaps. I'll give it a try. Also, am I missing something obvious, or should the configure script (as in, To configure Wget, run the configure script provided with the distribution.) be somewhere in the CVS source? I see many of its relatives, but not the script itself. And I'm just getting started, but is there any good reason for the extern variables output_stream and output_stream_regular not to be declared in some header file? Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street[EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: wget 1.10 alpha 1
On Tue, 12 Apr 2005, Steven M. Schweda wrote: Also, am I missing something obvious, or should the configure script (as in, To configure Wget, run the configure script provided with the distribution.) be somewhere in the CVS source? I see many of its relatives, but not the script itself. You can use Makefile.cvs (i.e. make -f Makefile.cvs), which will run autoheader and autoconf. The autoheader command creates src/config.h.in and the autoconf command creates configure from configure.in. I usually just run autoheader and autoconf directly. You need to have Autoconf and m4 installed. Doug -- Doug Kaufman Internet: [EMAIL PROTECTED]
Re: wget 1.10 alpha 1
[EMAIL PROTECTED] (Steven M. Schweda) writes: Also, am I missing something obvious, or should the configure script (as in, To configure Wget, run the configure script provided with the distribution.) be somewhere in the CVS source? The configure script is auto-generated and is therefore not in CVS. To get it, run autoconf. See the file README.cvs. And I'm just getting started, but is there any good reason for the extern variables output_stream and output_stream_regular not to be declared in some header file? No good reason that I can think of.
Re: Wget error
On Tuesday 12 April 2005 06:17 pm, Jeanne McIlvain wrote: Hi! I attempted to download wget onto my mac. I was disappointed to find that it would not work. I thought that I read it was applicable to macs, but am I wrong? Please let me know, Thank you so much. - please respond to [EMAIL PROTECTED] did you download the source tarball and compile it? which version of wget are you using? which version of mac os? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi University of Ferrara - Dept. of Eng.http://www.ing.unife.it Institute of Human Machine Cognition http://www.ihmc.us Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: wget 1.10 alpha 1
From: Hrvoje Niksic [EMAIL PROTECTED] Also, am I missing something obvious, or should the configure script (as in, To configure Wget, run the configure script provided with the distribution.) be somewhere in the CVS source? The configure script is auto-generated and is therefore not in CVS. To get it, run autoconf. See the file README.cvs. Sorry for the stupid question. I was reading the right document but then I got distracted and failed to get back to it. Thanks for the quick, helpful responses. And I'm just getting started, but is there any good reason for the extern variables output_stream and output_stream_regular not to be declared in some header file? No good reason that I can think of. I'm busy segregating all/most of the VMS-specific stuff into a vms directory, to annoy the normal folks less. Currently, I have output_stream, output_stream_regular, and total_downloaded_bytes in (a new) main.h, but I could do something else if there's a better plan. Rather than do something similar for version_string, I just transformed version.c into version.h, which (for the moment) contains little other than: #define VERSION_STRING "1.10-alpha1_sms1" Was there any reason to do this with a source module instead of a simple macro in a simple header file? Was there any reason to use '#include <config.h>' instead of '#include "config.h"'? This hosed my original automatic dependency generation, but a work-around was easy enough. It just seemed like a difference from all the other non-system inclusions with no obvious (to me) reason. Currently, I'm working from a CVS collection taken on 11 April. Assuming I can get this stuff organized in the next few days or so, what would be the most convenient code base to use? Steven M. Schweda (+1) 651-699-9818 382 South Warwick Street [EMAIL PROTECTED] Saint Paul MN 55105-2547
Re: wget 1.9.1 with large DVD.iso files
Sanjay Madhavan [EMAIL PROTECTED] writes: wget 1.9.1 fails when trying to download a very large file. The download stopped in between and attempting to resume shows a negative sized balance to be downloaded. e.g. ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso 3284710 KB I read somewhere that it is due to the fact that internally the size is being stored as signed integers and hence the numbers wrap around, giving negative sizes for large (DVD-sized) files That is correct. But this problem has been fixed in the current CVS. If you know how to use CVS, you can download it (the instructions are at http://wget.sunsite.dk/) and give it a spin. Downloading that file should work in that version: {mulj}[~]$ wget ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso --11:44:46-- ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso => `SUSE-Linux-9.2-FTP-DVD.iso' Resolving ftp.solnet.ch... 212.101.4.244 Connecting to ftp.solnet.ch|212.101.4.244|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /mirror/SuSE/i386/9.2/iso ... done. ==> PASV ... done. ==> RETR SUSE-Linux-9.2-FTP-DVD.iso ... done. Length: 3,363,543,040 (3.1G) (unauthoritative) 0% [ ] 146,464 37.06K/s ETA 24:32:50 ...
Re: wget 1.9.1 with large DVD.iso files
I may run into this in the future. What is the threshold for large files failing on the -current version of wget??? I'm not expecting to d/l anything over 200MB, but is that even too large for it? Sorry to threadjack, but it seemed an appropriate question... Bryan On Apr 11, 2005 2:46 AM, Hrvoje Niksic [EMAIL PROTECTED] wrote: Sanjay Madhavan [EMAIL PROTECTED] writes: wget 1.9.1 fails when trying to download a very large file. The download stopped in between and attempting to resume shows a negative sized balance to be downloaded. e.g. ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso 3284710 KB I read somewhere that it is due to the fact that internally the size is being stored as signed integers and hence the numbers wrap around, giving negative sizes for large (DVD-sized) files That is correct. But this problem has been fixed in the current CVS. If you know how to use CVS, you can download it (the instructions are at http://wget.sunsite.dk/) and give it a spin. Downloading that file should work in that version: {mulj}[~]$ wget ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso --11:44:46-- ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso => `SUSE-Linux-9.2-FTP-DVD.iso' Resolving ftp.solnet.ch... 212.101.4.244 Connecting to ftp.solnet.ch|212.101.4.244|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /mirror/SuSE/i386/9.2/iso ... done. ==> PASV ... done. ==> RETR SUSE-Linux-9.2-FTP-DVD.iso ... done. Length: 3,363,543,040 (3.1G) (unauthoritative) 0% [ ] 146,464 37.06K/s ETA 24:32:50 ...
Re: wget 1.9.1 with large DVD.iso files
Bryan [EMAIL PROTECTED] writes: I may run into this in the future. What is the threshold for large files failing on the -current version of wget??? The threshold is 2G (2147483648 bytes). I'm not expecting to d/l anything over 200MB, but is that even too large for it? That's not too large. OP's file was over 3G.
Re: wget follow-excluded patch
Tobias Tiederle [EMAIL PROTECTED] writes: let's say you have the following structure:

index.html
|-cool.html
| |-page1.html
| |-page2.html
| |- ...
|
|-crap.html
  |-page1.html
  |-page2.html

now you want to download the whole structure, but you want to exclude the crap (with -R/A or nice regex). If you look at recur.c, crap.html is downloaded (and deleted), but all the pages linked in crap.html will be downloaded as well. With the option I included, all the crap will be totally ignored. I don't know how to achieve this behaviour with the current options. You can't. -R/-A were never meant to be used that way -- witness the FTP code, where they're not applied to directories either. (In this sense HTML files are directories of a kind.) Maybe we could repurpose -I/-X so they can apply to HTML files and be used to ignore whole sub-hierarchies of the site? Although a bit unorthodox, that would be very much within the jurisdiction of those options.