RE: Wget patches for .files

2005-08-19 Thread Tony Lewis
Mauro Tortonesi wrote: 

 this is a very interesting point, but the patch you mentioned above uses
 the LIST -a FTP command, which AFAIK is not supported by all FTP servers.

As I recall, that's why the patch was not accepted. However, it would be
useful if there were some command line option to affect the LIST parameters.
Perhaps something like:

wget ftp://ftp.somesite.com --ftp-list=-a
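(Purely illustrative: no such option exists in wget today, and the helper and flag name below are hypothetical.) Internally, such an option would only have to splice the user-supplied flags into the LIST command sent on the FTP control connection, roughly like this C sketch:

  /* Hypothetical sketch, not wget code: build the LIST command with
     optional user-supplied flags such as "-a". */
  #include <stdio.h>

  static int
  build_list_command (char *buf, size_t bufsize, const char *flags)
  {
    int n;
    if (flags && *flags)
      n = snprintf (buf, bufsize, "LIST %s\r\n", flags);
    else
      n = snprintf (buf, bufsize, "LIST\r\n");
    return n > 0 && (size_t) n < bufsize;   /* 1 if the command fit */
  }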

Tony




Re: Wget should lock the file

2005-08-17 Thread Hrvoje Niksic
Behdad Esfahbod [EMAIL PROTECTED] writes:

 It happened to me to unintentionally run two commands:

   wget -b -c http://some/file.tar.gz

 and hours later I figured out that the 1GB that I've downloaded
 is useless since two wget processes have been downloading the
 same data twice and appending to the same file. :(

 So, wget should lock the file for writing, which it seems it
 doesn't do.

Thanks for the report.  I believe this problem is fixed in Wget 1.10,
where the second Wget process would write to file.tar.gz.1, using
O_EXCL to make sure that two processes are not clobbering the same
file.
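For the curious, here is a minimal sketch of that technique (not Wget's actual source): O_CREAT|O_EXCL makes the create-if-absent step atomic, so two processes racing for the same name can never both end up appending to it, and the loser falls back to a numbered suffix such as file.tar.gz.1.

  /* Minimal sketch, not Wget source code. */
  #include <fcntl.h>
  #include <stdio.h>

  int
  open_unique (const char *name)
  {
    char buf[4096];
    int fd = open (name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    int i;
    for (i = 1; fd == -1 && i < 1000; i++)
      {
        /* name already exists (or open failed); try name.1, name.2, ... */
        snprintf (buf, sizeof buf, "%s.%d", name, i);
        fd = open (buf, O_WRONLY | O_CREAT | O_EXCL, 0644);
      }
    return fd;   /* -1 if no unique name could be created */
  }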


Re: wget 1.9.1 worked for 4.2G wrapped file

2005-08-14 Thread Hrvoje Niksic
Linda Walsh [EMAIL PROTECTED] writes:

 I noticed after my post in the archives that this bug is fixed in
 1.10.

 Now if I can just get the server-ops to fix their CVS server, that'd
 be great -- I've checked out CVS projects from other sites and not
 had inbound TCP attempts to some 'auth' service. ;-/:-)

Note that use of CVS (in fact svn) is not required -- simply get
wget-1.10.tar.gz from the nearest GNU mirror and compile that.


Re: wget multiple downloads

2005-08-08 Thread Mauro Tortonesi
On Wednesday 03 August 2005 08:14 am, dan1 wrote:
 Hello.

 I have been using wget for a long time now. I like it very much.

 However, I have 2 enhancement requests that I think are important and
 very useful:

 1. There should be a 'download acceleration' mode that triggers several
 downloads at the same time for the same file. This speeds up downloads a
 lot on sites where the bandwidth is limited per connection (not on
 purpose), because of the routers in between; e.g. the 'accel' program does
 this job very well. I am using that one because wget lacks this feature.

last month hrvoje and i discussed whether to implement this feature, and 
we agreed that it would be overkill for a (supposedly) simple command 
line tool like wget.

 2. wget should be completely HTTP/1.1 compliant. Right now it just uses
 the HTTP/1.0-compatible subset of the protocol, but HTTP/1.1 is important
 to follow. It has annoyed me because of this once, though I don't remember
 why any more.

wget 2.0 will be completely HTTP 1.1 compliant.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute for Human & Machine Cognition  http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: wget 1.10, issues with IPv6 on AIX 5.1

2005-08-04 Thread Hrvoje Niksic
Thanks for the report.  The problem seems to come from Wget's use of
AI_ADDRCONFIG hint to getaddrinfo.  Wget 1.10.1 will not use that
hint.


RE: wget a file with long path on Windows XP

2005-07-21 Thread Tony Lewis
PoWah Wong wrote: 

 The login page is:
 http://safari.informit.com/?FPI=&uicode=

 How to figure out the login command?

 These two commands do not work:

 wget --save-cookies cookies.txt http://safari.informit.com/?FPI= [snip]
 wget --save-cookies cookies.txt
http://safari.informit.com/?FPI=uicode=/login.php? [snip]

When trying to recreate a form in wget, you have to send the data the server
is expecting to receive to the location the server is expecting to receive
it. You have to look at the login page for the login form and recreate it.
In your browser, view the source of http://safari.informit.com/?FPI=&uicode=
and you will find the form that appears below. Note that I stripped out
formatting information for the table that contains the form and reformatted
what was left to make it readable.

<form action="JVXSL.asp" method="post">
  <input type="hidden" name="s" value="1">
  <input type="hidden" name="o" value="1">
  <input type="hidden" name="b" value="1">
  <input type="hidden" name="t" value="1">
  <input type="hidden" name="f" value="1">
  <input type="hidden" name="c" value="1">
  <input type="hidden" name="u" value="1">
  <input type="hidden" name="r" value="">
  <input type="hidden" name="l" value="1">
  <input type="hidden" name="g" value="">
  <input type="hidden" name="n" value="1">
  <input type="hidden" name="d" value="1">
  <input type="hidden" name="a" value="0">
  <input tabindex="1" name="usr" id="usr" type="text" value="" size="12">
  <input name="pwd" id="pwd" tabindex="1" type="password" value="" size="12">
  <input type="checkbox" tabindex="1" name="savepwd" id="savepwd" value="1">
  <input type="image" name="Login" src="images/btn_login.gif" alt="Login"
         width="40" height="16" border="0" tabindex="1" align="absmiddle">
</form>

Note that the server expects the data to be posted to JVXSL.asp and that
there are a bunch of fields that must be supplied in order for the server to
process the login request. In addition, the two fields you supply are called
usr and pwd. So your first wget command line will look something like
this:

wget --save-cookies cookies.txt "http://safari.informit.com/JVXSL.asp"
--post-data="s=1&o=1&b=1&t=1&f=1&c=1&u=1&r=&l=1&g=&n=1&d=1&a=0&usr=wong_powa[EMAIL PROTECTED]&pwd=123&savepwd=1"

Hope that helps!

Tony




RE: wget a file with long path on Windows XP

2005-07-21 Thread PoWah Wong
I can save cookies, but wget still retrieves a blank web
page.  The web page URL is copied from the URL
displayed in the web browser.
These are the logs.

C:\Program Files\wget\wget --save-cookies
cookies.txt http://safari.informit.com/JVXSL.asp;
--post-data=s=1o=1b=1t=1f=1c=1u=1r=l=1g=n=1d=1a=0[EMAIL 
PROTECTED]pwd=123savepwd=1
--21:41:38--  http://safari.informit.com/JVXSL.asp
   = `JVXSL.asp'
Resolving safari.informit.com... 193.194.158.208
Connecting to
safari.informit.com|193.194.158.208|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12,149 (12K) [text/html]

100%[]
12,14948.76K/s

21:41:39 (48.63 KB/s) - `JVXSL.asp' saved
[12149/12149]


First wget:
C:\Program Files\wget\wget --load-cookies
cookies.txt
http://safari.informit.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks.network.ciscoioss=1b=1f=1t
=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6/ciscockbk-CHP-1
--21:43:31-- 
http://safari.informit.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview
=g=catid=itbooks.network.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367
-6/ciscockbk-CHP-1
   =
[EMAIL 
PROTECTED]mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks.network
.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6%2Fciscockbk-CHP-1'
Resolving safari.informit.com... 193.194.158.208
Connecting to
safari.informit.com|193.194.158.208|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s

21:43:32 (0.00 B/s) -
[EMAIL 
PROTECTED]mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks
.network.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6%2Fciscockbk-CHP
-1' saved [0/0]

Second wget:
C:\Program Files\wget\wget --load-cookies
cookies.txt --keep-session-cookies
http://safari.informit.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks.network
.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6/ciscockbk-CHP-1
--21:45:31-- 
http://safari.informit.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview
=g=catid=itbooks.network.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367
-6/ciscockbk-CHP-1
   =
[EMAIL 
PROTECTED]mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks.network
.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6%2Fciscockbk-CHP-1'
Resolving safari.informit.com... 193.194.158.208
Connecting to
safari.informit.com|193.194.158.208|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s

21:45:31 (0.00 B/s) -
[EMAIL 
PROTECTED]mode=sectionsortKey=ranksortOrder=descview=g=catid=itbooks
.network.ciscoioss=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0xmlid=0-596-00367-6%2Fciscockbk-CHP
-1' saved [0/0]

I am not subscribed; please CC me in replies to my
post.  Thanks.

--- Tony Lewis [EMAIL PROTECTED] wrote:

 PoWah Wong wrote: 
 
  The login page is:
  http://safari.informit.com/?FPI=uicode=
 
  How to figure out the login command?
 
 [snip]

 So your first wget command line
 will look something like
 this:
 
 wget --save-cookies cookies.txt
 http://safari.informit.com/JVXSL.asp;

--post-data=s=1o=1b=1t=1f=1c=1u=1r=l=1g=n=1d=1a=0usr=wong_powa
 [EMAIL PROTECTED]pwd=123savepwd=1
 
 Hope that helps!
 
 Tony









Re: wget a file with long path on Windows XP

2005-07-13 Thread Frank McCown
This sounds like a difficult page to download because they may be using 
cookies or session variables.  I'm not sure the best way to proceed, but 
I would look at the wget documentation about cookies.  I think you may 
have to save the cookies that are generated by the login page and use 
--load-cookies to get the page you are after.


By the way, if you are only after a single page, why not just save it 
using the browser?


Frank


PoWah Wong wrote:

The website is actually www.informit.com.
It requires logging in at
https://secure.safaribooksonline.com/promo.asp?code=ITT03&portal=informit&a=0
After logging in, the website becomes similar to
booksonline.com, which I edited slightly.
My public library's electronic access also
requires logging in.




Re: wget a file with long path on Windows XP

2005-07-12 Thread Frank McCown

Putting quotes around the url got rid of your Invalid parameter errors.

I just tried accessing the url you are trying to wget and received an 
http 500 response.  I also tried accessing 
http://proquest.booksonline.com/ and never got a response.


According to your output, wget got back a 0 length response.  I would 
check your web server and make sure it is working properly.


Frank


PoWah Wong wrote:

I put quotes around the url, but it still does not
work.

C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1mode=sectionso
rtKey=titlesortOrder=ascview=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r
=o=1n=1d=1p=1a=0page=0
--22:45:26-- 
http://proquest.booksonline.com/?x=1mode=sectionsortKey=titlesortOrder=ascvi

ew=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0

   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s


22:45:27 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1mode=sectionso
rtKey=titlesortOrder=ascview=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r
=o=1n=1d=1p=1a=0page=0
--22:46:59-- 
http://proquest.booksonline.com/?x=1mode=sectionsortKey=titlesortOrder=ascvi

ew=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0

   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s


22:46:59 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1%26mode=section%
26sortKey=title%26sortOrder=asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1
%26f=1%26t=1%26c=1%26u=1%26r=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0
--22:47:45-- 
http://proquest.booksonline.com/?x=1%26mode=section%26sortKey=title%26sortOrder=

asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1%26f=1%26t=1%26c=1%26u=1%26r
=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0
   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s


22:47:46 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


--- Frank McCown [EMAIL PROTECTED] wrote:



I think you need to put quotes around the url.


PoWah Wong wrote:


The file I want to get is

http://proquest.booksonline.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=bookxmlid=0-321-16076-2/ch02g=srchText=object+orientedcode=h=m=l=1catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0;

I opened an MSDOS console on Windows XP.

I tried:
C:\Program Files\wget\wget.exe
http://proquest.booksonline.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=bookxmlid=0-321-16076-2/ch02g=srchText=object+orientedcode=h=m=l=1catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0
--05:33:34--  http://proquest.booksonline.com/JVXSL.asp?x=1
  = [EMAIL PROTECTED]'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to proquest.booksonline.com|193.194.158.201|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

   [ =  ] 0 --.--K/s

05:33:34 (0.00 B/s) - [EMAIL PROTECTED]' saved [0/0]

Invalid parameter - =section
'sortKey' is not recognized as an internal or external command,
operable program or batch file.
'sortOrder' is not recognized as an internal or external command,
operable program or batch file.
'view' is not recognized as an internal or external command,
operable program or batch 

Re: wget a file with long path on Windows XP

2005-07-11 Thread PoWah Wong
I put quotes around the url, but it still does not
work.

C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1mode=sectionso
rtKey=titlesortOrder=ascview=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r
=o=1n=1d=1p=1a=0page=0
--22:45:26-- 
http://proquest.booksonline.com/?x=1mode=sectionsortKey=titlesortOrder=ascvi
ew=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0

   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s

22:45:27 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1mode=sectionso
rtKey=titlesortOrder=ascview=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r
=o=1n=1d=1p=1a=0page=0
--22:46:59-- 
http://proquest.booksonline.com/?x=1mode=sectionsortKey=titlesortOrder=ascvi
ew=xmlid=0-321-16076-2/ch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0

   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s

22:46:59 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


C:\bookC:\Program Files\wget\wget.exe
http://proquest.booksonline.com/?x=1%26mode=section%
26sortKey=title%26sortOrder=asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1
%26f=1%26t=1%26c=1%26u=1%26r=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0
--22:47:45-- 
http://proquest.booksonline.com/?x=1%26mode=section%26sortKey=title%26sortOrder=
asc%26view=%26xmlid=0-321-16076-2/ch03lev1sec1%26g=%26catid=%26s=1%26b=1%26f=1%26t=1%26c=1%26u=1%26r
=%26o=1%26n=1%26d=1%26p=1%26a=0%26page=0
   =
[EMAIL 
PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-16076-2%2Fc
h03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
Resolving proquest.booksonline.com... 193.194.158.201
Connecting to
proquest.booksonline.com|193.194.158.201|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]

[ = 
   ] 0 --.--K/s

22:47:46 (0.00 B/s) -
[EMAIL PROTECTED]mode=sectionsortKey=titlesortOrder=ascview=xmlid=0-321-160
76-2%2Fch03lev1sec1g=catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0'
saved [0/0]


--- Frank McCown [EMAIL PROTECTED] wrote:

 I think you need to put quotes around the url.
 
 
 PoWah Wong wrote:
  The file I want to get is
 

http://proquest.booksonline.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=bookxmlid=0-321-16076-2/ch02g=srchText=object+orientedcode=h=m=l=1catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0;
  
  
  I opened an MSDOS console on Windows XP.
  
  I tried:
  C:\Program Files\wget\wget.exe
 

http://proquest.booksonline.com/JVXSL.asp?x=1mode=sectionsortKey=ranksortOrder=descview=bookxmlid=0-321-16076-2/ch02g=srchText=object+orientedcod
 

e=h=m=l=1catid=s=1b=1f=1t=1c=1u=1r=o=1n=1d=1p=1a=0page=0
  --05:33:34-- 
  http://proquest.booksonline.com/JVXSL.asp?x=1
 = [EMAIL PROTECTED]'
  Resolving proquest.booksonline.com...
 193.194.158.201
  Connecting to
  proquest.booksonline.com|193.194.158.201|:80...
  connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 0 [text/html]
  
  [ = 

 ] 0 --.--K/s
  
  05:33:34 (0.00 B/s) - [EMAIL PROTECTED]' saved [0/0]
  
  Invalid parameter - =section
  'sortKey' is not recognized as an internal or external command,
  operable program or batch file.
  'sortOrder' is not recognized as an internal or external command,
  operable program or batch file.
  'view' is not recognized as an internal or external command,
  operable program or batch file.
  'xmlid' is not recognized as an internal or external command,
  operable program or batch file.
  'g' is not recognized as an internal or external command,
  operable program or batch file.
  'srchText' is not recognized as an internal or external command,
  operable program or batch file.
  'code' is not recognized as an internal or external

RE: wget 1.10.1 beta 1

2005-07-07 Thread Herold Heiko
Windows MSVC test binary at
http://xoomer.virgilio.it/hherold/

Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 06, 2005 11:07 PM
 To: wget@sunsite.dk
 Subject: wget 1.10.1 beta 1
 
 
 
 dear friends,
 
 i have just released the first beta of wget 1.10.1:
 
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10.1-beta1.tar.gz
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10.1-beta1.tar.bz2
 
 you are encouraged to download the tarballs, test if the code 
 works properly
 and report any bug you find.
 
 
 -- 
 Aequam memento rebus in arduis servare mentem...
 
 Mauro Tortonesi  http://www.tortonesi.com
 
 University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
 Institute for Human & Machine Cognition  http://www.ihmc.us
 GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
 Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
 Ferrara Linux User Group                 http://www.ferrara.linux.it
 


Re: wget and ASCII mode

2005-06-27 Thread Steven M. Schweda
from Hrvoje Niksic:

 [...]  Unfortunately EOL conversions break
 automatic downloads resumption (REST in FTP),

   Could be true.

  manual resumption (wget -c),

   Could be true.  (I never use wget -c.)

  break timestamping,

   How so?

  and probably would break checksums if we added them.

   You don't have them, and anyone who would be surprised by this should
be directed to the note in the documentation which would explain why.

 Most Wget's users seem to want byte-by-byte copies, because I don't
 remember a single bug report about the lack of ASCII conversions.

   You mean other than the one from the fellow who started this thread?

 The one thing that is surely wrong about my approach is the ';type=a'
 option, which should either be removed or come with a big fat warning
 that it *doesn't* implement the required conversion to native EOL
 convention and that it's provided for the sake of people who need text
 transfers and are willing to invoke dos2unix/unix2dos (or their OS
 equivalent) themselves.

   Interesting.  I'd have made ;type=a work right (which I claim to
have done), and then perhaps included a run-time error or documentation
warning if it were mixed with incompatible options (which I haven't
done).



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget and ASCII mode

2005-06-27 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

 from Hrvoje Niksic:

 [...]  Unfortunately EOL conversions break
 automatic downloads resumption (REST in FTP),

Could be true.

  manual resumption (wget -c),

Could be true.  (I never use wget -c.)

It's the consequence of EOL conversion affecting file size.

  break timestamping,

How so?

By changing file size, which will appear different than what is
reported by the server and will cause the file to always be downloaded
due to size mismatch.
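To illustrate with a sketch (this is not Wget code): once CR-LF pairs are collapsed to LF, the local byte count can no longer match the size the server reports, which is exactly what defeats both the timestamping size comparison and the REST offset used by -c.

  /* Illustration only: the size a text file would have after CR-LF ->
     LF conversion.  It is smaller than the server's size whenever the
     file contains CR-LF line endings. */
  #include <stddef.h>

  size_t
  converted_size (const char *data, size_t len)
  {
    size_t out = 0, i;
    for (i = 0; i < len; i++)
      if (!(data[i] == '\r' && i + 1 < len && data[i + 1] == '\n'))
        out++;                  /* drop only the CR of a CR-LF pair */
    return out;
  }

  /* A later check along the lines of
       if (local_size != remote_size) redownload ();
     then always fires, and a resumed download would restart from a byte
     offset that no longer corresponds to a position in the remote file. */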

 Most Wget's users seem to want byte-by-byte copies, because I don't
 remember a single bug report about the lack of ASCII conversions.

 You mean other than the one from the fellow who started this thread?

Yes, sorry.


Re: wget and ASCII mode

2005-06-26 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

 It does seem a bit odd that no one has noticed this fundamental
 problem until now, but then I missed it, too.

Long ago I intentionally made Wget use binary mode by default and not
muck with line endings, because I believed exact data transfer was
important to get right first.  Unfortunately, EOL conversions break
automatic download resumption (REST in FTP), manual resumption (wget
-c), and timestamping, and would probably break checksums if we
added them.

Most Wget users seem to want byte-by-byte copies; at least, I don't
remember a single bug report about the lack of ASCII conversions.

The one thing that is surely wrong about my approach is the ';type=a'
option, which should either be removed or come with a big fat warning
that it *doesn't* implement the required conversion to native EOL
convention and that it's provided for the sake of people who need text
transfers and are willing to invoke dos2unix/unix2dos (or their OS
equivalent) themselves.


Re: Wget and Secure Pages

2005-06-25 Thread Hrvoje Niksic
John Haymaker [EMAIL PROTECTED] writes:

 I am trying to download all pages in my site except secure pages that
 require login.
  
 Problem:  when wget encounters a secure page requiring the user to log in,
 it hangs there for up to an hour.  Then miraculously, it moves on.

By secure pages do you mean https: pages?

Normally Wget has a timeout mechanism that prevents it from hanging
for so long (the default timeout is 15 minutes, but it can be
shortened to 10 seconds or to whatever works for you), but it
sometimes doesn't work for OpenSSL.


Re: wget and ASCII mode

2005-06-25 Thread Steven M. Schweda
 [...]  (The new code does make one potentially risky assumption,
 but it's explained in the comments.)

   The latest code in my patches and in my new 1.9.1d kit (for VMS,
primarily, but not exclusively) removes the potentially risky assumption
(CR and LF in the same buffer), so it should be swell.  I've left it for
someone else to activate the conditional code which would restore CR-LF
line endings on systems where that's preferred.

   It does seem a bit odd that no one has noticed this fundamental
problem until now, but then I missed it, too.



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: WGET return status codes

2005-06-19 Thread Zinovy Malkin
Thanks.



From:   Mauro Tortonesi [EMAIL PROTECTED]
Organization:   University of Ferrara
To: [EMAIL PROTECTED]
Subject:Re: WGET return status codes
Date sent:  Sat, 18 Jun 2005 15:33:26 -0500
Copies to:  [EMAIL PROTECTED]

 On Tuesday 14 June 2005 07:06 am, Zinovy Malkin wrote:
  Dear all,
 
  I'm not sure the address I'm sending this message is appropriate, sorry.
  Could anybody advise me please where can I find the list of the wget
  return status codes.
 
 at the moment wget status codes are not completely standardized, so you 
 probably don't want to rely on the return status codes to understand why a 
 download failed.
 
 -- 
 Aequam memento rebus in arduis servare mentem...
 
 Mauro Tortonesi  http://www.tortonesi.com
 
 University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
 Institute for Human & Machine Cognition  http://www.ihmc.us
 GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
 Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
 Ferrara Linux User Group                 http://www.ferrara.linux.it
 



--
Dr. Zinovy M. Malkin  e-mail:  [EMAIL PROTECTED]
Head  Tel:  +7-(812)-275-1024
Lab of Space Geodesy and Earth Rotation   Fax:  +7-(812)-275-1119
Institute of Applied Astronomy RAShttp://www.zmalkin.com/
nab. Kutuzova, 10http://www.ipa.nw.ru/PAGE/DEPFUND/GEO/zm/
St. Petersburg 191187
Russia
--


Re: WGET return status codes

2005-06-18 Thread Mauro Tortonesi
On Tuesday 14 June 2005 07:06 am, Zinovy Malkin wrote:
 Dear all,

 I'm not sure the address I'm sending this message is appropriate, sorry.
 Could anybody advise me please where can I find the list of the wget
 return status codes.

at the moment wget status codes are not completely standardized, so you 
probably don't want to rely on the return status codes to understand why a 
download failed.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute for Human & Machine Cognition  http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: wget 1.10 problems under AIX

2005-06-16 Thread Jens Schleusener

Hi Hrvoje,


Thanks for the detailed report!


Thanks for your detailed answer ;-)


Jens Schleusener [EMAIL PROTECTED] writes:


1) Only using the configure-option --disable-nls and the C compiler
gcc 4.0.0 the wget-binary builds successfully


I'd be interested in seeing the error log without --disable-nls and/or
with the system compiler.


I will send that logs to your personal mail-address.


although gcc outputs
some compiler warnings like

  convert.c: In function 'convert_all_links':
  convert.c:95: warning: incompatible implicit declaration of built-in
 function 'alloca'


The alloca-declaring magic in config-post.h (taken from the Autoconf
manual) apparently doesn't take into account that GCC wants alloca
declared too.


But simply calling the new wget like

  wget http://www.example.com/

I got always errors like

  --12:36:51--  http://www.example.com/
= `index.html'
Resolving www.example.com... failed: Invalid flags in hints.


This is really bad.  Apparently your version of getaddrinfo is broken
or Wget is using it incorrectly.  Can you intuit which flags cause the
problem?  Depending on the circumstances, Wget uses AI_ADDRCONFIG,
AI_PASSIVE, and/or AI_NUMERICHOST.


Yes, all three seem to be defined, probably via /usr/include/netdb.h. Here 
is an extract of that file:


/* Flag definitions for addrinfo hints in protocol-independent 
name/addr/service service. RFC2133 */

/* Also flag definitions for getipnodebyname  RFC 2553  */
#define AI_CANONNAME0x01/* canonical name to be included in return */
#define AI_PASSIVE  0x02/* prepare return for call to bind() */
#define AI_NUMERICHOST  0x04/* RFC 2553, nodename is a numeric host
   address string */
#define AI_ADDRCONFIG   0x08/* RFC 2553, source address family configured */
#define AI_V4MAPPED 0x10/* RFC 2553, accept v4 mapped addresses */
#define AI_ALL  0x20/* RFC 2553, accept all addresses */
#define AI_DEFAULT  (AI_V4MAPPED | AI_ADDRCONFIG) /* RFC 2553 */

But I have no idea where the error message "Invalid flags in hints" comes 
from. Directly from wget (probably not) or from system resolver routines?



After some testing I found that using the additional
configure-option --disable-ipv6 solves that problem.


Because it disables IPv6 (and therefore the use of getaddrinfo)
altogether.


Ok, that was not clear to me.


2) Using the additional configure-option --with-ssl=/usr/local/contrib
fails although the openssl (0.9.7g) header files are installed under
/usr/local/contrib/include/openssl/ and the libssl.a under
/usr/local/contrib/lib/.


This is not a standard layout, so the configure script is having
problems with it.  The supported layouts are one of:

* No flags are needed, the includes are found without -Iincludedir,
 and the library gets linked in without the need for -Llibdir.

* The library is installed in $root, which means that includes are in
 $root/include and the libraries in $root/lib.  OpenSSL's own default
 for $root is /usr/local/ssl, which Wget checks for.

To resolve situations like this, Wget should probably support
specifying additional include and library directories separately.
I believe you can work around this by specifying:

./configure CPPFLAGS=-I/usr/local/contrib/include -L/usr/local/contrib/lib

Can you check if that works for you?


That doesn't work (typo?); what seems better is

./configure CPPFLAGS=-I/usr/local/contrib/include
LDFLAGS=-L/usr/local/contrib/lib

but that also doesn't solve the described problem.

Also the configure option

  --with-ssl[=SSL-ROOT]

respectively in my case

  --with-ssl=/usr/local/contrib

should probably do that job.

After long trial and error testing I have the impression that the 
configure-script has an error. If I change for e.g. at line 25771


   { ac_try='test -s conftest.$ac_objext'

into

   { ac_try='test -s .libs/conftest.$ac_objext'

the generated test object file will now be found.  Then my openssl 
installation is also found, and SSL support is compiled successfully into wget.



3) Using the native IBM C compiler (CC=cc) instead GNU gcc I got the
compile error

  cc -qlanglvl=ansi -I. -I. -I/opt/include   -DHAVE_CONFIG_H
  -DSYSTEM_WGETRC=\/usr/local/contrib/etc/wgetrc\
  -DLOCALEDIR=\/usr/local/contrib/share/locale\ -O -c main.c
  main.c, line 147.16: 1506-275 (S) Unexpected text ',' encountered.

Simply changing line 147 of src/main.c from

   OPT__PARENT,

to

   OPT__PARENT

let the compile error vanish (sorry, I am not a C expert).

[...]

It's a newer C feature that leaked into Wget -- sorry about that.

I'll try to get these fixed in Wget 1.10.1.


Your patch works.

Greetings

Jens

--
Dr. Jens SchleusenerT-Systems Solutions for Research GmbH
Tel: +49 551 709-2493   Bunsenstr.10
Fax: +49 551 709-2169   D-37073 Goettingen
[EMAIL PROTECTED]  http://www.t-systems.com/


Re: wget 1.10 and ssl

2005-06-16 Thread Hrvoje Niksic
Gabor Z. Papp [EMAIL PROTECTED] writes:

 * Hrvoje Niksic [EMAIL PROTECTED]:

 |  new configure script coming with wget 1.10 does not honour
 |  --with-ssl=/path/to/ssl because at linking conftest only
 |  -I/path/to/ssl/include used, and no -L/path/to/ssl/lib
 | 
 | That is not supposed to happen.  Can you post configure output and/or
 | the relevant part of config.log?
[...]
 Here you find everything: http://gzp.hu/tmp/wget-1.10/

According to config.log, it seems your SSL includes are not in
/pkg/include after all:

configure:25735: looking for SSL libraries in /pkg
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit 
-I/pkg/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c  -fPIC -DPIC -o 
.libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z 
 || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1
configure: failed program was:
| 
| #include <openssl/ssl.h>
| #include <openssl/x509.h>
| #include <openssl/err.h>
| #include <openssl/rand.h>
| #include <openssl/des.h>
| #include <openssl/md4.h>
| #include <openssl/md5.h>
| 
configure:25787: result: not found
configure:26031: error: failed to find OpenSSL libraries

Configure will try to link (and use -L$root/lib) only after includes
are shown to be found.


Re: wget 1.10 problems under AIX

2005-06-16 Thread Hrvoje Niksic
Jens Schleusener [EMAIL PROTECTED] writes:

   --12:36:51--  http://www.example.com/
 = `index.html'
 Resolving www.example.com... failed: Invalid flags in hints.

 This is really bad.  Apparently your version of getaddrinfo is broken
 or Wget is using it incorrectly.  Can you intuit which flags cause the
 problem?  Depending on the circumstances, Wget uses AI_ADDRCONFIG,
 AI_PASSIVE, and/or AI_NUMERICHOST.

 Yes, all three seems defined, probably via /usr/include/netdb.h.

Then I am guessing that AIX's getaddrinfo doesn't like AF_UNSPEC
family + AI_ADDRCONFIG hint.  If you use `wget -4
http://www.example.com/', does it then work?
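Roughly, the two lookups being compared look like this (a sketch, not Wget's resolver code; error handling and the rest of the lookup logic omitted).  The failing case passes AF_UNSPEC together with AI_ADDRCONFIG in the hints, while -4 pins the family to AF_INET:

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <netdb.h>
  #include <string.h>

  static int
  lookup (const char *host, int family, int use_addrconfig)
  {
    struct addrinfo hints, *res;
    int err;

    memset (&hints, 0, sizeof hints);
    hints.ai_family = family;            /* AF_UNSPEC, or AF_INET with -4 */
    hints.ai_socktype = SOCK_STREAM;
    if (use_addrconfig)
      hints.ai_flags |= AI_ADDRCONFIG;   /* the hint the AIX resolver
                                            apparently rejects here */
    err = getaddrinfo (host, "80", &hints, &res);
    if (err == 0)
      freeaddrinfo (res);
    return err;                          /* non-zero when the resolver
                                            reports "Invalid flags in hints" */
  }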

 But I have no idea were the error message Invalid flags in hints
 comes from. Directly from wget (probably not) or from system
 resolver routines?

From the system resolver, which Wget invokes via getaddrinfo.

 That doesn't works (typo ?) better seems

 ./configure CPPFLAGS=-I/usr/local/contrib/include
  LDFLAGS=-L/usr/local/contrib/lib

That's what I meant, sorry.  But that is pretty much what
--with-ssl=/usr/local/include does.  (I misread your original message,
thinking that the OpenSSL includes were in an entirely different
location).

 respectively in my case

--with-ssl=/usr/local/contrib

 should probably do that job.

Yes.  I'd like to see config.log, or the relevant parts thereof, which
should contain errors.

 After long trial and error testing I have the impression that the
 configure-script has an error. If I change for e.g. at line 25771

 { ac_try='test -s conftest.$ac_objext'

 into

 { ac_try='test -s .libs/conftest.$ac_objext'

 the generated test object file will now be found.

But why don't I (and other non-AIX testers) have that problem?  Maybe
Libtool is doing something strange on AIX?


Re: wget 1.10 and ssl

2005-06-16 Thread Hrvoje Niksic
Gabor Z. Papp [EMAIL PROTECTED] writes:

 * Hrvoje Niksic [EMAIL PROTECTED]:

 | According to config.log, it seems your SSL includes are not in
 | /pkg/include after all:

 Sure, they are in /pkg/include/openssl.

You're right.  The Autoconf-generated test is wrong, and I'm trying to
figure out why.

configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit 
-I/pkg/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/pkg/include conftest.c  -fPIC -DPIC -o 
.libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z 
 || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1

Of course there is no conftest.o when the file is specifically
requested in .libs/conftest.o!  However, that's not how it works for
me:

configure:25465: /bin/sh ./libtool /opt/gcc4/bin/gcc -o conftest  -O2 -Wall 
-Wno-implicit   conftest.c -ldl  -lrt  5
mkdir .libs
/opt/gcc4/bin/gcc -o conftest -O2 -Wall -Wno-implicit conftest.c  -ldl -lrt
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25471: $? = 0
configure:25474: test -z || test ! -s conftest.err
configure:25477: $? = 0
configure:25480: test -s conftest
configure:25483: $? = 0
configure:25496: result: yes

I am somewhat baffled by this problem.


Re: wget 1.10 problems under AIX

2005-06-16 Thread Jens Schleusener

Hi Hrvoje,


Jens Schleusener [EMAIL PROTECTED] writes:


  --12:36:51--  http://www.example.com/
= `index.html'
Resolving www.example.com... failed: Invalid flags in hints.


This is really bad.  Apparently your version of getaddrinfo is broken
or Wget is using it incorrectly.  Can you intuit which flags cause the
problem?  Depending on the circumstances, Wget uses AI_ADDRCONFIG,
AI_PASSIVE, and/or AI_NUMERICHOST.


Yes, all three seems defined, probably via /usr/include/netdb.h.


Then I am guessing that AIX's getaddrinfo doesn't like AF_UNSPEC
family + AI_ADDRCONFIG hint.  If you use `wget -4
http://www.example.com/', does it then work?


Works.


But I have no idea were the error message Invalid flags in hints
comes from. Directly from wget (probably not) or from system
resolver routines?



From the system resolver, which Wget invokes via getaddrinfo.



That doesn't works (typo ?) better seems

./configure CPPFLAGS=-I/usr/local/contrib/include
 LDFLAGS=-L/usr/local/contrib/lib


That's what I meant, sorry.  But that is pretty much what
--with-ssl=/usr/local/include does.  (I misread your original message,
thinking that the OpenSSL includes were in an entirely different
location).


respectively in my case

   --with-ssl=/usr/local/contrib

should probably do that job.


Yes.  I'd like to see config.log, or the relevant parts thereof, which
should contain errors.


Here the config.log extract:


configure:25735: looking for SSL libraries in /usr/local/contrib
configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit 
-I/usr/local/contrib/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c 
-DPIC -o .libs/conftest.o

*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z
  || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1
configure: failed program was:
|
| #include <openssl/ssl.h>
| #include <openssl/x509.h>
| #include <openssl/err.h>
| #include <openssl/rand.h>
| #include <openssl/des.h>
| #include <openssl/md4.h>
| #include <openssl/md5.h>
|
configure:25787: result: not found
configure:26031: error: failed to find OpenSSL libraries


The reason for the above error is, as already written - at least in my case, 
using the self-compiled libtool version 1.5 - that the configure script 
tests for the non-existing conftest.o instead of the generated and 
existing .libs/conftest.o.


The above line

 configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit
 -I/usr/local/contrib/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c
 -DPIC -o .libs/conftest.o

looks a little bit strange to me (as a configure layman): gcc twice?

Here is a corresponding extract of config.log (from the same system) from 
configuring lynx2.8.6dev.13, where I have no such problems (but the 
configure script and the conftest.c file look different):



configure:8128: checking for openssl include directory
configure:8145: gcc -c -I/usr/local/contrib/include 
-I/usr/local/contrib/include  -D_ACS_COMPAT_CODE  -D_POSIX_C_SOURCE=199506L

conftest.c 5
configure:8148: $? = 0
configure:8151: test -s conftest.o
configure:8154: $? = 0
configure:8163: result: yes



After long trial and error testing I have the impression that the
configure-script has an error. If I change for e.g. at line 25771

{ ac_try='test -s conftest.$ac_objext'

into

{ ac_try='test -s .libs/conftest.$ac_objext'

the generated test object file will now be found.


But why don't I (and other non-AIX testers) have that problem?  Maybe
Libtool is doing something strange on AIX?


I will try to re-compile the currently used libtool version 1.5 under AIX 5.1 
(maybe it was built under AIX 4.3), and also compile and use the newest 
libtool (version 1.5.18).


Greetings

Jens

--
Dr. Jens SchleusenerT-Systems Solutions for Research GmbH
Tel: +49 551 709-2493   Bunsenstr.10
Fax: +49 551 709-2169   D-37073 Goettingen
[EMAIL PROTECTED]  http://www.t-systems.com/


Re: wget 1.10 problems under AIX

2005-06-16 Thread Hrvoje Niksic
Jens Schleusener [EMAIL PROTECTED] writes:

 The reason for the above error is as already written - at least in
 my case using the self compiled libtool version 1.5

I don't think the libtool version used on the system makes any
difference (except for a developer at the point of libtoolizing his
program), since Wget uses the libtool code from the release tarball.

 - that the
 configure script tests for the non-existing conftest.o instead for
 the generated and existing .libs/conftest.o.

You are right, but I don't understand why it doesn't happen for me.

 The above line

   configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit
   -I/usr/local/contrib/include  conftest.c 5
   gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c
   -DPIC -o .libs/conftest.o

 looks for me (as a configure-layman) a little bit strange (gcc
 twice?).

GCC is only invoked twice.  The first line is configure telling you
the command it is about to run.  The second line is libtool telling you
exactly how it is about to run GCC.

For me there is no `-o .libs/conftest.o', even though I use the same
libtool invocation on my system.

 I will try to re-compile current used libtool version 1.5 under AIX
 5.1 (may be it was built under AIX 4.3) and compile and use the newest
 libtool (version 1.5.18).

Unfortunately I don't think it's going to change anything, as
explained above.  I don't think the people who merely *build* software
are even supposed to have to have libtool installed in the first
place.


Re: wget 1.10 problems under AIX

2005-06-16 Thread Jens Schleusener

Hi,


The above line

  configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit
  -I/usr/local/contrib/include  conftest.c 5
  gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c
  -DPIC -o .libs/conftest.o

looks for me (as a configure-layman) a little bit strange (gcc
twice?).


GCC is only invoked twice.  The first line is configure telling you
the command it is about run.  The second line is libtool telling you
exactly how it is about to run GCC.

For me there is no `-o .libs/conftest.o', even though I use the same
libtool invocation on my system.


Sorry if I bother you, but here I see a difference between AIX 5.1 and, 
for example, my SuSE 9.3 system:


AIX 5.1 (SSL-ROOT=/usr/local/contrib):
==

configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit
  -I/usr/local/contrib/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/usr/local/contrib/include conftest.c -DPIC
 -o .libs/conftest.o
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z
  || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 1


SuSE 9.3 (SSL-ROOT=/usr):
=========================

configure:25742: checking for includes
configure:25756: /bin/sh ./libtool gcc -c  -O2 -Wall -Wno-implicit
  -I/usr/include  conftest.c 5
 gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c  -fPIC -DPIC
  -o .libs/conftest.o
 gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c
  -o conftest.o   /dev/null 21
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require --mode=MODE be specified.
configure:25762: $? = 0
configure:25766: test -z
 || test ! -s conftest.err
configure:25769: $? = 0
configure:25772: test -s conftest.o
configure:25775: $? = 0
configure:25778: result: found

That output is produced by the configure line 25756:

 if  { (eval echo $as_me:$LINENO: \$ac_compile\)  5

The real compiling is done by the next line, after 25756:

 (eval $ac_compile) 2conftest.er1

The content of $ac_compile seems to be (SuSE)

 /bin/sh ./libtool gcc -c -O2 -Wall -Wno-implicit -I/usr/include conftest.c 5

but under SuSE (Linux) that obviously produces two gcc processes and two 
object files (.libs/conftest.o AND conftest.o), so the object-file 
existence test (which searches only for conftest.o) succeeds.  Under AIX, 
however, only one object file (.libs/conftest.o) is generated, which the 
object-file existence test doesn't find.


Greetings

Jens

--
Dr. Jens SchleusenerT-Systems Solutions for Research GmbH
Tel: +49 551 709-2493   Bunsenstr.10
Fax: +49 551 709-2169   D-37073 Goettingen
[EMAIL PROTECTED]  http://www.t-systems.com/


Re: wget 1.10 problems under AIX

2005-06-15 Thread Hrvoje Niksic
Thanks for the detailed report!

Jens Schleusener [EMAIL PROTECTED] writes:

 1) Only using the configure-option --disable-nls and the C compiler
 gcc 4.0.0 the wget-binary builds successfully

I'd be interested in seeing the error log without --disable-nls and/or
with the system compiler.

 although gcc outputs
 some compiler warnings like

   convert.c: In function 'convert_all_links':
   convert.c:95: warning: incompatible implicit declaration of built-in
  function 'alloca'

The alloca-declaring magic in config-post.h (taken from the Autoconf
manual) apparently doesn't take into account that GCC wants alloca
declared too.

 But simply calling the new wget like

   wget http://www.example.com/

 I got always errors like

   --12:36:51--  http://www.example.com/
 = `index.html'
 Resolving www.example.com... failed: Invalid flags in hints.

This is really bad.  Apparently your version of getaddrinfo is broken
or Wget is using it incorrectly.  Can you intuit which flags cause the
problem?  Depending on the circumstances, Wget uses AI_ADDRCONFIG,
AI_PASSIVE, and/or AI_NUMERICHOST.

 After some testing I found that using the additional
 configure-option --disable-ipv6 solves that problem.

Because it disables IPv6 (and therefore the use of getaddrinfo)
altogether.

 2) Using the additional configure-option --with-ssl=/usr/local/contrib
 fails although the openssl (0.9.7g) header files are installed under
 /usr/local/contrib/include/openssl/ and the libssl.a under
 /usr/local/contrib/lib/.

This is not a standard layout, so the configure script is having
problems with it.  The supported layouts are one of:

* No flags are needed, the includes are found without -Iincludedir,
  and the library gets linked in without the need for -Llibdir.

* The library is installed in $root, which means that includes are in
  $root/include and the libraries in $root/lib.  OpenSSL's own default
  for $root is /usr/local/ssl, which Wget checks for.

To resolve situations like this, Wget should probably support
specifying additional include and library directories separately.
I believe you can work around this by specifying:

./configure CPPFLAGS=-I/usr/local/contrib/include -L/usr/local/contrib/lib

Can you check if that works for you?

 3) Using the native IBM C compiler (CC=cc) instead GNU gcc I got the
 compile error

   cc -qlanglvl=ansi -I. -I. -I/opt/include   -DHAVE_CONFIG_H
   -DSYSTEM_WGETRC=\/usr/local/contrib/etc/wgetrc\
   -DLOCALEDIR=\/usr/local/contrib/share/locale\ -O -c main.c
   main.c, line 147.16: 1506-275 (S) Unexpected text ',' encountered.

 Simply changing line 147 of src/main.c from

OPT__PARENT,

 to

OPT__PARENT

 let the compile error vanish (sorry, I am not a C expert).
[...]

It's a newer C feature that leaked into Wget -- sorry about that.

I'll try to get these fixed in Wget 1.10.1.


Re: wget 1.10 problems under AIX

2005-06-15 Thread Hrvoje Niksic
This patch should take care of the problems with compiling Wget 1.10
with the native IBM cc.

2005-06-15  Hrvoje Niksic  [EMAIL PROTECTED]

* host.h (ip_address): Remove the trailing comma from the type
enum in the no-IPv6 case.

* main.c (struct cmdline_option): Remove the trailing comma from
the enum.

Reported by Jens Schleusener.

Index: src/host.h
===
RCS file: /pack/anoncvs/wget/src/host.h,v
retrieving revision 1.27
diff -u -r1.27 host.h
--- src/host.h  2005/03/04 19:21:01 1.27
+++ src/host.h  2005/06/15 20:06:53
@@ -49,9 +49,9 @@
 typedef struct {
   /* Address type. */
   enum { 
-IPV4_ADDRESS, 
+IPV4_ADDRESS
 #ifdef ENABLE_IPV6
-IPV6_ADDRESS 
+, IPV6_ADDRESS 
 #endif /* ENABLE_IPV6 */
   } type;
 
Index: src/main.c
===
RCS file: /pack/anoncvs/wget/src/main.c,v
retrieving revision 1.137
diff -u -r1.137 main.c
--- src/main.c  2005/05/06 15:50:50 1.137
+++ src/main.c  2005/06/15 20:06:54
@@ -144,7 +144,7 @@
 OPT__DONT_REMOVE_LISTING,
 OPT__EXECUTE,
 OPT__NO,
-OPT__PARENT,
+OPT__PARENT
   } type;
   const void *data;/* for standard options */
   int argtype; /* for non-standard options */


Re: wget segfault on malformed working directory

2005-06-15 Thread Hrvoje Niksic
Nagy Ferenc László [EMAIL PROTECTED] writes:

 If the ftp server returns invalid data (for example '221 Bye.') in
 response to PWD, wget segfaults because in ftp_pwd (ftp-basic.c)
 request will be NULL after the line 'request = strtok (NULL, "\"");',
 and this NULL will be passed to xstrdup.

Thanks for the report; this patch should fix the problem:

2005-06-15  Hrvoje Niksic  [EMAIL PROTECTED]

* ftp-basic.c (ftp_pwd): Handle malformed PWD response.

Index: src/ftp-basic.c
===
RCS file: /pack/anoncvs/wget/src/ftp-basic.c,v
retrieving revision 1.47
diff -u -r1.47 ftp-basic.c
--- src/ftp-basic.c 2005/05/16 22:08:57 1.47
+++ src/ftp-basic.c 2005/06/15 20:10:43
@@ -1081,6 +1081,7 @@
 return err;
   if (*respline == '5')
 {
+err:
   xfree (respline);
   return FTPSRVERR;
 }
@@ -1089,6 +1090,10 @@
  and everything following it. */
   strtok (respline, "\"");
   request = strtok (NULL, "\"");
+  if (!request)
+/* Treat the malformed response as an error, which the caller has
+   to handle gracefully anyway.  */
+goto err;
 
   /* Has the `pwd' been already allocated?  Free! */
   xfree_null (*pwd);


RE: wget 1.10 released

2005-06-10 Thread Herold Heiko
Windows MSVC binary at http://xoomer.virgilio.it/hherold/
Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 10, 2005 9:12 AM
 To: wget@sunsite.dk; [EMAIL PROTECTED]
 Subject: wget 1.10 released
 
 
 
 hi to everybody,
 
 i have just uploaded the wget 1.10 tarball on ftp.gnu.org:
 
 ftp://ftp.gnu.org/gnu/wget/wget-1.10.tar.gz
 
 you can find the GPG signature of the tarball at these URLs:
 
 ftp://ftp.gnu.org/gnu/wget/wget-1.10.tar.gz.sig
 
 and the GPG key i have used for the signature at this URL:
 
 http://www.tortonesi.com/GNU-GPG-Key.txt
 
 the key fingerprint is:
 
 pub  1024D/7B2FD4B0 2005-06-02 Mauro Tortonesi (GNU Wget Maintainer) 
 [EMAIL PROTECTED]
  Key fingerprint = 1E90 AEA8 D511 58F0 94E5  B106 7220 
 24E9 7B2F D4B0
 
 the MD5 checksum of the tarball (and signature) are:
 
 caddc199d2cb31969e32b19fd365b0c5  wget-1.10.tar.gz
 7dff7d39129051897ab6268b713766bf  wget-1.10.tar.gz.sig
 
 the long-awaited 1.10 release is a significant improvement over the last
 1.9.1 release, introducing a few important features like large file
 support and NTLM authentication, lots of improvements (especially in
 IPv6 and SSL code) and many bugfixes.
 
 last but not least, a brief personal comment. this is my first release
 as wget maintainer, and i am very excited about it. however i would like
 to say that, even if he stepped down from the maintainer position, the
 main author of wget is still hrvoje niksic, who really did awesome work
 on wget 1.10. hrvoje is one of the best developers i have ever worked
 with and i would like to thank him for all the effort he put into this
 release of wget, especially since the last few months have been rather
 difficult for him.
 
 -- 
 Aequam memento rebus in arduis servare mentem...
 
 Mauro Tortonesi  http://www.tortonesi.com
 
 University of Ferrara - Dept. of Eng.http://www.ing.unife.it
 Institute for Human & Machine Cognition  http://www.ihmc.us
 GNU Wget - HTTP/FTP file retrieval tool  
http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


RE: wget and ASCII mode

2005-06-06 Thread Kiran Atlluri
Thank you.
I appreciate this.

Will keep you posted on how it turns out.
Regards,
Kiran


-Original Message-
From: Steven M. Schweda [mailto:[EMAIL PROTECTED] 
Sent: Saturday, June 04, 2005 8:39 AM
To: WGET@sunsite.dk
Cc: Kiran Atlluri
Subject: Re: wget and ASCII mode

From: Kiran Atlluri

 [...]
 I am trying to retrieve a '.csv' file on a unix system using wget (ftp
 mode).
 
 When I retrieve a file using normal FTP and specify ASCII mode, I
 successfully get the file and there are no '^M' at the end of line in
 this file.
 
 But when I use wget all the lines in the file have this '^M' at the
 end.
 [...]

   This happens because write_data() (in src/retr.c) does nothing to
adjust the FTP-standard CR-LF line endings according to the local
standard (in this case, LF-only), which a proper FTP client should do.

   A fix for this was included among my recent (well, not _very_ recent
now) VMS-related patch submissions, but it would probably be a mistake
to hold your breath waiting for those changes to be incorporated into
the main code stream.

   If you're desperate to see what I did to fix this, you could visit:

  http://antinode.org/ftp/wget/patch1/
  ftp://antinode.org/wget/patch1/

A quick search for the (new) enum value rb_ftp_ascii suggests that the
relevant changes are in ftp.c, retr.c, and retr.h.
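
   For illustration only, here is a minimal sketch of the CR-LF stripping
an ASCII-mode write path needs on LF-only systems (my own sketch, not the
code from the patch above; the function name and how it would hook into
retr.c are assumptions):

#include <stdio.h>

/* Copy BUF to OUT, dropping the CR of each CR-LF pair.  A real
   implementation must also handle a CR that ends one buffer while the
   matching LF starts the next one.  */
static int
write_data_ascii (FILE *out, const char *buf, int bufsize)
{
  int i;
  for (i = 0; i < bufsize; i++)
    {
      if (buf[i] == '\r' && i + 1 < bufsize && buf[i + 1] == '\n')
        continue;                 /* skip the CR that precedes an LF */
      if (fputc (buf[i], out) == EOF)
        return -1;                /* propagate the write error */
    }
  return 0;
}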

   Feel free to get in touch if you have any questions about what you
find there.  (The new code does make one potentially risky assumption,
but it's explained in the comments.)



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget 1.10 release candidate 1

2005-06-04 Thread Jochen Roderburg
Zitat von Oliver Schulze L. [EMAIL PROTECTED]:

 Hi Mauro,
 do you know if the regex patch from Tobias was applied to this release?

 Thanks
 Oliver


The last words on this topic that I remember were here:

http://www.mail-archive.com/wget@sunsite.dk/msg07436.html

Regards,
J.Roderburg



Re: wget 1.10 release candidate 1

2005-06-04 Thread Oliver Schulze L.




Thanks Jochen,
I'm downloading both now

Oliver

Jochen Roderburg wrote:

  Zitat von "Oliver Schulze L." [EMAIL PROTECTED]:

  
  
Hi Mauro,
do you know if the regex patch from Tobias was applied to this release?

Thanks
Oliver


  
  
The last words on this topic that I remember were here:

http://www.mail-archive.com/wget@sunsite.dk/msg07436.html

Regards,
J.Roderburg
  


-- 
Oliver Schulze L.
[EMAIL PROTECTED]




Re: wget and ASCII mode

2005-06-04 Thread Steven M. Schweda
From: Kiran Atlluri

 [...]
 I am trying to retrieve a '.csv' file on a unix system using wget (ftp
 mode).
 
 When I retrieve a file using normal FTP and specify ASCII mode, I
 successfully get the file and there are no '^M' at the end of line in
 this file.
 
 But when I use wget all the lines in the file have this '^M' at the
 end.
 [...]

   This happens because write_data() (in src/retr.c) does nothing to
adjust the FTP-standard CR-LF line endings according to the local
standard (in this case, LF-only), which a proper FTP client should do.

   A fix for this was included among my recent (well, not _very_ recent
now) VMS-related patch submissions, but it would probably be a mistake
to hold your breath waiting for those changes to be incorporated into
the main code stream.

   If you're desperate to see what I did to fix this, you could visit:

  http://antinode.org/ftp/wget/patch1/
  ftp://antinode.org/wget/patch1/

A quick search for the (new) enum value rb_ftp_ascii suggests that the
relevant changes are in ftp.c, retr.c, and retr.h.

   Feel free to get in touch if you have any questions about what you
find there.  (The new code does make one potentially risky assumption,
but it's explained in the comments.)



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget 1.10 release candidate 1

2005-06-04 Thread Oliver Schulze L.




Hi,
Neither rc1 nor alpha2 has the prce patch included.
I think that prce is a very useful patch, and it should be
added to CVS but not enabled by default in the ./configure script.
So, if you want to use prce, just ./configure --with-prce
and everybody is happy.

Just my 2c

Oliver

Jochen Roderburg wrote:

  Zitat von "Oliver Schulze L." [EMAIL PROTECTED]:

  
  
Hi Mauro,
do you know if the regex patch from Tobias was applied to this release?

Thanks
Oliver


  
  
The last words on this topic that I remember were here:

http://www.mail-archive.com/wget@sunsite.dk/msg07436.html

Regards,
J.Roderburg
  


-- 
Oliver Schulze L.
[EMAIL PROTECTED]




Re: wget 1.10 release candidate 1

2005-06-04 Thread Jochen Roderburg
Zitat von Oliver Schulze L. [EMAIL PROTECTED]:

 Neither rc1 nor alpha2 has the prce patch included.
 I think that prce is a very useful patch, and it should be
 added to CVS but not enabled by default in the ./configure script.
 So, if you want to use prce, just ./configure --with-prce
 and everybody is happy.

Hmmm, you mean everybody who has "prce" is happy?
Did you not read the message that I pointed you to ;-) ??
It said that the developers do not want to include a regex patch in wget until
they find a solution that is portable enough to all systems that wget is
supposed to run on.
And no, I'm not involved in this, just wanted to remind that this has been
discussed already a few times on the list ;-)

J.Roderburg



Re: wget 1.10 release candidate 1

2005-06-04 Thread Oliver Schulze L.




Hi Jochen,
yes, I read it.
That's why I suggested using an option to ./configure in order to
enable it.
And it should be disabled by default.

It's a nice option for all because, if you don't have pcre, you won't
receive any warning and it won't hurt anybody.

HTH
Oliver

Jochen Roderburg wrote:

  Zitat von "Oliver Schulze L." [EMAIL PROTECTED]:

  
  
Neither rc1 nor alpha2 has the prce patch included.
I think that prce is a very useful patch, and it should be
added to CVS but not enabled by default in the ./configure script.
So, if you want to use prce, just ./configure --with-prce
and everybody is happy.

  
  
Hmmm, you mean everybody who has "prce" is happy?
Did you not read the message that I pointed you to ;-) ??
It said that the developers do not want to include a regex patch in wget until
they find a solution that is portable enough to all systems that wget is
supposed to run on.
And no, I'm not involved in this, just wanted to remind that this has been
discussed already a few times on the list ;-)

J.Roderburg
  


-- 
Oliver Schulze L.
[EMAIL PROTECTED]




Re: wget 1.10 release candidate 1

2005-06-03 Thread Oliver Schulze L.

Hi Mauro,
do you know if the regex patch from Tobias was applied to this release?

Thanks
Oliver

Mauro Tortonesi wrote:


dear friends,

i have just released the first release candidate of wget 1.10:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.bz2

you are encouraged to download the tarballs, test if the code works properly
and report any bug you find.

if no major bug report will be submitted in the next two days, i am planning 
to release wget 1.10 next thursday.


 



Re: wget 1.10 release candidate 1

2005-05-31 Thread Steven M. Schweda
 i have just released the first release candidate of wget 1.10:
 
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.gz
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-rc1.tar.bz2
 
 you are encouraged to download the tarballs, test if the code works
 properly and report any bug you find.

   The VMS changes seem to be missing.  But you probably knew that.



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget Question/Suggestion

2005-05-20 Thread Hrvoje Niksic
Mark Anderson [EMAIL PROTECTED] writes:

 Is there an option, or could you add one if there isn't, to specify
 that I want wget to write the downloaded html file, or whatever, to
 stdout so I can pipe it into some filters in a script?

Yes, use `-O -'.


Re: wget-1.9.1 Tries to Connect to localhost

2005-05-18 Thread Hrvoje Niksic
Jim Peterson [EMAIL PROTECTED] writes:

   Using Fedora Core 3, when I wget "http://www.studylight.org/", it prints 
 out:

 --02:52:30--  http://www.studylight.org/
=> `index.html'
 Resolving www.studylight.org... 63.164.18.58
 Connecting to www.studylight.org[63.164.18.58]:80... connected.
 HTTP request sent, awaiting response... 302 Found
 Location: http://localhost/ [following]
 --02:52:30--  http://localhost/
=> `index.html'
 Resolving localhost... 127.0.0.1
 Connecting to localhost[127.0.0.1]:80... failed: Connection refused.

 Why is it trying to connect to localhost?

Because it redirects you to localhost, possibly as a (feeble) attempt
to prevent the site from being leeched with Wget.  Use `wget -U
Mozilla' and the problem goes away.

 My browser can load the page, but if I manually telnet
 www.studylight.org 80 and type GET /, I get a page that tends to
 indicate a peculiar web server setting that returns the Apache test
 page.

That is a symptom of the site using name-based virtual hosting.  You
must remember to also specify the Host header.

$ telnet www.studylight.org 80
Trying 63.164.18.58...
Connected to newadmin.studylight.org.
Escape character is '^]'.
GET / HTTP/1.0
Host: www.studylight.org

HTTP/1.1 200 OK
Date: Wed, 18 May 2005 08:23:12 GMT
Server: Apache/1.3.33 (Unix)  (Gentoo/Linux) mod_perl/1.27
Connection: close
Content-Type: text/html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
...

 Is this simply an indication of a poorly administered server, or is
 it a bug in wget?

It's overprotectiveness of the server administrator.  If you do the
same telnet thing, but introduce yourself as Wget, you get the bogus
redirection to localhost:

$ telnet www.studylight.org 80
Trying 63.164.18.58...
Connected to newadmin.studylight.org.
Escape character is '^]'.
GET / HTTP/1.0
Host: www.studylight.org
User-Agent: Wget/1.9.1

HTTP/1.1 302 Found
Date: Wed, 18 May 2005 08:24:10 GMT
Server: Apache/1.3.33 (Unix)  (Gentoo/Linux) mod_perl/1.27
Location: http://localhost/
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://localhost/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at studylight.org Port 80</ADDRESS>
</BODY></HTML>


Re: wget-1.9.1 Tries to Connect to localhost

2005-05-17 Thread Mauro Tortonesi
On Tuesday 17 May 2005 01:56 am, Jim Peterson wrote:
   Using Fedora Core 3, when I wget "http://www.studylight.org/", it prints
 out:

 --02:52:30--  http://www.studylight.org/
=> `index.html'
 Resolving www.studylight.org... 63.164.18.58
 Connecting to www.studylight.org[63.164.18.58]:80... connected.
 HTTP request sent, awaiting response... 302 Found
 Location: http://localhost/ [following]
 --02:52:30--  http://localhost/
=> `index.html'
 Resolving localhost... 127.0.0.1
 Connecting to localhost[127.0.0.1]:80... failed: Connection refused.

   Why is it trying to connect to localhost?  My browser can load the page,
 but if I manually telnet www.studylight.org 80 and type GET /, I get a
 page that tends to indicate a peculiar web server setting that returns the
 Apache test page.  Is this simply an indication of a poorly administered
 server, or is it a bug in wget?

it seems to be a problem with the server:

DEBUG output created by Wget 1.10-beta1+cvs-dev on linux-gnu.

--21:44:29--  http://www.studylight.org/
   => `index.html'
Resolving www.studylight.org... 63.164.18.58
Caching www.studylight.org = 63.164.18.58
Connecting to www.studylight.org|63.164.18.58|:80... connected.
Created socket 4.
Releasing 0x080835f8 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10-beta1+cvs-dev
Accept: */*
Host: www.studylight.org
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 Found
Date: Wed, 18 May 2005 02:44:29 GMT
Server: Apache/1.3.33 (Unix)  (Gentoo/Linux) mod_perl/1.27
Location: http://localhost/
  ^
Connection: close
Content-Type: text/html; charset=iso-8859-1

---response end---
302 Found
Location: http://localhost/ [following]
Closed fd 4
--21:44:30--  http://localhost/
   => `index.html'
Resolving localhost... 127.0.0.1
Caching localhost = 127.0.0.1
Connecting to localhost|127.0.0.1|:80... Closed fd 4
failed: Connection refused.
Releasing 0x08081370 (new refcount 1).


-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


RE: wget 1.10 beta 1

2005-05-12 Thread Herold Heiko
Windows MSVC6 binary for testing purposes here:
http://xoomer.virgilio.it/hherold/

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 11, 2005 8:41 PM
 To: wget@sunsite.dk
 Subject: wget 1.10 beta 1
 
 
 
 dear friends,
 
 i have just released the first beta version of wget 1.10:
 
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-beta1.tar.gz
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-beta1.tar.bz2
 
 you are encouraged to download the tarballs, test if the code 
 works properly 
 and report any bug you find.
 
 i am still doing tests on this code, but it seems to work 
 fine, so i think 
 we'll be able to release wget 1.10 in 7-10 days.
 
 -- 
 Aequam memento rebus in arduis servare mentem...
 
 Mauro Tortonesi  http://www.tortonesi.com
 
 University of Ferrara - Dept. of Eng.http://www.ing.unife.it
 Institute of Human & Machine Cognition   http://www.ihmc.us
 GNU Wget - HTTP/FTP file retrieval tool  
 http://www.gnu.org/software/wget
 Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
 Ferrara Linux User Group http://www.ferrara.linux.it
 


Re: wget doesn't get all page requisites...

2005-05-11 Thread Hrvoje Niksic
Joerg Ottermann [EMAIL PROTECTED] writes:

 i try to archive some pages using wget, but it seems, that i have some
 problems when TE:chunked is used.

The server must not use Transfer-Encoding: chunked in response to an
HTTP/1.0 request.  Are you sure that is the problem?


Re: wget with ? and in urls

2005-05-05 Thread Hrvoje Niksic
Vitaly Lomov [EMAIL PROTECTED] writes:

 Hello
 I am trying to get a site http://www.cro.ie/index.asp with the following flags
 -r -l2
 or
 -kr -l2
 or
 -Er -l2
 or
 -Ekr -l2
 In all cases, the linked files are saved with '@' instead of '?' in
 the name, but in the index.asp the link still refers to names with '?'

Maybe you're not letting Wget finish the mirroring.  The links are
converted only after everything has been downloaded.  I've now tried
`wget -Ekrl2 http://www.cro.ie/index.asp --restrict-file-names=windows'
(the last argument being to emulate modification of ? to @ done under
Windows) and it converted the links correctly.

The only links not converted were the ones generated in JavaScript,
but only two of those were in index.asp -- the rest were converted
correctly.


Re: wget with ? and in urls

2005-05-05 Thread Hrvoje Niksic
Vitaly Lomov [EMAIL PROTECTED] writes:

 Maybe you're not letting Wget finish the mirroring.  The links are
 converted only after everything has been downloaded.  I've now tried
 `wget -Ekrl2 http://www.cro.ie/index.asp --restrict-file-names=windows'
 (the last argument being to emulate modification of ? to @ done under
 Windows) and it converted the links correctly.
 Actually, it never finishes for me. I have waited for an hour now,
 still waits for response. I don't know how you could do it in 30min.

I now tried it again, and it took about 10 minutes.  My DSL connection
was the bottle-neck.

 I just copied your command line, ran it and it blocks. Then I put
 the timeout -T9 , still waits:

 --17:30:02-- 
 http://www.cro.ie/search/template_generic.asp?ID=8&Level1=3&Level2=0
  => `www.cro.ie/search/[EMAIL PROTECTED]Level1=3Level2=0'
 Reusing existing connection to www.cro.ie:80.
 HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
 Length: unspecified

 [=  ] 0 --.--K/s

I got this for that part of the download:

--20:54:00--  
http://www.cro.ie/search/template_generic.asp?ID=8&Level1=3&Level2=0
   => `www.cro.ie/search/[EMAIL PROTECTED]Level1=3Level2=0'
Connecting to www.cro.ie|62.17.220.228|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
20:54:02 ERROR 404: Not Found.


The "assuming HTTP/0.9" you see is potentially dangerous because it
might indicate that a previous download left things in a strange
state.

Does the download work if you use --no-http-keep-alive?

 Do you run yours on non-Windows? maybe that's the difference. Will a
 debug printout of this help you?

I haven't tried it on Windows yet because I thought the problem was
related to link conversion and therefore occurred on all platforms.


Re: Wget converts links correctly *only* for the first time

2005-05-03 Thread Hrvoje Niksic
Andrzej  [EMAIL PROTECTED] writes:

 Will the patches be included in the stable 1.10?

Probably.  1.10 is in feature freeze, but this really is a bug fix.
I'd like to check with others if that change is deemed safe for
mirroring of other sites.

 Clicking on that link redirects to that page:
 https://lists.man.lodz.pl/mailman/listinfo
 and from all the links which are on that page the files are unnecessarily 
 downloaded (I do not want that page and the subpages).

 So how can I block it?

Could you use -X /mailman/listinfo ?


Re: Wget converts links correctly *only* for the first time

2005-05-03 Thread Andrzej
  Clicking on that link redirects to that page:
  https://lists.man.lodz.pl/mailman/listinfo
  and from all the links which are on that page the files are unnecessarily 
  downloaded (I do not want that page and the subpages).
 
  So how can I block it?
 
 Could you use -X /mailman/listinfo ?

I tried now, and it did not help. 
Still instead of just index.html and index.html.orig and subdirectories 
of the http://lists.man.lodz.pl/pipermail/mineraly/

there are many many other files downloaded from the 
https://lists.man.lodz.pl/mailman/listinfo 
page:

admin.html
admin.html.orig
chemfan.html
chemfan.html.orig
create.html
create.html.orig
gnu-head-tiny.jpg
info
info.1.html
info.1.html.orig
listinfo.html
listinfo.html.orig
lodz-l.html
lodz-l.html.orig
mailman.jpg
mineraly.html
mineraly.html.orig
mineralyftp
mm-icon.png
odlew-pl.html
odlew-pl.html.orig
os2.html
os2.html.orig
pecet.html
pecet.html.orig
pol34-info
pol34-info.1.html
pol34-info.1.html.orig
polip.html
polip.html.orig
PythonPowered.png
test.html
test.html.orig

a.


Re: Wget converts links correctly *only* for the first time

2005-05-03 Thread Hrvoje Niksic
Andrzej [EMAIL PROTECTED] writes:

  Clicking on that link redirects to that page:
  https://lists.man.lodz.pl/mailman/listinfo
  and from all the links which are on that page the files are unnecessarily 
  downloaded (I do not want that page and the subpages).
 
  So how can I block it?
 
 Could you use -X /mailman/listinfo ?

 I tried now, and it did not help. Still instead of just index.html
 and index.html.orig and subdirectories of the
 http://lists.man.lodz.pl/pipermail/mineraly/

I believe 1.9.1 had a bug in this area when -m (which implies -l0) was
used.  Could you try specifying -l50 along with the other options, and
after -m?


Re: Wget converts links correctly *only* for the first time

2005-05-03 Thread Andrzej

 I believe 1.9.1 had a bug in this area when -m (which implies -l0) was
 used.  Could you try specifying -l50 along with the other options, and
 after -m?

It still downloaded everything.

a.




Re: Wget converts links correctly *only* for the first time.

2005-05-02 Thread Andrzej
 Yup. So I assume that the problem you see is not that of wget mirroring, but
 a combination of saving to a custom dir (with --cut-dirs and the like) and
 conversion of the links. Obviously, the link to
 http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html which would be
 correct for a standard wget -m URL was carried over while the custom link
 to http://mineraly.feedle.com/Ftp/UpLoad/index.html was not created.
 My test with wget 1.5 just was a simple wget15 -m -np URL and it worked. 
 So maybe the convert/rename problem/bug was solved with 1.9.1
 This would also explain the missing gif file, I think.

And the above quoted link is also incorrect after second run of wget, it 
is now again:
http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html

:(:::

a. 


Re: Wget converts links correctly *only* for the first time

2005-05-02 Thread Hrvoje Niksic
Andrzej  [EMAIL PROTECTED] writes:

 It's not the end of troubles though! 
 It works correctly *only* for the first time! 
 When I (or cron) run the same mirroring commands again over already 
 mirrored files to renew the mirror, then the correctly converted link of 
 the gif file (on the main mirror web page):
 http://mineraly.feedle.com/Gify/ChemFan.gif
 is exchanged to the incorrect one:
 http://znik.wbc.lublin.pl/Mineraly/Gify/ChemFan.gif

The problem is that Wget is re-converting the files it decided it
didn't want to download due to timestamping.  For example:

1st time:
URL:  http://znik.wbc.lublin.pl/Mineraly/
link: <img src="http://znik.wbc.lublin.pl/ChemFan/Gify/ChemFan.gif">

Since the image is downloaded to Gify/ChemFan.gif, this is converted
to:
  <img src="Gify/ChemFan.gif">

2nd time:
URL:  http://znik.wbc.lublin.pl/Mineraly/  (using local copy of that URL)
link: <img src="Gify/ChemFan.gif">

Since no such image is downloaded, Wget converts the link back to
absolute one.  Merging "http://znik.wbc.lublin.pl/Mineraly/" with
"Gify/ChemFan.gif" results in the totally bogus
"http://znik.wbc.lublin.pl/Mineraly/Gify/ChemFan.gif" that you're
seeing.
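
A hypothetical stand-alone toy to illustrate that merge step (not wget
code; resolving a relative link against a base that ends in '/' amounts
to simple concatenation here):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
merge_simple (const char *base, const char *link)
{
  /* Toy stand-in for wget's real URL merging.  */
  char *res = malloc (strlen (base) + strlen (link) + 1);
  strcpy (res, base);
  strcat (res, link);
  return res;
}

int
main (void)
{
  /* 2nd run: the link is already relative, so resolving it against the
     page URL again produces the bogus absolute URL described above.  */
  char *bogus = merge_simple ("http://znik.wbc.lublin.pl/Mineraly/",
                              "Gify/ChemFan.gif");
  printf ("%s\n", bogus);
  free (bogus);
  return 0;
}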

That explains the mechanics of the bug, but not what to do about it.
There are two solutions:

1. If an HTML file is re-downloaded because of time-stamping, it
   should not be re-converted because (since the file hasn't changed)
   there is no reason to do so.  I'm trying to think of a scenario
   where this would break things, but I can't come up with any.

2. If --backup-converted is in use (which it is in your case), link
   conversion could read the pristine .orig file and write it to the
   resulting HTML.  This is a bit more complex, but might help if
   solution #1 turns out to break some scenarios.

Here is a patch that implements #1.  (It applies to the CVS source,
but it's easy enough to manually apply it to the source of 1.9.1.)
With that patch the mirror seems correct in the 2nd run.  Please let
me know if it works for you.

Index: src/http.c
===
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.173
diff -u -r1.173 http.c
--- src/http.c  2005/04/28 13:56:31 1.173
+++ src/http.c  2005/05/02 14:58:53
@@ -2318,6 +2318,11 @@
 local_filename);
  free_hstat (hstat);
  xfree_null (dummy);
+ /* The file is the same; assume that the links have
+already been converted.  Otherwise we run the
+risk of converting links twice, which is
+wrong.  */
+ *dt |= DT_DISABLE_CONVERSION;
  return RETROK;
}
   else if (tml >= tmr)
Index: src/retr.c
===
RCS file: /pack/anoncvs/wget/src/retr.c,v
retrieving revision 1.95
diff -u -r1.95 retr.c
--- src/retr.c  2005/04/16 20:12:43 1.95
+++ src/retr.c  2005/05/02 14:58:55
@@ -761,7 +761,7 @@
  register_download (u->url, local_file);
  if (redirection_count && 0 != strcmp (origurl, u->url))
register_redirection (origurl, u->url);
- if (*dt & TEXTHTML)
+ if ((*dt & TEXTHTML) && !(*dt & DT_DISABLE_CONVERSION))
register_html (u->url, local_file);
}
 }
Index: src/wget.h
===
RCS file: /pack/anoncvs/wget/src/wget.h,v
retrieving revision 1.57
diff -u -r1.57 wget.h
--- src/wget.h  2005/04/27 21:08:40 1.57
+++ src/wget.h  2005/05/02 14:58:55
@@ -233,7 +233,8 @@
   HEAD_ONLY= 0x0004,   /* only send the HEAD request */
   SEND_NOCACHE = 0x0008,   /* send Pragma: no-cache directive */
   ACCEPTRANGES = 0x0010,   /* Accept-ranges header was found */
-  ADDED_HTML_EXTENSION = 0x0020 /* added .html extension due to -E */
+  ADDED_HTML_EXTENSION = 0x0020,   /* added .html extension due to -E */
+  DT_DISABLE_CONVERSION = 0x0040   /* disable link conversion */
 };
 
 /* Universal error type -- used almost everywhere.  Error reporting of


Re: Wget converts links correctly *only* for the first time

2005-05-02 Thread Andrzej
 With that patch the mirror seems correct in the 2nd run.  Please let
 me know if it works for you.

*After* I deleted the files with the wrong URLs, the patched wget 1.9.1 
retrieved the files correctly, and after second run did not change the 
URLs for the wrong ones. So it worked on the pg.gda.pl.

On the feedle.com I downloaded, patched and installed ver. 1.10alpha2 of 
wget. 
Double mirroring worked here, too.

Thanks again for the patches.

Will the patches be included in the stable 1.10?

I have one more little problem:

On that source page:
http://lists.man.lodz.pl/pipermail/mineraly/
there is a link at the bottom:
https://lists.man.lodz.pl/
Clicking on that link redirects to that page:
https://lists.man.lodz.pl/mailman/listinfo
and from all the links which are on that page the files are unnecessarily 
downloaded (I do not want that page and the subpages).

So how can I block it?

Is the -R option used only for extensions or also for filenames?
Should I use -G option?
However, I want to download everything (exept the last link) from that 
page:
http://lists.man.lodz.pl/pipermail/mineraly/
so I cannot block all the domain http://lists.man.lodz.pl/
but clicking on that link redirects to:
https://lists.man.lodz.pl/mailman/listinfo
so would the -G or -R work in such situation?

a.


RE: wget 1.10 alpha 3

2005-04-28 Thread Herold Heiko
Windows (MSVC) test binary available at http://xoomer.virgilio.it/hherold/

Notes:

windows/wget.dep needs an attached patch (change gen_sslfunc to openssl.c,
change gen_sslfunc.h to ssl.h).
src/Makefile.in doesn't contain dependencies for http-ntlm$o
(windows/wget.dep either).
INSTALL should possibly mention the --disable-ntlm configure option.
I still advocate a warning (placed in windows/Readme or configure.bat) for
old msvc compilers, like in the attached patch.

Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 Sent: Thursday, April 28, 2005 8:56 AM
 To: wget@sunsite.dk; [EMAIL PROTECTED]
 Subject: wget 1.10 alpha 3
 
 
 
 dear friends,
 
 i have just released the third alpha version of wget 1.10:
 
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha3.tar.gz
 ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha3.tar.bz2
 
 as always, you are encouraged to download the tarballs, test 
 if the code works 
 properly and report any bug you find.
 
 
 -- 
 Aequam memento rebus in arduis servare mentem...
 
 Mauro Tortonesi  http://www.tortonesi.com
 
 University of Ferrara - Dept. of Eng.http://www.ing.unife.it
 Institute of Human & Machine Cognition   http://www.ihmc.us
 GNU Wget - HTTP/FTP file retrieval tool  
 http://www.gnu.org/software/wget
 Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
 Ferrara Linux User Group http://www.ferrara.linux.it
 



20050428.wget-dep.diff
Description: Binary data


20050420.winreadme.diff
Description: Binary data


Re: wget 1.10 alpha 3

2005-04-28 Thread Hrvoje Niksic
Herold Heiko [EMAIL PROTECTED] writes:

 windows/wget.dep needs an attached patch (change gen_sslfunc to openssl.c,
 change gen_sslfunc.h to ssl.h).

Applied, thanks.

 src/Makefile.in doesn't contain dependencies for http-ntlm$o
 (windows/wget.dep either).

I don't have the dependency-generating script handy anymore.  However,
the dependency to the corresponding C file is automatic, and it's a
good idea to `make clean' when you change a header file anyway.

 INSTALL should possibly mention the --disable-ntlm configure option.

Done.

 I still advocate a warning (placed in windows/Readme or
 configure.bat) for old msvc compilers, like in the attached patch.

Applied now.


RE: wget 1.10 alpha 3

2005-04-28 Thread Yaroslav Shchelkunov
Cannot compile if ./configure --without-ssl :

===cut on===
gcc -I. -I.   -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
-DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c init.c
init.c:214: structure has no member named `random_file'
init.c:214: initializer element is not constant
init.c:214: (near initialization for `commands[78].place')
*** Error code 1

Stop in /usr/home/yar/src/wget-1.10-alpha3/src.
*** Error code 1

Stop in /usr/home/yar/src/wget-1.10-alpha3.
===cut off===

FreeBSD 4.11-RELEASE.


Re: wget 1.10 alpha 3

2005-04-28 Thread Hrvoje Niksic
Thanks for the report; this problem is fixed in CVS.  The workaround
is to wrap the appropriate init.c line in #ifdef HAVE_SSL.
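
Something like this, for instance (a hypothetical illustration of that
workaround -- the exact entry in init.c's commands[] table may differ):

#ifdef HAVE_SSL
  { "randomfile",       &opt.random_file,       cmd_file },
#endif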


Re: Wget not resending cookies on Location: in headers

2005-04-26 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

 Is there a publically accessible site that exhibits this problem?

 I've set up a small example which illustrates the problem. Files can
 be found at http://dev.mesca.net/wget/ (using demo:test as login).

Thanks for setting up this test case.  It has uncovered at least two
bugs in the cookie code.

 $ wget --http-user=demo --http-passwd=test --cookies=on
 --save-cookies=cookie.txt http://dev.mesca.net/wget/setcookie.php

The obvious problem is that this command lacks --keep-session-cookies,
and the cookie it gets is session-based.  But there are other problems
as well: if you examine the cookie.txt produced by (the amended
version of) the first command, you'll notice that the cookie's path is
"wget/setcookie.php".  For one, the "setcookie.php" part should have
been stripped (Mozilla does this, I've just checked).  Second, the
path should always begin with a slash.  Either of these problems would
guarantee that no other URL would ever match this cookie.
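
For illustration, a sketch of the default-path rule involved (names made
up, not the actual code in cookies.c): with no "path" attribute, the
default path is everything up to and including the last '/' of the
request path, and it always starts with '/'.

#include <stdlib.h>
#include <string.h>

static char *
default_cookie_path (const char *request_path)
{
  /* "/wget/setcookie.php" -> "/wget/", never "wget/setcookie.php".  */
  const char *last_slash = strrchr (request_path, '/');
  size_t len = last_slash ? (size_t) (last_slash - request_path) + 1 : 0;
  char *path = malloc (len + 2);
  char *p = path;
  if (*request_path != '/')
    *p++ = '/';                   /* guarantee a leading slash */
  memcpy (p, request_path, len);
  p[len] = '\0';
  return path;
}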

I've now fixed both bugs in the CVS, along with a third, unrelated
bug.  Please let me know if the latest CVS works for you.  (It works
for me on the example you set up.)

Several notes on usage: --cookies is the default, so you don't need
--cookies=on to send and receive them.  Second, it's somewhat shorter
to specify the user name and password in the URL.  Finally, don't
forget --keep-session-cookies when saving the cookies.


Re: Wget not resending cookies on Location: in headers

2005-04-26 Thread wget
The obvious problem is that this command lacks --keep-session-cookies,
and the cookie it gets is session-based.
I tried to reproduce the bug in the more generic way.
But there are other problems
as well: if you examine the cookie.txt produced by (the amended
version of) the first command, you'll notice that the cookie's path is
wget/setcookie.php.  For one, the setcookie.php part should have
been stripped (Mozilla does this, I've just checked).  Second, the
path should always begin with a slash.  Either of these problems would
guarantee that no other URL would ever match this cookie.
I've now fixed both bugs in the CVS, along with a third, unrelated
bug.  Please let me know if the latest CVS works for you.  (It works
for me on the example you set up.)
Thanks a lot for your corrections. It's now working like a charm. It's 
also working with session cookies.

Regards,
Pierre


Re: Wget Bug

2005-04-26 Thread Hrvoje Niksic
Arndt Humpert [EMAIL PROTECTED] writes:

 wget, win32 rel. crashes with huge files.

Thanks for the report.  This problem has been fixed in the latest
version, available at http://xoomer.virgilio.it/hherold/ .


Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Andrzej
  Thus it seems that it should not matter what is the sequence of the 
  options. If it does I suggest that the developers of wget place 
  appropriate info in the manual.
 
 Yes, you right. Anyway I found out often that it's sometimes quite tricky
 setting up your command line to get exactly what you want.
 The way I do it always works fine for me.

Could developers confirm whether sequence of options matters or not?

  The log shows, that you haven't downloaded all the graphics from the main 
  page, and also you haven't downloaded that link:
  http://lists.feedle.net/pipermail/minerals/
 
 Well, I didn't verify it with the homepage itself. I initially tried without
 -e --robots=off and got a message blocking further downloading.
 
 With this option I could achieve further access for downloading.
 I have only tried the one link from above.

I doubt it. I tried it without the option and I did not have all the 
graphics. The -p option doesn't work as it should.

  I could try to use the -D option, but then probably everything would be 
  downloaded from the lists.feedle.net despite the -np option used, 
  wouldn't it? 
 
 I don't know exactly how these two options interact with each other.
 Ever tried the -m option?

Of course I tried, haven't you noticed in my previous posts?

 Very often when mirroring I use this line:
 
 wget -P work:1/ -r -l 2 -H -nc -p "http://www.xxx.xx"

This is not really proper mirroring, merely downloading.

 This would have the side effect downloading other links recursively and from
 other hosts if there are any.

You see...

 But of course you can define a list of allowed dirs and excluded dirs.
 I never tried this though.

What's the point of mirroring if I had to define the allowed and 
excluded directories every time? 
I want to run the mirror automatically, periodically from cron, and therefore 
the options should be as general as possible, so that no matter what 
changes are done on the site I would still have the site properly 
mirrored without amending the options all the time.
Of course some definitions of directories and sites might be 
necessary from time to time, but as I have shown in my correspondence here it 
is not possible to define everything in such a way that mirroring would work 
properly for all the web elements and web pages on a particular site.

 After all you maybe shouldn't forget the -k option so you can browse these
 sites offline.

I use it.

My conclusion is (and I am really sorry to say that, because I liked wget 
until now): 
Wget sucks (for mirroring at least)!

It is useful only for very simple tasks, but when one wants to use it for 
site mirroring it is almost useless; it cannot be done fully properly 
with Wget, as can be seen in my previous e-mails.

Summary:
1. The -p option doesn't do what it should be doing. It doesn't download all 
graphics, no matter what the source of the graphics is.
2. The -P option used with the link-converting options doesn't allow the links 
to be properly converted (at least in the current stable wget).
3. The -D and -I options do not include paths (directories) in URLs. 
4. The -np option should IMHO react to the paths after the -D and -I options.
5. Just about everything should be done to enable proper mirroring of web 
sites.

The multitude of options in Wget is just an illusion. In real life Wget cannot 
cope with site mirroring. It is not possible in Wget to set options in such a 
way that sites with some foreign elements (graphics) or web pages 
scattered over several servers (links to different domains) are mirrored 
correctly. And even if the site did not have the above problems, the 
problem with proper conversion of the links would still exist.

Does anyone know any software for the linux/unix shell which would cope 
with the task of proper mirroring?

a.


Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Hrvoje Niksic
Andrzej  [EMAIL PROTECTED] writes:

  Thus it seems that it should not matter what is the sequence of the 
  options. If it does I suggest that the developers of wget place 
  appropriate info in the manual.
 
 Yes, you right. Anyway I found out often that it's sometimes quite tricky
 setting up your command line to get exactly what you want.
 The way I do it always works fine for me.

 Could developers confirm whether sequence of options matters or not?

The order of options does not matter.


Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Hrvoje Niksic
Andrzej  [EMAIL PROTECTED] writes:

 The multitude of options in Wget is just an illusion. In real life Wget
 cannot cope with site mirroring.

I agree with your criticism, if not with your tone.  We are working on
improving Wget, and I believe that the problems you have seen will be
fixed in the versions to come.  (I plan to look into some of them for
the 1.11 release.)

 And even if the site did not have the above problems, the
 problem with proper conversion of the links would still exist.

That problem has been corrected, and it can be worked around by not
using -P.


Re: Wget sucks! [was: Re: wget in a loop?]

2005-04-25 Thread Andrzej
 I agree with your criticism, if not with your tone.  We are working on
 improving Wget, and I believe that the problems you have seen will be
 fixed in the versions to come.  (I plan to look into some of them for
 the 1.11 release.)

OK. Thanks. Good to hear that. Looking forward impatiently for the new 
version. :)

 That problem has been corrected, and it can be worked around by not
 using -P.

Yes, indeed. Thanks.

In order to download all that website:
http://znik.wbc.lublin.pl/ChemFan/
which, unfortunately, partly is also under this address: 
http://lists.man.lodz.pl/pipermail/chemfan/
I had to manually modify content of that web page:
http://znik.wbc.lublin.pl/ChemFan/Archiwum/index.html
(which contains the above link) and use the tricks to make a mirror of 
it all:



cd $HOME/web/chemfan.pl

wget -m -nv -k -K -E -nH --cut-dirs=1 -np -t 1000 -D wbc.lublin.pl -o 
$HOME/logiwget/logchemfan.pl -p http://znik.wbc.lublin.pl/ChemFan/ && \

cd $HOME/web/chemfan.pl/arch && \

wget -m -nv -k -K -E -nH -np --cut-dirs=2 -t 1000 -D lists.man.lodz.pl --
follow-ftp -o $HOME/logiwget/logchemfanarchive.pl -p 
http://lists.man.lodz.pl/pipermail/chemfan/ && \

cp $HOME/web/domirrora/Archiwum/index.html 
$HOME/web/chemfan.pl/Archiwum/index.html

==

If you know how to make it simpler let me know.

Do you think that && is really necessary here?

Of course for other sites other recipes might need to be developed in 
order to mirror them correctly, so unfortunately it is not universal at 
all.

And unfortunately not everything went fine, yet, when using the above 
script.

On the page 
http://lists.man.lodz.pl/pipermail/chemfan/ 
is a link:
ftp://ftp.man.lodz.pl/pub/doc/LISTY-DYSKUSYJNE/CHEMFAN
and it seems that also to mirror this I'll have to run yet another wget 
session, and then manually modify and copy the page: 
http://chemfan.pl.feedle.com/arch/index.html

a.


Re: Wget not resending cookies on Location: in headers

2005-04-25 Thread wget
Is there a publically accessible site that exhibits this problem?
I've set up a small example which illustrates the problem. Files can be 
found at http://dev.mesca.net/wget/ (using demo:test as login).

Three files:
setcookie.php:
--
<? setcookie("wget", "I love it!"); ?>
getcookie.php:
--
<? header('Location: getcookie-redirect.php'); ?>
get-cookie-redirect.php:

<?
if(isset($_COOKIE['wget'])){
echo "Ok, I can read the cookie: [wget] ".$_COOKIE['wget'];
}else{
echo "Cookie is not set.";
}
?>
We first set the cookie by wgetting setcookie.php.
Then, we're trying to read the cookie by querying getcookie.php, which 
redirects to get-cookie-redirect.php: wget can't read it.

$ wget --http-user=demo --http-passwd=test --cookies=on 
--save-cookies=cookie.txt http://dev.mesca.net/wget/setcookie.php
$ wget --http-user=demo --http-passwd=test --cookies=on 
--load-cookies=cookie.txt http://dev.mesca.net/wget/getcookie.php

Note: tests were made using the latest version from the CVS 
(1.10-alpha2+cvs-dev).

On 26 Apr 05, at 00:09, Hrvoje Niksic wrote:
- The server responds with a "Location: http://host.com/member.php" in
headers. Here is the point: member.php requires cookies defined by
index.php and checkuser.php. However these cookies are not resent by
Wget.
That sounds like a bug.  Wget is supposed to resend the cookies.
Could you provide any kind of debug information?  The contents of the
cookies is not important, but the path parameter and the expiry date
is.
According to my tests, the problem is still reproducible whatever 
Path and Expiry date contain.

Regards,
Pierre


Re: wget in a loop?

2005-04-24 Thread Andrzej
Thanks Patrick for a reply,

 AFAIKS your command line is somehow complete mixed up.
 Usually I call wget and first give it the path where to it should save all
 files followed by more options and at last the url from where to get them
 (usually in quotation marks to be sure).

According to man wget:

=
SYNOPSIS
   wget [option]... [URL]...
=

Thus it seems that it should not matter what is the sequence of the 
options. If it does I suggest that the developers of wget place 
appropriate info in the manual.

 wget -P ram:chemfan/minerals/ -m -o ram:logminerals -nv -e --robots=off
 -k -K -E -nH -np -t 1000 -p http://minerals.feedle.com/
 
 The log file is attached as proof.

The log shows, that you haven't downloaded all the graphics from the main 
page, and also you haven't downloaded that link:
http://lists.feedle.net/pipermail/minerals/

I want to mirror everything including all graphics from that page:
http://minerals.feedle.com/
and including recursively this link:
http://lists.feedle.net/pipermail/minerals/
and this
http://minerals.feedle.com/logo.html (this one is no problem)

but not those links:
http://lists.feedle.net/mailman/listinfo/minerals
http://www.man.lodz.pl/MINERALY/
which should remain in the mirror copies as they are.

I could try to use the -D option, but then probably everything would be 
downloaded from the lists.feedle.net despite the -np option used, 
wouldn't it? 

a.


Re: wget 1.10 alpha 2

2005-04-21 Thread Doug Kaufman
On Wed, 20 Apr 2005, Hrvoje Niksic wrote:

 Herold Heiko [EMAIL PROTECTED] writes:
 
 I am greatly surprised.  Do you really believe that Windows users
 outside an academic environment are proficient in using the compiler?
 I have never seen a home Windows installation that even contained a
 compiler, the only exception being ones that belonged to professional
 C or C++ developers.

This is what Cygwin is all about. Once you open up the Cygwin bash
shell, all you have to do with most source code is configure; make;
make install. I am not a programmer and have been compiling programs
for several years. As long as the program compiles cleanly, there
shouldn't be a problem under Windows. I don't have any idea of how many
Windows users would try to patch the code if it didn't compile out of
the box.
  
 The very idea that a Windows user might grab source code and compile a
 package is strange.  I don't remember ever seeing a Windows program
 distributed in source form.

See, for example, htmldoc which converts html into a pdf file. The
free version is only distributed as source code. Or see consoletelnet,
distributed both as source and binary.
Doug

-- 
Doug Kaufman
Internet: [EMAIL PROTECTED]



Re: wget 1.10 alpha 2

2005-04-21 Thread Hrvoje Niksic
Doug Kaufman [EMAIL PROTECTED] writes:

 On Wed, 20 Apr 2005, Hrvoje Niksic wrote:

 Herold Heiko [EMAIL PROTECTED] writes:
 
 I am greatly surprised.  Do you really believe that Windows users
 outside an academic environment are proficient in using the compiler?
 I have never seen a home Windows installation that even contained a
 compiler, the only exception being ones that belonged to professional
 C or C++ developers.

 This is what Cygwin is all about. Once you open up the Cygwin bash
 shell, all you have to do with most source code is configure; make;
 make install.

Oh, I know that and I *love* Cygwin and use it all the time (while in
Windows)!  But that is beside the point because this problem doesn't
occur under Cygwin in the first place -- Cygwin compilation is as
clean as it gets.

My point was that a typical Windows (not Cygwin) user doesn't know
about the compilation process, nor can he be bothered to learn.
That's a great shame, but it's something that's not likely to change.
Making the code uglier for the sake of ordinary Windows users willing
to compile it brings literally no gain.

The above shouldn't be construed as not wanting to support Windows at
all.  There are Windows users, on this list and elsewhere, who are
perfectly able and willing to compile Wget from source.  But those
users are also able to read the documentation, to turn off
optimization for offending functions, not to mention to upgrade their
compiler, or get a free one that is much less buggy (the Borland
compiler comes to mind, but there are also Mingw, Cygwin, Watcom,
etc.)


Re: wget 1.10 alpha 2

2005-04-20 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 i totally agree with hrvoje here. in the worst case, we can add an
 entry in the FAQ explaining how to compile wget with those buggy
 versions of microsoft cc.

Umm.  What FAQ?  :-)


RE: wget 1.10 alpha 2

2005-04-20 Thread Herold Heiko
(sorry for the late answer, three days of 16+ hours/day migration aren't
fun, UPS battery exploding inside the UPS almost in my face even less)


 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]

 Herold Heiko [EMAIL PROTECTED] writes:
 
  do have a compiler but aren't really developers (yet) (for example
  first year CS students with old lab computer compilers).
 
 From my impressions of the Windows world, non-developers won't touch
 source code anyway -- they will simply use the binary.

I feel I must dissent. Even today I'm not exactly a developer, I certainly
wasn't when I first placed my greedy hands on wget sources (in order to add
a couple of chars to URL_UNSAFE... back in 98 i think). I just knew where I
could use a compiler and followed instructions.
I'd just like wget still being compilable in an old setup by (growing)
newbies, for the learning value. Maybe something like a small note in the
windows/Readme instructions would be ok, as by the enclosed patch ?

 The really important thing is to make sure that the source works for
 the person likely to create the binaries, in this case you.  Ideally
 he should have access to the latest compiler, so we don't have to
 cater to brokenness of obsolete compiler versions.  This is not about

I must confess I'm torn between the two options. Your point is very valid,
on the other hand while it is still possible I'd like to continue using an
old setup exactly because there are still plenty of those around and I'd
like to catch these problems. Unfortunately I don't have the time to test
everything on two setups, so I think I'll continue with the old one till
easily feasable.

 Also note that there is a technical problem with your patch (if my
 reading of it is correct): it unconditionally turns on debugging,
 disregarding the command-line options.  Is it possible to save the old
 optimization options, turn off debugging, and restore the old options?
 (Borland C seems to support some sort of #pragma push to achieve
 that effect.)

It seems not, msdn mentions push only for #pragma warning, not for
#pragma optimize :(
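
(What does seem to be available, as a hedged sketch: turning optimization
off around just the offending function and then going back to whatever /O
options were given on the command line -- the function below is only a
placeholder.)

#pragma optimize ("", off)        /* disable optimizations from here on */

static void
offending_function (void)         /* placeholder for the miscompiled code */
{
  /* ... the code the buggy optimizer gets wrong ... */
}

#pragma optimize ("", on)         /* back to the command-line /O settings */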

   optimization, or with a lesser optimization level.  Ideally this
   would be done by configure.bat if it detects the broken compiler
   version.

I tried but didn't find a portable (w9x-w2x) way to do that, since in w9x we
can't easily redirect the standard error used by cl.exe.
Possibly this could be worked around by running the test from a simple perl
script, on the other hand today perl is required (on released packages) only
in order to build the documentation, not for the binary, adding another
dependency would be a pity.

 You mean that you cannot use later versions of C++ to produce
 Win95/Win98/NT4 binaries?  I'd be very surprised if that were the

Absolutely not, what I meant is, later versions can't be installed on older
windows operating systems. I think Visual Studio 6 is the last MS compiler
which runs on even NT4.

  Personally I feel wget should try to still support that not-so-old
  compiler platform if possible,
 
 Sure, but in this case some of the burden falls on the user of the
 obsolete platform: he has to turn off optimization to avoid a bug in
 his compiler.  That is not entirely unacceptable.

I concur, after all if a note is dropped in the windows/Readme either they
will read it, or they will stall due to OpenSSL dependencies (on by default)
anyway.

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax



20050420.winreadme.diff
Description: Binary data


Re: wget 1.10 alpha 2

2005-04-20 Thread Hrvoje Niksic
Herold Heiko [EMAIL PROTECTED] writes:

 From my impressions of the Windows world, non-developers won't touch
 source code anyway -- they will simply use the binary.

 I feel I must dissent.

I am greatly surprised.  Do you really believe that Windows users
outside an academic environment are proficient in using the compiler?
I have never seen a home Windows installation that even contained a
compiler, the only exception being ones that belonged to professional
C or C++ developers.

The very idea that a Windows user might grab source code and compile a
package is strange.  I don't remember ever seeing a Windows program
distributed in source form.

 Even today I'm not exactly a developer, I certainly wasn't when I
 first placed my greedy hands on wget sources (in order to add a
 couple of chars to URL_UNSAFE... back in 98 i think). I just knew
 where I could use a compiler and followed instructions.  I'd just
 like wget still being compilable in an old setup by (growing)
 newbies, for the learning value. Maybe something like a small note
 in the windows/Readme instructions would be ok, as by the enclosed
 patch ?

That would be fine with me.


Re: wget 1.10 alpha 2

2005-04-20 Thread Mauro Tortonesi
On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
  i totally agree with hrvoje here. in the worst case, we can add an
  entry in the FAQ explaining how to compile wget with those buggy
  versions of microsoft cc.

 Umm.  What FAQ?  :-)

the official FAQ:

http://www.gnu.org/software/wget/faq.html

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 2

2005-04-20 Thread Mauro Tortonesi
On Wednesday 20 April 2005 05:55 am, Herold Heiko wrote:
 (sorry for the late answer, three days of 16+ hours/day migration aren't
 fun, UPS battery exploding inside the UPS almost in my face even less)

  -Original Message-
  From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 
  Herold Heiko [EMAIL PROTECTED] writes:
   do have a compiler but aren't really developers (yet) (for example
   first year CS students with old lab computer compilers).
 
  From my impressions of the Windows world, non-developers won't touch
  source code anyway -- they will simply use the binary.

 I feel I must dissent. Even today I'm not exactly a developer, I certainly
 wasn't when I first placed my greedy hands on wget sources (in order to add
 a couple of chars to URL_UNSAFE... back in 98 i think). I just knew where I
 could use a compiler and followed instructions.
 I'd just like wget still being compilable in an old setup by (growing)
 newbies, for the learning value. Maybe something like a small note in the
 windows/Readme instructions would be ok, as by the enclosed patch ?

publishing a separate patch on the website and including it in the tarball 
along with a note in windows/Readme is ok for me. but including an ugly 
workaround in the main sources just to support some older versions of 
microsoft c is definitely not.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.9.1 -- 2 GB limit -- negative filesize

2005-04-20 Thread Mauro Tortonesi

hi alexander,

this is a known problem which is already fixed in cvs. perhaps you may want to 
try using wget 1.10-alpha2:

ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.gz
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.bz2

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 2

2005-04-20 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
  i totally agree with hrvoje here. in the worst case, we can add an
  entry in the FAQ explaining how to compile wget with those buggy
  versions of microsoft cc.

 Umm.  What FAQ?  :-)

 the official FAQ:

 http://www.gnu.org/software/wget/faq.html

This is the first time that I see it.  It's actually pretty good, I
like it.


Re: wget 1.10 alpha 2

2005-04-20 Thread Mauro Tortonesi
On Wednesday 20 April 2005 02:42 pm, Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
  On Wednesday 20 April 2005 04:58 am, Hrvoje Niksic wrote:
  Mauro Tortonesi [EMAIL PROTECTED] writes:
   i totally agree with hrvoje here. in the worst case, we can add an
   entry in the FAQ explaining how to compile wget with those buggy
   versions of microsoft cc.
 
  Umm.  What FAQ?  :-)
 
  the official FAQ:
 
  http://www.gnu.org/software/wget/faq.html

 This is the first time that I see it.  It's actually pretty good, I
 like it.

yes, i like it very much too. it will need an update after the release of 
1.10, though.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 2

2005-04-19 Thread Mauro Tortonesi
On Friday 15 April 2005 07:24 am, Hrvoje Niksic wrote:
 Herold Heiko [EMAIL PROTECTED] writes:
  However there are still lots of people using Windows NT 4 or even
  win95/win98, with old compilers, where the compilation won't work
  without the patch.  Even if we place a comment in the source file or
  the windows/Readme many of those will be discouraged, say those who
  do have a compiler but aren't really developers (yet) (for example
  first year CS students with old lab computer compilers).
 
 From my impressions of the Windows world, non-developers won't touch
 source code anyway -- they will simply use the binary.

 The really important thing is to make sure that the source works for
 the person likely to create the binaries, in this case you.  Ideally
 he should have access to the latest compiler, so we don't have to
 cater to brokenness of obsolete compiler versions.  This is not about
 Microsoft bashing, either: at at least one point Wget triggered a GCC
 bug; I never installed the (ugly) workaround because later versions of
 GCC fixed the bug.

 Also note that there is a technical problem with your patch (if my
 reading of it is correct): it unconditionally turns on debugging,
 disregarding the command-line options.  Is it possible to save the old
 optimization options, turn off debugging, and restore the old options?
 (Borland C seems to support some sort of #pragma push to achieve
 that effect.)

 There are other possibilities, too:

 * Change the Makefile to compile the offending files without
   optimization, or with a lesser optimization level.  Ideally this
   would be done by configure.bat if it detects the broken compiler
   version.

 * Change the Makefile to simply not use optimization by default.  This
   is suboptimal, but would not be a big problem for Wget in practice
   -- the person creating the binaries would use optimization in his
   build, which means most people would still have access to an
   optimized Wget.

i don't really like these two options and i don't think they're necessary when 
there is a freely downloadable microsoft compiler which works perfectly for 
us.

  Not yet, but I will certainly.  Nevertheless, I think the point is
  the "continue to support existing installation if possible" issue,
  after all VC6 is not free either, and at least one newer commercial
  VC version has been reported to compile without problems. Those,
  however, certainly don't support Win95, probably don't Win98/ME
  or/and NT4 either (didn't yet check though).

 You mean that you cannot use later versions of C++ to produce
 Win95/Win98/NT4 binaries?  I'd be very surprised if that were the
 case!

yes, this would be very weird.

  Personally I feel wget should try to still support that not-so-old
  compiler platform if possible,

 Sure, but in this case some of the burden falls on the user of the
 obsolete platform: he has to turn off optimization to avoid a bug in
 his compiler.  That is not entirely unacceptable.

i totally agree with hrvoje here. in the worst case, we can add an entry in 
the FAQ explaining how to compile wget with those buggy versions of microsoft 
cc.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Jörn Nettingsmeier
hi wgetters !
a while ago, i wrote:
[1]
wget spans hosts when it shouldn't:
it looks like this behaviour is by design, but it should be documented.
[2]
wget seems to choke on directories that start with a dot. i guess it 
thinks they are references to external pages and does not download 
links containing such directory names.
it turned out that the site in question is excluding robots, so wget 
behaves correctly.
sorry for the false bug report and for overlooking the obvious :)

[3]
wget does not parse css stylesheets and consequently does not retrieve 
url() references, which leads to missing background graphics on some 
sites.
this feature request has not been commented on yet. do you think it might be 
useful?

best regards,
jörn
--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Hrvoje Niksic
Jörn Nettingsmeier [EMAIL PROTECTED] writes:

 [3]

 wget does not parse css stylesheets and consequently does not
 retrieve url() references, which leads to missing background
 graphics on some sites.

 this feature request has not been commented on yet. do you think it
 might be useful?

I think it's very useful, but so far no one has volunteered to work on
it.


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Jörn Nettingsmeier
Hrvoje Niksic wrote:
Jörn Nettingsmeier [EMAIL PROTECTED] writes:

[3]
wget does not parse css stylesheets and consequently does not
retrieve url() references, which leads to missing background
graphics on some sites.
this feature request has not been commented on yet. do you think it
might be useful?

I think it's very useful, but so far no one has volunteered to work on
it.
maybe a student in our project is interested to implement it, if not, 
i'll look into it next week.


--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Hrvoje Niksic
Jörn Nettingsmeier [EMAIL PROTECTED] writes:

wget does not parse css stylesheets and consequently does not
retrieve url() references, which leads to missing background
graphics on some sites.

this feature request has not been commented on yet. do you think it
might be useful?
 I think it's very useful, but so far no one has volunteered to work
 on it.

 maybe a student in our project is interested to implement it, if
 not, i'll look into it next week.

It shouldn't be too hard.  You would need to implement a CSS parser,
and a corresponding get_urls_css function that extracted the URLs from
the CSS source.  (I believe both would be much much simpler than the
corresponding HTML counterparts.)

Finally modify the code in recur.c to call get_urls_css for CSS files,
the same way it calls get_urls_html for HTML's.  convert_links might
need additional work for CSS, but it should also be straightforward.
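
A minimal sketch of the scanning step such a get_urls_css might be built
around (illustrative only: the function name comes from the suggestion
above, nothing here is actual Wget code, and a real version would also
have to handle @import and feed the URLs into the existing lists rather
than print them):

  #include <ctype.h>
  #include <stdio.h>
  #include <string.h>

  /* Print every target of a url(...) reference found in a CSS buffer.  */
  static void
  extract_css_urls (const char *css)
  {
    const char *p = css;
    while ((p = strstr (p, "url(")) != NULL)
      {
        const char *start, *end;
        p += 4;                        /* skip "url(" */
        while (isspace ((unsigned char) *p))
          ++p;
        if (*p == '"' || *p == '\'')   /* quoted form: url("...") */
          {
            char quote = *p++;
            start = p;
            end = strchr (p, quote);
          }
        else                           /* bare form: url(...) */
          {
            start = p;
            end = strchr (p, ')');
          }
        if (!end)
          break;                       /* malformed reference; stop */
        printf ("found CSS reference: %.*s\n", (int) (end - start), start);
        p = end + 1;
      }
  }

  int
  main (void)
  {
    const char *sample =
      "body { background: url(images/bg.png); }\n"
      "h1   { background-image: url(\"title.jpg\"); }\n";
    extract_css_urls (sample);
    return 0;
  }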


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Jörn Nettingsmeier
Hrvoje Niksic wrote:
Jörn Nettingsmeier [EMAIL PROTECTED] writes:

wget does not parse css stylesheets and consequently does not
retrieve url() references, which leads to missing background
graphics on some sites.
this feature request has not been commented on yet. do you think it
might be useful?
I think it's very useful, but so far no one has volunteered to work
on it.
maybe a student in our project is interested to implement it, if
not, i'll look into it next week.

It shouldn't be too hard.  You would need to implement a CSS parser,
and a corresponding get_urls_css function that extracted the URLs from
the CSS source.  (I believe both would be much much simpler than the
corresponding HTML counterparts.)
Finally modify the code in recur.c to call get_urls_css for CSS files,
the same way it calls get_urls_html for HTML's.  convert_links might
need additional work for CSS, but it should also be straightforward.
the same parser code might also work for urls in javascript. as it is 
now, mouse-over effects with overlay images don't work, because the 
second file is not retrieved. if we can come up with a good heuristic 
to guess urls, it should work in both cases.


--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: [EMAIL PROTECTED], Telefon: 0203/379-2736


Re: wget spans hosts when it shouldn't and fails to retrieve dirs starting with a dot...

2005-04-18 Thread Hrvoje Niksic
Jörn Nettingsmeier [EMAIL PROTECTED] writes:

 the same parser code might also work for urls in javascript. as it
 is now, mouse-over effects with overlay images don't work, because
 the second file is not retrieved. if we can come up with a good
 heuristic to guess urls, it should work in both cases.

I'm not sure that a CSS parser would really be useful for JavaScript.
Supporting JavaScript URLs in HTML and elsewhere would require some
more heuristics which is IMHO orthogonal to CSS support.


RE: wget 1.10 alpha 2

2005-04-15 Thread Herold Heiko
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]

 the patch you've posted is really such an ugly workaround 
 (shame on microsoft 

Exactly the same opinion here.
Please don't misunderstand me, personally for most of my work on windows I
use cygnus (including wget) anyway.
However there are still lots of people using Windows NT 4 or even
win95/win98, with old compilers, where the compilation won't work without
the patch.
Even if we place a comment in the source file or the windows/Readme many of
those will be discouraged, say those who do have a compiler but aren't
really developers (yet) (for example first year CS students with old lab
computer compilers).

I suppose we could leave that stuff present but commented out, and print a
warning when configure.bat --msvc is called.
Maybe we could even make that warning conditional (run cl.exe, use the
dos/windows find.exe in order to check the output, if 12.00 echo warning)
but that would be even more hacky.


 have you tried the microsoft visual c++ toolkit 2003? maybe 
 it works. you can 
 download it for free at the following URL:
 
 http://msdn.microsoft.com/visualc/vctoolkit2003/

Not yet, but I will certainly.
Nevertheless, I think the point is the "continue to support existing
installation if possible" issue, after all VC6 is not free either, and at
least one newer commercial VC version has been reported to compile without
problems. Those, however, certainly don't support Win95, probably don't
Win98/ME or/and NT4 either (didn't yet check though).

Personally I feel wget should try to still support that not-so-old compiler
platform if possible, even if there are other options, either the direct
successor (current VC) or not (free alternatives like cygnus, mingw and
borland compilers), in order to keep the development process easily
accessible to old installations, in order to have more choices for
everybody.

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


Re: wget 1.10 alpha 2

2005-04-15 Thread Hrvoje Niksic
Herold Heiko [EMAIL PROTECTED] writes:

 However there are still lots of people using Windows NT 4 or even
 win95/win98, with old compilers, where the compilation won't work
 without the patch.  Even if we place a comment in the source file or
 the windows/Readme many of those will be discouraged, say those who
 do have a compiler but aren't really developers (yet) (for example
 first year CS students with old lab computer compilers).

From my impressions of the Windows world, non-developers won't touch
source code anyway -- they will simply use the binary.

The really important thing is to make sure that the source works for
the person likely to create the binaries, in this case you.  Ideally
he should have access to the latest compiler, so we don't have to
cater to brokenness of obsolete compiler versions.  This is not about
Microsoft bashing, either: at at least one point Wget triggered a GCC
bug; I never installed the (ugly) workaround because later versions of
GCC fixed the bug.

Also note that there is a technical problem with your patch (if my
reading of it is correct): it unconditionally turns on debugging,
disregarding the command-line options.  Is it possible to save the old
optimization options, turn off debugging, and restore the old options?
(Borland C seems to support some sort of #pragma push to achieve
that effect.)
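
For what it's worth, MSVC -- including VC6 -- does provide something along
those lines: #pragma optimize ("", off) disables optimization from that
point on, and #pragma optimize ("", on) restores whatever was given on the
command line.  A sketch of how the offending spot might be guarded; the
version test and placement are assumptions, not the actual patch under
discussion:

  /* Hypothetical guard: only work around the optimizer bug of the old
     compiler (cl.exe 12.x, i.e. Visual C++ 6); newer versions are fine.  */
  #if defined (_MSC_VER) && _MSC_VER <= 1200
  # pragma optimize ("", off)     /* turn optimization off from here on */
  #endif

  /* ... the function(s) that VC6's optimizer miscompiles ... */

  #if defined (_MSC_VER) && _MSC_VER <= 1200
  # pragma optimize ("", on)      /* restore the command-line settings */
  #endif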

There are other possibilities, too:

* Change the Makefile to compile the offending files without
  optimization, or with a lesser optimization level.  Ideally this
  would be done by configure.bat if it detects the broken compiler
  version.

* Change the Makefile to simply not use optimization by default.  This
  is suboptimal, but would not be a big problem for Wget in practice
  -- the person creating the binaries would use optimization in his
  build, which means most people would still have access to an
  optimized Wget.

 Not yet, but I will certainly.  Nevertheless, I think the point is
 the "continue to support existing installation if possible" issue,
 after all VC6 is not free either, and at least one newer commercial
 VC version has been reported to compile without problems. Those,
 however, certainly don't support Win95, probably don't Win98/ME
 or/and NT4 either (didn't yet check though).

You mean that you cannot use later versions of C++ to produce
Win95/Win98/NT4 binaries?  I'd be very surprised if that were the
case!

 Personally I feel wget should try to still support that not-so-old
 compiler platform if possible,

Sure, but in this case some of the burden falls on the user of the
obsolete platform: he has to turn off optimization to avoid a bug in
his compiler.  That is not entirely unacceptable.


Re: wget 1.10 alpha 1

2005-04-15 Thread Karsten Hopp
Hi,

Does anybody know if the security vulnerabilities CAN-2004-1487 and
CAN-2004-1488 will be fixed in the new version ?
There seems to be at least some truth in the reports (ignore the insulting
tone of the reports).

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1487
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1488

  Karsten


Re: wget 1.10 alpha 1

2005-04-15 Thread Hrvoje Niksic
Karsten Hopp [EMAIL PROTECTED] writes:

 Does anybody know if the security vulnerabilities CAN-2004-1487 and
 CAN-2004-1488 will be fixed in the new version ?

Yes on both counts.

 There seems to be at least some truth in the reports (ignore the
 insulting tone of the reports).

 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1487
 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-1488

I've read them.  The first one is fairly improbable because it
requires special DNS setup for .. to resolve to an IP address.  The
second one poses a real problem, which I simply never considered.

I'm not sure if either issue is critical enough to warrant a 1.9.2
release.  The proximity of 1.10, which fixes both problems, makes it
unnecessary.


Re: wget 1.10 alpha 2

2005-04-14 Thread Hrvoje Niksic
Hrvoje Niksic [EMAIL PROTECTED] writes:

 [EMAIL PROTECTED] writes:

 If possible, it seems preferable to me to use the platform's C
 library regex support rather than make wget dependent on another
 library...

 Note that some platforms don't have library support for regexps, so
 we'd have to bundle anyway.

Oh, and POSIX regexps don't support -- and never will -- non-greedy
quantifiers, which are perhaps the most useful single additions of
Perl 5 regexps.

Incidentally, regex.c bundled with GNU Emacs supports them, along with
non-capturing (shy) groups, another very useful feature.
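
A small self-contained illustration (plain POSIX regex.h, nothing
Wget-specific) of why the missing non-greedy quantifier matters and what
the usual POSIX workaround looks like:

  #include <regex.h>
  #include <stdio.h>

  int
  main (void)
  {
    const char *line =
      "<a href=\"a.html\">one</a> <a href=\"b.html\">two</a>";
    regex_t re;
    regmatch_t m[2];

    /* Greedy: .* runs to the LAST quote on the line, swallowing both
       links.  Perl would allow the non-greedy "href=\"(.*?)\"" here;
       POSIX does not and never will.  */
    regcomp (&re, "href=\"(.*)\"", REG_EXTENDED);
    if (regexec (&re, line, 2, m, 0) == 0)
      printf ("greedy capture:     %.*s\n",
              (int) (m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
    regfree (&re);

    /* The usual POSIX workaround: a negated character class instead
       of .*?  -- works here, but is not a general substitute.  */
    regcomp (&re, "href=\"([^\"]*)\"", REG_EXTENDED);
    if (regexec (&re, line, 2, m, 0) == 0)
      printf ("workaround capture: %.*s\n",
              (int) (m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
    regfree (&re);
    return 0;
  }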


Re: wget 1.10 alpha 2

2005-04-14 Thread Mauro Tortonesi
On Wednesday 13 April 2005 07:39 am, Herold Heiko wrote:
 With MS Visual Studio 6 still needs attached patch in order to compile
 (disable optimization for part of http.c and retr.c if cl.exe version
 <=12).

 Windows msvc test binary at http://xoomer.virgilio.it/hherold/

hi herold,

the patch you've posted is really such an ugly workaround (shame on microsoft 
and their freaking compilers) that i am not very willing to merge it into our 
cvs. 

have you tried the microsoft visual c++ toolkit 2003? maybe it works. you can 
download it for free at the following URL:

http://msdn.microsoft.com/visualc/vctoolkit2003/

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 1

2005-04-13 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

   #define VERSION_STRING "1.10-alpha1_sms1"

 Was there any reason to do this with a source module instead of a
 simple macro in a simple header file?

At some point that approach made it easy to read or change the
version, as the script dist-wget does.  But I'm sure there are other
ways to do it, too.

Was there any reason to use '#include <config.h>' instead of
 '#include "config.h"'?

Yes.  The idea is that you can build in a separate directory and have
the compiler find the build directory's config.h instead of a config.h
previously configured in the source directory.  Quoting Autoconf
manual:

Use `#include <config.h>' instead of `#include "config.h"', and
pass the C compiler a `-I.' option (or `-I..'; whichever directory
contains `config.h'). That way, even if the source directory is
configured itself (perhaps to make a distribution), other build
directories can also be configured without finding the `config.h'
from the source directory.


RE: wget 1.10 alpha 2

2005-04-13 Thread Herold Heiko
With MS Visual Studio 6 still needs attached patch in order to compile
(disable optimization for part of http.c and retr.c if cl.exe version <=12).

Windows msvc test binary at http://xoomer.virgilio.it/hherold/

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 13, 2005 12:36 AM
 To: wget@sunsite.dk; [EMAIL PROTECTED]
 Cc: Johannes Hoff; Leonid Petrov; Doug Kaufman; Tobias Tiederle; Jim
 Wright; garycao; Steven M.Schweda
 Subject: wget 1.10 alpha 2
 
 
 
 dear friends,
 
 i have just released the second alpha version of wget 1.10:
[snip]



20050413.diff
Description: Binary data


Re: wget 1.10 alpha 2

2005-04-13 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

 If possible, it seems preferable to me to use the platform's C
 library regex support rather than make wget dependent on another
 library...

Note that some platforms don't have library support for regexps, so
we'd have to bundle anyway.


Re: wget 1.10 alpha 1

2005-04-12 Thread Steven M. Schweda
From: Mauro Tortonesi [EMAIL PROTECTED]

 [...] i think 
 that if you want your patches to be merged in our CVS, you should follow the 
 official patch submission procedure (that is, posting your patches to the 
 wget-patches AT sunsite DOT dk mailing list. each post should include a brief 
 comment about what the patch does, and especially why it does so). this would 
 save a lot of time to me and hrvoje and would definitely speed up the merging 
 process.
 [...]

   Perhaps.  I'll give it a try.

   Also, am I missing something obvious, or should the configure script
(as in, To configure Wget, run the configure script provided with the
distribution.) be somewhere in the CVS source?  I see many of its
relatives, but not the script itself.

   And I'm just getting started, but is there any good reason for the
extern variables output_stream and output_stream_regular not to be
declared in some header file?



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget 1.10 alpha 1

2005-04-12 Thread Doug Kaufman
On Tue, 12 Apr 2005, Steven M. Schweda wrote:

Also, am I missing something obvious, or should the configure script
 (as in, To configure Wget, run the configure script provided with the
 distribution.) be somewhere in the CVS source?  I see many of its
 relatives, but not the script itself.

You can use Makefile.cvs (i.e. make -f Makefile.cvs), which will
run autoheader and autoconf. The autoheader command creates
src/config.h.in and the autoconf command creates configure from
configure.in. I usually just run autoheader and autoconf directly. You
need to have Autoconf and m4 installed.
  Doug

-- 
Doug Kaufman
Internet: [EMAIL PROTECTED]



Re: wget 1.10 alpha 1

2005-04-12 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

 Also, am I missing something obvious, or should the configure script
 (as in, To configure Wget, run the configure script provided with
 the distribution.) be somewhere in the CVS source?

The configure script is auto-generated and is therefore not in CVS.
To get it, run autoconf.  See the file README.cvs.

 And I'm just getting started, but is there any good reason for the
 extern variables output_stream and output_stream_regular not to be
 declared in some header file?

No good reason that I can think of.


Re: Wget error

2005-04-12 Thread Mauro Tortonesi
On Tuesday 12 April 2005 06:17 pm, Jeanne McIlvain wrote:
 Hi!
   I attempted to download wget onto my mac. I was disappointed to find
 that it would not work. I thought that I read it was applicable to
 macs, but am I wrong? Please let me know, Thank you so much.
 - please respond to [EMAIL PROTECTED]

did you download the source tarball and compile it? which version of wget are 
you using? which version of mac os?

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: wget 1.10 alpha 1

2005-04-12 Thread Steven M. Schweda
From: Hrvoje Niksic [EMAIL PROTECTED]

  Also, am I missing something obvious, or should the configure script
  (as in, To configure Wget, run the configure script provided with
  the distribution.) be somewhere in the CVS source?
 
 The configure script is auto-generated and is therefore not in CVS.
 To get it, run autoconf.  See the file README.cvs.

   Sorry for the stupid question.  I was reading the right document but
then I got distracted and failed to get back to it.  Thanks for the
quick, helpful responses.

  And I'm just getting started, but is there any good reason for the
  extern variables output_stream and output_stream_regular not to be
  declared in some header file?
 
 No good reason that I can think of.

   I'm busy segregating all/most of the VMS-specific stuff into a vms
directory, to annoy the normal folks less.

   Currently, I have output_stream, output_stream_regular, and
total_downloaded_bytes in (a new) main.h, but I could do something else
if there's a better plan.

   Rather than do something similar for version_string, I just
transformed version.c into version.h, which (for the moment) contains
little other than:

  #define VERSION_STRING "1.10-alpha1_sms1"

Was there any reason to do this with a source module instead of a simple
macro in a simple header file?

   Was there any reason to use '#include <config.h>' instead of
'#include "config.h"'?  This hosed my original automatic dependency
generation, but a work-around was easy enough.  It just seemed like a
difference from all the other non-system inclusions with no obvious (to
me) reason.

   Currently, I'm working from a CVS collection taken on 11 April. 
Assuming I can get this stuff organized in the next few days or so, what
would be the most convenient code base to use?



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget 1.9.1 with large DVD.iso files

2005-04-11 Thread Hrvoje Niksic
Sanjay Madhavan [EMAIL PROTECTED] writes:

 wget 1.9.1 fails when trying to download a very large file.
  
 The download stopped in between and attempting to resume shows a negative
 sized balance to be downloaded.
  
 e.g.ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso  
 3284710 KB
  
 I read somewhere that it is due to the fact that internally the size
 is being stored as signed integers and hence the numbers wrap around
 giving negative sizes for large (DVD sized files)

That is correct.  But this problem has been fixed in the current CVS.
If you know how to use CVS, you can download it (the instructions are
at http://wget.sunsite.dk/) and give it a spin.  Downloading that file
should work in that version:

{mulj}[~]$ wget 
ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso
--11:44:46--  
ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso
   => `SUSE-Linux-9.2-FTP-DVD.iso'
Resolving ftp.solnet.ch... 212.101.4.244
Connecting to ftp.solnet.ch|212.101.4.244|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /mirror/SuSE/i386/9.2/iso ... done.
==> PASV ... done.    ==> RETR SUSE-Linux-9.2-FTP-DVD.iso ... done.
Length: 3,363,543,040 (3.1G) (unauthoritative)

 0% [ ] 146,464   37.06K/s 
ETA 24:32:50
...
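
For anyone wondering where the negative sizes come from, a tiny
stand-alone illustration (not Wget code; the assumption is merely that
pre-1.10 Wget keeps sizes in a 32-bit signed long, as it does on most
platforms):

  #include <stdio.h>
  #include <stdint.h>

  int
  main (void)
  {
    long long real_size = 3363543040LL;   /* the ~3.1G DVD image */

    /* Forcing the value into a signed 32-bit integer wraps past
       2^31 - 1 = 2147483647 on the usual two's-complement platforms.  */
    int32_t stored = (int32_t) real_size;

    printf ("real size:   %lld bytes\n", real_size);
    printf ("stored size: %d bytes\n", stored);   /* prints a negative number */
    return 0;
  }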


Re: wget 1.9.1 with large DVD.iso files

2005-04-11 Thread Bryan
I may run into this in the future.  What is the threshold for large
files failing on the -current version of wget???  I'm not expecting to
d/l anything over 200MB, but is that even too large for it?

Sorry to threadjack, but it seemed an appropriate question...

Bryan

On Apr 11, 2005 2:46 AM, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Sanjay Madhavan [EMAIL PROTECTED] writes:
 
  wget 1.9.1 fails when trying to download a very large file.
 
  The download stopped in between and attempting to resume shows a negative
  sized balance to be downloaded.
 
  e.g.ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso
  3284710 KB
 
  I read somewhere that it is due to the fact that internally the size
  is being stored as signed integers and hence the numbers wrap around
  giving negative sizes for large (DVD sized files)
 
 That is correct.  But this problem has been fixed in the current CVS.
 If you know how to use CVS, you can download it (the instructions are
 at http://wget.sunsite.dk/) and give it a spin.  Downloading that file
 should work in that version:
 
 {mulj}[~]$ wget 
 ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso
 --11:44:46--  
 ftp://ftp.solnet.ch/mirror/SuSE/i386/9.2/iso/SUSE-Linux-9.2-FTP-DVD.iso
 => `SUSE-Linux-9.2-FTP-DVD.iso'
 Resolving ftp.solnet.ch... 212.101.4.244
 Connecting to ftp.solnet.ch|212.101.4.244|:21... connected.
 Logging in as anonymous ... Logged in!
 ==> SYST ... done.    ==> PWD ... done.
 ==> TYPE I ... done.  ==> CWD /mirror/SuSE/i386/9.2/iso ... done.
 ==> PASV ... done.    ==> RETR SUSE-Linux-9.2-FTP-DVD.iso ... done.
 Length: 3,363,543,040 (3.1G) (unauthoritative)
 
  0% [ ] 146,464   
 37.06K/s ETA 24:32:50
 ...



Re: wget 1.9.1 with large DVD.iso files

2005-04-11 Thread Hrvoje Niksic
Bryan [EMAIL PROTECTED] writes:

 I may run into this in the future.  What is the threshold for large
 files failing on the -current version of wget???

The threshold is 2G (2147483648 bytes).

 I'm not expecting to d/l anything over 200MB, but is that even too
 large for it?

That's not too large.  OP's file was over 3G.


Re: wget follow-excluded patch

2005-04-10 Thread Hrvoje Niksic
Tobias Tiederle [EMAIL PROTECTED] writes:

 let's say you have the following structure:

 index.html
 |-cool.html
 |  |-page1.html
 |  |-page2.html
 |  |-  ...
 |
 |-crap.html
|-page1.html
|-page2.html

 now you want to download the whole structure, but you want to
 exclude the crap (with -R/A or nice regex).  If you look at recur.c,
 crap.html is downloaded (and deleted), but all the pages linked in
 crap.html will be downloaded as well.  With the option I included,
 all the crap will be totally ignored.  I don't know how to achieve
  this behaviour with the current options.

You can't.  -R/-A were never meant to be used that way -- witness the
FTP code, where they're not applied to directories either.  (In this
sense HTML files are directories of a kind.)

Maybe we could repurpose -I/-X so they can apply to HTML files and be
used to ignore whole sub-hierarchies of the site?  Although a bit
unorthodox, that would be very much within the jurisdiction of those
options.

