RE: Wget 1.8.2 bug

2003-10-20 Thread Sergey Vasilevsky
Thanks for explaining the reasons.

And I have another problem:
In .wgetrc I use:
reject = *.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*
accept = *.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*

On the command line I add some more rules with '-R xxx'; I think these are
joined with the previous rules.
And I use recursive download.

In the result I found *.zip and *.exe ... files!
What am I doing wrong?

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Friday, October 17, 2003 7:18 PM
 To: Tony Lewis
 Cc: Wget List
 Subject: Re: Wget 1.8.2 bug


 Tony Lewis [EMAIL PROTECTED] writes:

  Hrvoje Niksic wrote:
 
  Incidentally, Wget is not the only browser that has a problem with
  that.  For me, Mozilla is simply showing the source of
  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
  the returned content-type is text/plain.
 
  On the other hand, Internet Explorer will treat lots of content
  types as HTML if the content starts with <html>.

 I know.  But so far no one has asked for this in Wget.

  Perhaps we can add an option to wget so that it will look for an
  <html> tag in plain text files?

 If more people clamor for the option, I suppose we could overload
 `--force-html' to perform such detection.




Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
Sergey Vasilevsky [EMAIL PROTECTED] writes:

 I've seen pages that do that kind of redirection, but Wget seems
 to follow them, for me.  Do you have an example I could try?

 [EMAIL PROTECTED]:~/ /usr/local/bin/wget -U
 All.by  -np -r -N -nH --header=Accept-Charset: cp1251, windows-1251, win,
 x-cp1251, cp-1251 --referer=http://minskshop.by  -P /tmp/minskshop.by -D
 minskshop.by http://minskshop.by http://www.minskshop.by
[...]

The problem with these pages lies not in redirection, but in the fact
that the server returns them with the `text/plain' content-type
instead of `text/html', which Wget requires in order to treat a page
as HTML.

Observe:

 --13:05:47--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:05:53--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:05:59--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:06:00--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]

Incidentally, Wget is not the only browser that has a problem with
that.  For me, Mozilla is simply showing the source of
http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
the returned content-type is text/plain.
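
In code terms, the behaviour described above amounts to a check along these
lines (a rough illustrative sketch, not a quote of the actual http.c source;
the helper name is made up):

  /* Sketch: decide whether a response should be treated as HTML,
     judging only by the server-reported Content-Type header.  */
  #include <strings.h>   /* strncasecmp */

  static int
  content_type_is_html (const char *type)
  {
    /* 9 == strlen ("text/html") */
    return type != NULL && 0 == strncasecmp (type, "text/html", 9);
  }

A text/plain response fails such a test, so its links are neither followed
during recursion nor converted by `-k', which is what the log above shows.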


Re: Wget 1.8.2 bug

2003-10-17 Thread Tony Lewis
Hrvoje Niksic wrote:

 Incidentally, Wget is not the only browser that has a problem with
 that.  For me, Mozilla is simply showing the source of
 http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
 the returned content-type is text/plain.

On the other hand, Internet Explorer will treat lots of content types as
HTML if the content starts with <html>.

To see for yourself, try these links:
http://www.exelana.com/test.cgi
http://www.exelana.com/test.cgi?text/plain
http://www.exelana.com/test.cgi?image/jpeg

Perhaps we can add an option to wget so that it will look for an <html> tag
in plain text files?

Tony



Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 Hrvoje Niksic wrote:

 Incidentally, Wget is not the only browser that has a problem with
 that.  For me, Mozilla is simply showing the source of
 http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
 the returned content-type is text/plain.

 On the other hand, Internet Explorer will treat lots of content
 types as HTML if the content starts with <html>.

I know.  But so far no one has asked for this in Wget.

 Perhaps we can add an option to wget so that it will look for an
 <html> tag in plain text files?

If more people clamor for the option, I suppose we could overload
`--force-html' to perform such detection.
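
Such detection could be as simple as scanning the beginning of a text/plain
body for an opening HTML marker.  A hypothetical sketch (this is not existing
Wget code, and the function name is made up for illustration):

  /* Sketch: does this buffer look like the start of an HTML document?  */
  #include <stddef.h>    /* size_t */
  #include <strings.h>   /* strncasecmp */

  static int
  looks_like_html (const char *buf, size_t len)
  {
    size_t i;
    for (i = 0; i < len; i++)
      {
        if (i + 5 <= len && 0 == strncasecmp (buf + i, "<html", 5))
          return 1;
        if (i + 9 <= len && 0 == strncasecmp (buf + i, "<!doctype", 9))
          return 1;
      }
    return 0;
  }

Whether such a check should run for every text/plain document, or only when
something like an overloaded `--force-html' asks for it, is exactly the
design question raised above.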


Wget 1.8.2 bug

2003-10-14 Thread Sergey Vasilevsky
I use wget 1.8.2.
When I try a recursive download of site.com, where the first page at
site.com/ redirects to site.com/xxx.html, and the first link on that page
points back to site.com/, Wget downloads only xxx.html and stops.
Other links from xxx.html are not followed!



Re: Wget 1.8.2 bug

2003-10-14 Thread Hrvoje Niksic
Sergey Vasilevsky [EMAIL PROTECTED] writes:

 I use wget 1.8.2.  When I try a recursive download of site.com, where
 the first page at site.com/ redirects to site.com/xxx.html, and the
 first link on that page points back to site.com/, Wget downloads only
 xxx.html and stops.  Other links from xxx.html are not followed!

I've seen pages that do that kind of redirection, but Wget seems to
follow them, for me.  Do you have an example I could try?


wget 1.8.2 bug

2002-10-19 Thread Curtis H. Wilbar Jr.

I have found that the -k option does not work on downloaded FTP files.

The key problem seems to be that register_download is never called for
downloaded FTP files, because local_file is never set by calls to ftp_loop
the way it is by calls to http_loop.

So, I added local_file as a parameter to ftp_loop and used con.target
within the ftp_loop function to return this information, so that the
downloaded file gets registered and links to it will be rewritten.
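
For context, the idea on the caller's side (paraphrased from the description
above, not an exact quote of retrieve_url in retr.c) is that once ftp_loop
fills in local_file, FTP retrievals can go through the same registration step
that HTTP retrievals already do, roughly:

  /* Sketch of the caller-side effect of the change: with local_file now
     set for FTP retrievals too, the downloaded file can be recorded so
     that `-k' can later rewrite links that point at it.  */
  if (local_file && result == RETROK)
    register_download (u->url, local_file);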

The diffs are for files ftp.c, ftp.h, and retr.c.

I have not fully studied the impact of this change on the logic further on.

The one thing I am unsure of is the following if statement
(line 428 of wget 1.8.2's retr.c file):

  if (redirections && local_file && u->scheme == SCHEME_FTP)

If local_file here relied on a value set by prior logic before the
ftp_loop call, that value would now be gone, and this logic would
therefore likely not work as intended.

Your input on these changes would be appreciated.  All I ask is that, if
this change is used, I get a mention in the ChangeLog.

I'm still looking into any further impacts these code changes would
have, but initially it looks OK.

Thanks,

  -- Curt
  
Here are the diffs:

*** ftp.c   Fri May 17 23:05:16 2002
--- ../../wget-1.8.2.cw/src/ftp.c   Sat Oct 19 13:14:06 2002
***************
*** 1637,1643 ****
 of URL.  Inherently, its capabilities are limited on what can be
 encoded into a URL.  */
  uerr_t
! ftp_loop (struct url *u, int *dt)
  {
ccon con;   /* FTP connection */
uerr_t res;
--- 1637,1643 ----
 of URL.  Inherently, its capabilities are limited on what can be
 encoded into a URL.  */
  uerr_t
! ftp_loop (struct url *u, char **local_file, int *dt)
  {
ccon con;   /* FTP connection */
uerr_t res;
***************
*** 1716,1723 ****
  CLOSE (RBUF_FD (con.rbuf));
FREE_MAYBE (con.id);
con.id = NULL;
!   FREE_MAYBE (con.target);
!   con.target = NULL;
return res;
  }
  
--- 1716,1730 ----
  CLOSE (RBUF_FD (con.rbuf));
FREE_MAYBE (con.id);
con.id = NULL;
!   if (res == RETROK)
!   {
! *local_file = con.target;
!   }
!   else
!   {
! FREE_MAYBE (con.target);
! con.target = NULL;
!   }
return res;
  }


*** ftp.h   Sat May 18 23:04:53 2002
--- ../../wget-1.8.2.cw/src/ftp.h   Sat Oct 19 13:13:36 2002
***************
*** 109,115 ****
  };
  
  struct fileinfo *ftp_parse_ls PARAMS ((const char *, const enum stype));
! uerr_t ftp_loop PARAMS ((struct url *, int *));
  
  uerr_t ftp_index PARAMS ((const char *, struct url *, struct fileinfo *));
  
--- 109,115 ----
  };
  
  struct fileinfo *ftp_parse_ls PARAMS ((const char *, const enum stype));
! uerr_t ftp_loop PARAMS ((struct url *, char **, int *));
  
  uerr_t ftp_index PARAMS ((const char *, struct url *, struct fileinfo *));



*** retr.c  Fri May 17 23:05:21 2002
--- ../../wget-1.8.2.cw/src/retr.c  Sat Oct 19 12:58:34 2002
***************
*** 418,424 ****
int oldrec = opt.recursive;
if (redirections)
opt.recursive = 0;
!   result = ftp_loop (u, dt);
opt.recursive = oldrec;
  
/* There is a possibility of having HTTP being redirected to
--- 418,424 ----
int oldrec = opt.recursive;
if (redirections)
opt.recursive = 0;
!   result = ftp_loop (u, local_file, dt);
opt.recursive = oldrec;
  
/* There is a possibility of having HTTP being redirected to

Curtis H. Wilbar Jr.
Hawk Mountain Networks
[EMAIL PROTECTED]