RE: Wget 1.8.2 bug

2003-10-20 Thread Sergey Vasilevsky
Thanks for explaining these reasons.

And I have another problem.
In .wgetrc I use:
reject = *.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*
accept = *.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*

On the command line I add some more rules with '-R xxx'; I think they are
joined with the previous rules.
And I use recursive download.

As a result, I find *.zip and *.exe ... files!
What am I doing wrong?
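One way to check pattern lists like these offline is to model the filter. The sketch below is a simplified model, not Wget's code: it assumes a file is kept only if its name (the part after the last `/`) matches at least one `accept` pattern and no `reject` pattern, and it uses Python's `fnmatch.fnmatchcase` as a stand-in for Wget's wildcard matching. The function names are hypothetical, and the sketch does not model how `-R` on the command line combines with the `.wgetrc` lists.

```python
from fnmatch import fnmatchcase

# Pattern lists copied from the .wgetrc in the message above.
REJECT = ("*.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,"
          "*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*").split(",")
ACCEPT = ("*.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,"
          "*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,"
          "*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*").split(",")


def matches_any(name, patterns):
    # fnmatchcase does no OS-dependent case folding, so the [zZ]-style
    # character classes are what make the patterns case-insensitive.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def is_acceptable(url):
    # Assumption: only the file-name part after the last "/" is matched.
    name = url.rsplit("/", 1)[-1]
    # Assumption: keep a file only if some accept pattern matches and
    # no reject pattern does.
    return matches_any(name, ACCEPT) and not matches_any(name, REJECT)


if __name__ == "__main__":
    for url in ["http://site.com/index.html",
                "http://site.com/setup.Exe",
                "http://site.com/get.cgi?file=a.zip"]:
        print(url, is_acceptable(url))
```

Under this model a `.zip` or `.exe` name is rejected whenever a reject pattern matches it, even if `*[?]*` or `*.[cC][gG][iI]*` also accepts it; running the actual URLs from a Wget log through both lists is a quick way to see whether the patterns themselves are the problem.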

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 17, 2003 7:18 PM
> To: Tony Lewis
> Cc: Wget List
> Subject: Re: Wget 1.8.2 bug
>
>
> "Tony Lewis" <[EMAIL PROTECTED]> writes:
>
> > Hrvoje Niksic wrote:
> >
> >> Incidentally, Wget is not the only browser that has a problem with
> >> that.  For me, Mozilla is simply showing the source of
> >> <http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set>, because
> >> the returned content-type is text/plain.
> >
> > On the other hand, Internet Explorer will treat lots of content
> > types as HTML if the content starts with "<html>".
>
> I know.  But so far no one has asked for this in Wget.
>
> > Perhaps we can add an option to wget so that it will look for an
> > <html> tag in plain text files?
>
> If more people clamor for the option, I suppose we could overload
> `--force-html' to perform such detection.
>



Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>
>> Incidentally, Wget is not the only browser that has a problem with
>> that.  For me, Mozilla is simply showing the source of
>> <http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set>, because
>> the returned content-type is text/plain.
>
> On the other hand, Internet Explorer will treat lots of content
> types as HTML if the content starts with "<html>".

I know.  But so far no one has asked for this in Wget.

> Perhaps we can add an option to wget so that it will look for an
> <html> tag in plain text files?

If more people clamor for the option, I suppose we could overload
`--force-html' to perform such detection.
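A minimal sketch of the kind of detection Tony proposes, which Hrvoje suggests could be folded into `--force-html`. The function name `looks_like_html` and the 256-byte window are hypothetical choices, and this is a simplified heuristic, not what Internet Explorer or Wget actually implement: it skips leading whitespace and checks, ignoring case, whether a `text/plain` body begins with an opening `<html` tag.

```python
def looks_like_html(body, window=256):
    # Hypothetical heuristic: look only at the start of the body, skip
    # leading ASCII whitespace, and compare case-insensitively.
    head = body[:window].lstrip().lower()
    return head.startswith(b"<html")


if __name__ == "__main__":
    print(looks_like_html(b"\r\n<HTML><HEAD><TITLE>Shop</TITLE>"))
    print(looks_like_html(b"Just some plain text."))
```

Checking only the start of the body keeps the test cheap and avoids misfiring on plain text that merely mentions a tag further down.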


Re: Wget 1.8.2 bug

2003-10-17 Thread Tony Lewis
Hrvoje Niksic wrote:

> Incidentally, Wget is not the only browser that has a problem with
> that.  For me, Mozilla is simply showing the source of
> <http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set>, because
> the returned content-type is text/plain.

On the other hand, Internet Explorer will treat lots of content types as
HTML if the content starts with "<html>".

To see for yourself, try these links:
http://www.exelana.com/test.cgi
http://www.exelana.com/test.cgi?text/plain
http://www.exelana.com/test.cgi?image/jpeg

Perhaps we can add an option to wget so that it will look for an <html> tag
in plain text files?

Tony



Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
"??? ??" <[EMAIL PROTECTED]> writes:

>> I've seen pages that do that kind of redirection, but Wget seems
>> to follow them for me.  Do you have an example I could try?
>>
> [EMAIL PROTECTED]:~/> /usr/local/bin/wget -U "All.by" -np -r -N -nH \
>     --header="Accept-Charset: cp1251, windows-1251, win, x-cp1251, cp-1251" \
>     --referer=http://minskshop.by -P /tmp/minskshop.by -D minskshop.by \
>     http://minskshop.by http://www.minskshop.by
[...]

The problem with these pages lies not in redirection, but in the fact
that the server returns them with the `text/plain' content-type
instead of `text/html', which Wget requires in order to treat a page
as HTML.

Observe:

> --13:05:47--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
> Length: ignored [text/plain]
> --13:05:53--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
> Length: ignored [text/plain]
> --13:05:59--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
> Length: ignored [text/plain]
> --13:06:00--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
> Length: ignored [text/plain]
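The decision described above hinges on the media type the server sends. The sketch below assumes, as the message says, that only `text/html` makes a page eligible for link extraction; it pulls the bracketed type out of a log line like those quoted and compares the media type with parameters such as `charset` stripped. The helper names and the regular expression are hypothetical, not taken from Wget.

```python
import re

# Matches the bracketed type at the end of a line such as
# "Length: ignored [text/plain]".
LOG_TYPE = re.compile(r"\[([^\]]+)\]\s*$")


def type_from_log_line(line):
    match = LOG_TYPE.search(line)
    return match.group(1) if match else None


def treated_as_html(content_type):
    # Drop parameters such as "; charset=windows-1251" and ignore case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: only text/html makes a page eligible for link extraction.
    return media_type == "text/html"


if __name__ == "__main__":
    ctype = type_from_log_line("Length: ignored [text/plain]")
    print(ctype, treated_as_html(ctype))
```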

Incidentally, Wget is not the only browser that has a problem with
that.  For me, Mozilla is simply showing the source of
<http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set>, because
the returned content-type is text/plain.


Re: Wget 1.8.2 bug

2003-10-14 Thread Hrvoje Niksic
"Sergey Vasilevsky" <[EMAIL PROTECTED]> writes:

> I use wget 1.8.2.  When I try a recursive download of site.com, where
> the first page site.com/ redirects to site.com/xxx.html, whose first
> link points back to site.com/, Wget downloads only xxx.html and
> stops.  Other links from xxx.html are not followed!

I've seen pages that do that kind of redirection, but Wget seems to
follow them for me.  Do you have an example I could try?
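The layout in this report, a start page that redirects to a page linking back to the start, is a cycle in the link graph. A recursive retriever usually copes with that by resolving redirects and keeping a set of visited URLs. The sketch below is a generic breadth-first crawl over a hypothetical in-memory site (the `REDIRECTS` and `LINKS` tables and page names are made up); it is not Wget's code. It shows that the link back to `/` is skipped as already visited rather than ending the traversal. In this thread the actual cause turned out to be the `text/plain` content type, not the redirect.

```python
from collections import deque

# Hypothetical site: "/" redirects to "/xxx.html", whose first link
# points back to "/".
REDIRECTS = {"/": "/xxx.html"}
LINKS = {
    "/xxx.html": ["/", "/a.html", "/b.html"],
    "/a.html": ["/c.html"],
    "/b.html": [],
    "/c.html": ["/xxx.html"],
}
MAX_REDIRECTS = 20  # guard against redirect loops


def crawl(start):
    seen, order = set(), []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Resolve redirects first, so "/" and "/xxx.html" count as one page.
        hops = 0
        while url in REDIRECTS and hops < MAX_REDIRECTS:
            url = REDIRECTS[url]
            hops += 1
        if url in seen:
            continue  # already retrieved: skip it, but keep crawling
        seen.add(url)
        order.append(url)
        queue.extend(LINKS.get(url, []))
    return order


if __name__ == "__main__":
    print(crawl("/"))
```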