RE: Wget 1.8.2 bug
Thanks for explaining the reasons.

I have another problem. In .wgetrc I use:

reject = *.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*
accept = *.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*

On the command line I add some more rules with '-R xxx'; I think they are joined with the previous rules. I also use recursive download. In the result I found *.zip and *.exe ... files! What am I doing wrong?

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Friday, October 17, 2003 7:18 PM
To: Tony Lewis
Cc: Wget List
Subject: Re: Wget 1.8.2 bug

Tony Lewis [EMAIL PROTECTED] writes:

> Hrvoje Niksic wrote:
>
>> Incidentally, Wget is not the only browser that has a problem with
>> that. For me, Mozilla is simply showing the source of
>> http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set, because
>> the returned content-type is text/plain.
>
> On the other hand, Internet Explorer will treat lots of content types
> as HTML if the content starts with <html>.

I know. But so far no one has asked for this in Wget.

> Perhaps we can add an option to wget so that it will look for an
> <html> tag in plain text files?

If more people clamor for the option, I suppose we could overload `--force-html' to perform such detection.
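One way to sanity-check the reject patterns from the .wgetrc quoted at the top of this message is to run them against a few sample names outside of Wget. The sketch below uses Python's `fnmatch` module, whose glob matching may differ in detail from Wget's own; it says nothing about which part of a URL or file name Wget actually compares against the patterns. The sample names are hypothetical, and the patterns assume the stray spaces in the posted list are line-wrapping artifacts.

```python
from fnmatch import fnmatchcase

# Reject patterns as listed in the message (wrap-induced spaces removed).
REJECT = [
    "*.[zZ][iI][pP]*",
    "*.[rR][aA][rR]*",
    "*.[gG][iI][fF]*",
    "*.[jJ][pP][gG]*",
    "*.[Ee][xX][Ee]*",
    "*[=]http*",
]


def rejected_by(name, patterns=REJECT):
    """Return the patterns that match `name` (case-sensitive glob match)."""
    return [p for p in patterns if fnmatchcase(name, p)]


# Hypothetical file names, for illustration only.
for name in ["setup.EXE", "archive.zip", "index.html"]:
    print(name, "->", rejected_by(name))
```

If a name such as `archive.zip` is matched here but still shows up in the download, the patterns themselves are probably not at fault, and the question becomes how the accept list, the reject list and `-R` are combined.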
Re: Wget 1.8.2 bug
??? ?? [EMAIL PROTECTED] writes:

>> I've seen pages that do that kind of redirections, but Wget seems
>> to follow them, for me. Do you have an example I could try?
>
> [EMAIL PROTECTED]:~/ /usr/local/bin/wget -U All.by -np -r -N -nH --header=Accept-Charset: cp1251, windows-1251, win, x-cp1251, cp-1251 --referer=http://minskshop.by -P /tmp/minskshop.by -D minskshop.by http://minskshop.by http://www.minskshop.by
> [...]

The problem with these pages lies not in redirection, but in the fact that the server returns them with the `text/plain' content-type instead of `text/html', which Wget requires in order to treat a page as HTML. Observe:

--13:05:47--  http://minskshop.by/cgi-bin/shop.cgi?id=1cookie=set
Length: ignored [text/plain]
--13:05:53--  http://minskshop.by/cgi-bin/shop.cgi?id=1cookie=set
Length: ignored [text/plain]
--13:05:59--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set
Length: ignored [text/plain]
--13:06:00--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set
Length: ignored [text/plain]

Incidentally, Wget is not the only browser that has a problem with that. For me, Mozilla is simply showing the source of http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set, because the returned content-type is text/plain.
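To make the content-type point concrete, here is a minimal Python sketch of the kind of check described above: a page is followed as HTML only when its declared media type is `text/html`. This is an illustration of the rule as stated in this message, not Wget's actual code; the function names are made up for the example.

```python
def media_type(content_type):
    """Return the lowercased media type from a Content-Type header value,
    dropping any parameters such as "; charset=...".
    """
    return content_type.split(";", 1)[0].strip().lower()


def treated_as_html(content_type):
    # Rule as described in the message: only text/html counts as HTML.
    return media_type(content_type) == "text/html"


print(treated_as_html("text/plain"))                       # False
print(treated_as_html("text/html; charset=windows-1251"))  # True
```

Under this rule the `text/plain' responses in the log are saved but never scanned for links, which is why the recursion stops there.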
Re: Wget 1.8.2 bug
Hrvoje Niksic wrote:

> Incidentally, Wget is not the only browser that has a problem with
> that. For me, Mozilla is simply showing the source of
> http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set, because
> the returned content-type is text/plain.

On the other hand, Internet Explorer will treat lots of content types as HTML if the content starts with <html>.

To see for yourself, try these links:

http://www.exelana.com/test.cgi
http://www.exelana.com/test.cgi?text/plain
http://www.exelana.com/test.cgi?image/jpeg

Perhaps we can add an option to wget so that it will look for an <html> tag in plain text files?

Tony
Re: Wget 1.8.2 bug
Tony Lewis [EMAIL PROTECTED] writes:

> Hrvoje Niksic wrote:
>
>> Incidentally, Wget is not the only browser that has a problem with
>> that. For me, Mozilla is simply showing the source of
>> http://www.minskshop.by/cgi-bin/shop.cgi?id=1cookie=set, because
>> the returned content-type is text/plain.
>
> On the other hand, Internet Explorer will treat lots of content types
> as HTML if the content starts with <html>.

I know. But so far no one has asked for this in Wget.

> Perhaps we can add an option to wget so that it will look for an
> <html> tag in plain text files?

If more people clamor for the option, I suppose we could overload `--force-html' to perform such detection.
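For the sake of discussion, the detection proposed above could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the `sniff` flag stands in for whatever option would enable the behaviour, "starts with <html>" is read as an `<html` tag at the start of the body after optional whitespace and an optional doctype, and only the first 1024 bytes are inspected. Neither Wget nor Internet Explorer is claimed to work this way.

```python
import re

# Assumed meaning of "starts with <html>": optional whitespace, an optional
# <!DOCTYPE html ...> declaration, then an <html> tag (any letter case).
_HTML_START = re.compile(
    rb"^\s*(?:<!doctype\s+html[^>]*>\s*)?<html[\s>]", re.IGNORECASE
)


def looks_like_html(body, window=1024):
    """Guess whether `body` (bytes) is HTML by inspecting its first bytes."""
    return _HTML_START.match(body[:window]) is not None


def should_parse_as_html(content_type, body, sniff=False):
    """Trust a text/html Content-Type; otherwise sniff only if asked to."""
    if content_type.split(";", 1)[0].strip().lower() == "text/html":
        return True
    return sniff and looks_like_html(body)


page = b"<html><head><title>shop</title></head></html>"
print(should_parse_as_html("text/plain", page))              # False
print(should_parse_as_html("text/plain", page, sniff=True))  # True
```

Keeping the sniffing behind an explicit flag leaves the default behaviour unchanged, so a server that labels its pages `text/plain' on purpose is still taken at its word unless the user opts in.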
Re: Wget 1.8.2 bug
Sergey Vasilevsky [EMAIL PROTECTED] writes:

> I use wget 1.8.2. When I try a recursive download of site.com, where
> the first page site.com/ redirects to site.com/xxx.html, and the
> first link on that page points back to site.com/, Wget downloads
> only xxx.html and stops. The other links from xxx.html are not
> followed!

I've seen pages that do that kind of redirection, but Wget seems to follow them, for me. Do you have an example I could try?