follow_ftp not work

2003-11-17 Thread Sergey Vasilevsky
Wget 1.9.1

.wgetrc:
reject = *.[Ee][xX][Ee]*
follow_ftp = off

Command line:
wget -np -nv -r -N -nH --referer=http://www.orion.by -P /tmp/www.orion.by -D orion.by http://www.orion.by

Output:
Last-modified header missing -- time-stamps turned off.
13:15:08 URL:http://www.orion.by/index.php?mode=main [24703] -> "/tmp/www.orion.by/index.php?mode=main" [1]
http://www.orion.by/robots.txt:
13:15:09 ERROR 404: Not Found.
20 redirections exceeded.
20 redirections exceeded.
13:15:18 URL:ftp://62.118.248.95/cyberfight/q3/utils/Seismovision222light.exe [882] -> "/tmp/www.orion.by/cyberfight/q3/utils/.listing" [1]
^C

Questions:
1. How can I see which parameters wget is actually using at run time?
   Perhaps an option could be added to print them.
2. The reject rules need more documentation, with examples!





RE: Wget 1.8.2 bug

2003-10-20 Thread Sergey Vasilevsky
Thanks for explaining the reasons.

I have another problem.
In .wgetrc I use:
reject = *.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*
accept = *.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*

On the command line I add some more rules with '-R xxx'; I assume they are
combined with the rules from .wgetrc.
I also use recursive download.

As a result, I still find *.zip and *.exe files among the downloads!
What am I doing wrong?
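
The accept and reject lists above are comma-separated shell-style wildcard patterns, so they can be tried against sample names before starting a crawl. The following is a minimal Python sketch, not wget's own code. It assumes that a name is kept only when it matches at least one accept pattern and no reject pattern, and that Python's `fnmatch` is close enough to wget's matcher for these particular patterns. The helper names `split_rules` and `is_wanted` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern lists copied from the .wgetrc above (rejoined onto single lines).
REJECT = ("*.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,"
          "*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*")
ACCEPT = ("*.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,"
          "*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,"
          "*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*")


def split_rules(rules):
    """Split a comma-separated rule string into a list of patterns."""
    return [p.strip() for p in rules.split(",") if p.strip()]


def is_wanted(name, accept, reject):
    """Assumed semantics: matches some accept pattern AND no reject pattern."""
    accepted = not accept or any(fnmatchcase(name, p) for p in accept)
    rejected = any(fnmatchcase(name, p) for p in reject)
    return accepted and not rejected


if __name__ == "__main__":
    accept = split_rules(ACCEPT)
    reject = split_rules(REJECT)
    for sample in ("index.html", "index.php?mode=main",
                   "archive.zip?x=1", "setup.EXE"):
        print(sample, is_wanted(sample, accept, reject))
```

If a name is rejected by this check but still appears on disk after a real run, the difference most likely lies in which string wget tests or when it tests it, rather than in the patterns themselves.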

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 17, 2003 7:18 PM
> To: Tony Lewis
> Cc: Wget List
> Subject: Re: Wget 1.8.2 bug
>
>
> "Tony Lewis" <[EMAIL PROTECTED]> writes:
>
> > Hrvoje Niksic wrote:
> >
> >> Incidentally, Wget is not the only browser that has a problem with
> >> that.  For me, Mozilla is simply showing the source of
> >> , because
> >> the returned content-type is text/plain.
> >
> > On the other hand, Internet Explorer will treat lots of content
> > types as HTML if the content starts with "<html>".
>
> I know.  But so far noone has asked for this in Wget.
>
> > Perhaps we can add an option to wget so that it will look for an
> > <html> tag in plain text files?
>
> If more people clamor for the option, I suppose we could overload
> `--force-html' to perform such detection.
>



RE: Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I think wget verifies link syntax strictly:

That link contains an invalid ';' character that is not quoted in .

> -Original Message-
> From: Sergey Vasilevsky [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 16, 2003 10:15 AM
> To: [EMAIL PROTECTED]
> Subject: Problem recursive download
>
>
> I use wget 1.8.2.
> I am trying to recursively download www.map-by.info/index.html, but wget
> stops after the first page.
> Why?
> index.html has links to other pages.
>
> /usr/local/bin/wget -np -r -N -nH --referer=http://map-by.info -P /tmp/www.map-by.info -D map-by.info http://map-by.info http://www.map-by.info
> --10:09:25--  http://map-by.info/
>    => `/p4/poisk/spider/resource/www.map-by.info/index.html'
> Resolving proxy.open.by... done.
> Connecting to proxy.open.by[193.232.92.3]:8080... connected.
> Proxy request sent, awaiting response... 200 OK
> Length: ignored [text/html]
> Server file no newer than local file
> `/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.
>
> --10:09:25--  http://www.map-by.info/
>    => `/p4/poisk/spider/resource/www.map-by.info/index.html'
> Connecting to proxy.open.by[193.232.92.3]:8080... connected.
> Proxy request sent, awaiting response... 200 OK
> Length: ignored [text/html]
> Server file no newer than local file
> `/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.
>
>
> FINISHED --10:09:26--
> Downloaded: 0 bytes in 0 files
>
>



Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I use wget 1.8.2.
I am trying to recursively download www.map-by.info/index.html, but wget
stops after the first page.
Why?
index.html has links to other pages.

/usr/local/bin/wget -np -r -N -nH --referer=http://map-by.info -P /tmp/www.map-by.info -D map-by.info http://map-by.info http://www.map-by.info
--10:09:25--  http://map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Resolving proxy.open.by... done.
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.

--10:09:25--  http://www.map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.


FINISHED --10:09:26--
Downloaded: 0 bytes in 0 files



Question about url convert

2003-10-14 Thread Sergey Vasilevsky
Does wget have any rules for converting the retrieved URL into the name
under which the file is stored?
If not, might this be added in the future?

For example:
Get -> site.com/index.php?PHPSESSID=123124324 
Filter -> /PHPSESSID=[a-z0-9]+//i
Save as -> site.com/index.php
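
The Get/Filter/Save-as idea above can be sketched outside wget as a small post-processing step. This is only an illustration in Python, assuming the filter is the case-insensitive regular expression from the example. The helper name `store_name` is hypothetical, and the tidy-up of leftover `?` and `&` separators is an addition that the filter in the post does not include.

```python
import re

# The filter from the post: /PHPSESSID=[a-z0-9]+//i
SESSION_PARAM = re.compile(r"PHPSESSID=[a-z0-9]+", re.IGNORECASE)


def store_name(url):
    """Return the name to store a retrieved URL under, session ID removed."""
    cleaned = SESSION_PARAM.sub("", url)
    # Tidy separators left behind by the removal (not part of the post's filter).
    cleaned = re.sub(r"\?&+", "?", cleaned)   # "?&mode=main" -> "?mode=main"
    cleaned = re.sub(r"&{2,}", "&", cleaned)  # "a=1&&b=2"    -> "a=1&b=2"
    return cleaned.rstrip("?&")               # drop a dangling "?" or "&"


if __name__ == "__main__":
    print(store_name("site.com/index.php?PHPSESSID=123124324"))
```

One caveat with any such mapping: two different retrieved URLs can collapse to the same stored name, so a real implementation would need a policy for collisions.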


Wget 1.8.2 bug

2003-10-14 Thread Sergey Vasilevsky
I use wget 1.8.2.
I try to recursively download site.com, where the first page, site.com/,
redirects to site.com/xxx.html, and the first link on that page points back
to site.com/.
In this case wget downloads only xxx.html and stops.
The other links in xxx.html are not followed!



problem with 302 server response parsing

2003-10-08 Thread Sergey Vasilevsky
I use Wget 1.8.2.
When I retrieve a page with the '-nc' option and the server returns a 302
with a new URL, wget does not apply the '-nc' check to that new URL; it
downloads it and overwrites the existing file.

I think wget does not apply the command-line option rules when it parses the
server response headers!
Is this a bug?
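
The behaviour expected in this report can be written down as a small decision function. The Python sketch below is not wget's implementation. It assumes that the no-clobber check should be applied to the URL the server finally redirects to, and it uses a simplified, hypothetical URL-to-file mapping (`local_name`) that ignores query strings and host directories.

```python
import os
from urllib.parse import urlsplit


def local_name(url, root):
    """Map a URL to a local path (hypothetical, simplified mapping)."""
    path = urlsplit(url).path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    return os.path.join(root, path)


def should_download(requested_url, redirect_url, root, no_clobber=True):
    """Decide whether to fetch, checking the redirect target when there is one.

    redirect_url is the Location of a 302 response, or None if the server
    answered the request directly.
    """
    final_url = redirect_url or requested_url
    if no_clobber and os.path.exists(local_name(final_url, root)):
        return False  # the file that would be written already exists
    return True
```

With this logic, a 302 from old.html to an already downloaded new.html leaves the existing file untouched, which is the outcome the report asks for.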



no-clobber add more suffix

2003-10-06 Thread Sergey Vasilevsky
`--no-clobber' is a very useful option, but I retrieve documents with
suffixes other than .html/.htm.

Please add an option that, like -A/-R, defines the allowed/rejected suffix
rules for the -nc option.



problem use accept/reject rules

2003-09-15 Thread Sergey Vasilevsky
I use these rules in .wgetrc:
accept = *[?]*
and
reject = *\.[zZ][iI][pP]*

I expected these rules to exclude every *.zip* file from download, but in my
test a URL like
http://domain.com/price/source/10-20030915.zip?PHPSESSID=0cd4eb0801c656a292e33c9b8134c899
was downloaded.

What is wrong: wget or my regexp?
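
One thing worth checking in a case like this is which part of the URL each pattern is compared against. The Python sketch below does not reproduce wget's logic. It only tests the two patterns against two candidate strings: the last path component alone, and the same component with the query string attached. The function names are hypothetical. The backslash from the reject rule is dropped here because Python's `fnmatch` has no escape syntax and would treat it as a literal character.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from urllib.parse import urlsplit

URL = ("http://domain.com/price/source/10-20030915.zip"
       "?PHPSESSID=0cd4eb0801c656a292e33c9b8134c899")
ACCEPT = "*[?]*"
REJECT = "*.[zZ][iI][pP]*"  # "*\.[zZ][iI][pP]*" in the post, backslash dropped


def candidates(url):
    """Return the strings a matcher might plausibly be given for this URL."""
    parts = urlsplit(url)
    name = basename(parts.path)
    with_query = name + "?" + parts.query if parts.query else name
    return {"name": name, "name_with_query": with_query}


def match_report(url, accept, reject):
    """Map each candidate label to (matches accept, matches reject)."""
    return {label: (fnmatchcase(text, accept), fnmatchcase(text, reject))
            for label, text in candidates(url).items()}


if __name__ == "__main__":
    for label, result in match_report(URL, ACCEPT, REJECT).items():
        print(label, result)
```

Under either candidate the reject pattern matches as a wildcard, which suggests the pattern itself is not the problem; whether the accept rule matches depends on whether the query string is part of the tested name.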