problem using accept/reject rules

2003-09-15 Thread Sergey Vasilevsky
I use these rules in .wgetrc:
accept = *[?]*
and
reject = *\.[zZ][iI][pP]*

I thought these rules would exclude all *.zip* URLs from the download, but in a test, a URL like
http://domain.com/price/source/10-20030915.zip?PHPSESSID=0cd4eb0801c656a292e33c9b8134c899
was downloaded anyway.

What is wrong: wget or my patterns?
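
As a quick sanity check outside wget, the reject pattern can be tried against the URL's filename with a shell glob, since -A/-R patterns are globs, not regexps (a sketch; it assumes wget matches the pattern against the filename part of the URL, query string included). Note that the backslash before the dot is unnecessary: in a glob, '.' is already literal.

# hypothetical test: the filename part of the URL above
f='10-20030915.zip?PHPSESSID=0cd4eb0801c656a292e33c9b8134c899'
case "$f" in
  *.[zZ][iI][pP]*) echo "reject pattern matches; wget should exclude this" ;;
  *)               echo "reject pattern does not match" ;;
esac

The glob does match here, so the pattern itself looks fine; the question is where and when wget applies it.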



no-clobber: add more suffixes

2003-10-06 Thread Sergey Vasilevsky
`--no-clobber' is a very useful option, but I retrieve documents with suffixes other than .html/.htm.

Please add an option that, like -A/-R, defines the allowed/rejected rules for the -nc option.
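
Until such an option exists, a rough workaround is a small wrapper that skips the fetch whenever a local copy is already present, whatever its suffix (a sketch; the URL and the derived filename are hypothetical placeholders):

# skip the download if the file wget would create already exists
url='http://domain.com/price.php'   # placeholder URL
out="${url##*/}"                    # default local name wget derives from the URL
[ -e "$out" ] || wget "$url"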



problem with 302 server response parsing

2003-10-08 Thread Sergey Vasilevsky
I use Wget 1.8.2.
When I try to retrieve a page with the '-nc' option and the server returns a 302 with a new URL, wget does not test that URL against the '-nc' rules; it downloads the page and overwrites the existing file.

I think wget does not apply the command-line option rules when it parses the server response headers!
Is this a bug?
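
To see what is happening, the redirect target can be printed without downloading anything, then checked against the rules by hand (a sketch; the URL is a placeholder; --spider makes wget query the page without saving it, and -S prints the server response headers, including the Location of a 302):

wget --spider -S http://domain.com/page 2>&1 | grep -i 'Location:'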



Wget 1.8.2 bug

2003-10-14 Thread Sergey Vasilevsky
I use wget 1.8.2.
When I try to recursively download site.com, where the first page site.com/ redirects to site.com/xxx.html, and the first link on that page points back to site.com/, wget downloads only xxx.html and stops.
The other links from xxx.html are not followed!



Question about URL conversion

2003-10-14 Thread Sergey Vasilevsky
Does wget have any rules for converting the retrieved URL into the URL it is stored under?
Or could this be added in the future?

For example:
Get - site.com/index.php?PHPSESSID=123124324 
Filter - /PHPSESSID=[a-z0-9]+//i
Save as - site.com/index.php
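
The requested rewrite is easy to express with a stream editor, which may help pin down the intended semantics (a sketch using GNU sed; the trailing I flag for case-insensitive matching is a GNU extension):

# strip a trailing PHPSESSID query parameter from a URL
echo 'site.com/index.php?PHPSESSID=123124324' | sed -E 's/\?PHPSESSID=[a-z0-9]+$//I'
# prints: site.com/index.php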


Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I use wget 1.8.2
I try to recursively download www.map-by.info/index.html, but wget stops at the first page.
Why? index.html has links to other pages.

/usr/local/bin/wget -np -r -N -nH --referer=http://map-by.info -P /tmp/www.map-by.info -D map-by.info http://map-by.info http://www.map-by.info
--10:09:25--  http://map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Resolving proxy.open.by... done.
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.

--10:09:25--  http://www.map-by.info/
   => `/p4/poisk/spider/resource/www.map-by.info/index.html'
Connecting to proxy.open.by[193.232.92.3]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: ignored [text/html]
Server file no newer than local file
`/p4/poisk/spider/resource/www.map-by.info/index.html' -- not retrieving.


FINISHED --10:09:26--
Downloaded: 0 bytes in 0 files



RE: Problem recursive download

2003-10-16 Thread Sergey Vasilevsky
I think wget verifies link syntax strictly. The page contains:

<a href=about_rus.html onMouseOver=img_on('main21'); onMouseOut=img_off('main21')>

That link has the symbol ';' unquoted inside the <a> tag, which is invalid.
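
For comparison, the same link with the attribute values quoted, which a parser can handle unambiguously (a reconstruction for illustration, not markup taken from the site):

<a href="about_rus.html" onMouseOver="img_on('main21');" onMouseOut="img_off('main21');">...</a>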

 -Original Message-
 From: Sergey Vasilevsky [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 16, 2003 10:15 AM
 To: [EMAIL PROTECTED]
 Subject: Problem recursive download







RE: Wget 1.8.2 bug

2003-10-20 Thread Sergey Vasilevsky
Thanks for explaining the reasons.

And I have another problem:
in .wgetrc I use
reject = *.[zZ][iI][pP]*,*.[rR][aA][rR]*,*.[gG][iI][fF]*,*.[jJ][pP][gG]*,*.[Ee][xX][Ee]*,*[=]http*
accept = *.yp*,*.pl*,*.dll*,*.nsf*,*.[hH][tT][mM]*,*.[pPsSjJ][hH][tT][mM]*,*.[pP][hH][pP]*,*.[jJ][sS][pP]*,*.[tT][xX][tT],*.[cC][gG][iI]*,*.[cC][sS][pP]*,*.[aA][sS][pP]*,*[?]*

On the command line I add some more rules with '-R xxx', which I think are joined with the previous rules. Then I run a recursive download.

In the result I found *.zip and *.exe ... files!
What am I doing wrong?
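
As a sanity check outside wget, both lists can be tried against a candidate filename with shell globs (a sketch; it assumes wget accepts a file only when it matches the accept list and does not match the reject list; the filename is hypothetical):

# tiny glob matcher: does $2 match pattern $1?
match() { case "$2" in $1) return 0 ;; esac; return 1; }

f='report.zip?PHPSESSID=abc'   # hypothetical filename from a crawled URL
if match '*[?]*' "$f" && ! match '*.[zZ][iI][pP]*' "$f"; then
  echo "would be accepted"
else
  echo "would be rejected"   # this branch fires: the reject glob matches
fi

If the same check rejects the files wget is saving, the rules are probably being bypassed somewhere, for example on redirected URLs.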

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Friday, October 17, 2003 7:18 PM
 To: Tony Lewis
 Cc: Wget List
 Subject: Re: Wget 1.8.2 bug


 Tony Lewis [EMAIL PROTECTED] writes:

  Hrvoje Niksic wrote:
 
  Incidentally, Wget is not the only browser that has a problem with
  that.  For me, Mozilla is simply showing the source of
  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
  the returned content-type is text/plain.
 
  On the other hand, Internet Explorer will treat lots of content
  types as HTML if the content starts with <html>.

 I know.  But so far no one has asked for this in Wget.

  Perhaps we can add an option to wget so that it will look for an
  <html> tag in plain text files?

 If more people clamor for the option, I suppose we could overload
 `--force-html' to perform such detection.
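
Such detection would amount to sniffing the first bytes of the body. Outside wget it might look like this (a sketch; page.txt stands in for a document that was served as text/plain):

# treat a text/plain body as HTML if an <html> tag appears near the top
head -c 512 page.txt | grep -qi '<html' && echo "looks like HTML"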




follow_ftp does not work

2003-11-17 Thread Sergey Vasilevsky
Wget 1.9.1

.wgetrc:
reject = *.[Ee][xX][Ee]*
follow_ftp = off

Command line:
wget -np -nv -r -N -nH --referer=http://www.orion.by -P /tmp/www.orion.by -D orion.by http://www.orion.by

Output:
Last-modified header missing -- time-stamps turned off.
13:15:08 URL:http://www.orion.by/index.php?mode=main [24703] -> /tmp/www.orion.by/index.php?mode=main [1]
http://www.orion.by/robots.txt:
13:15:09 ERROR 404: Not Found.
20 redirections exceeded.
20 redirections exceeded.
13:15:18 URL:ftp://62.118.248.95/cyberfight/q3/utils/Seismovision222light.exe [882] -> /tmp/www.orion.by/cyberfight/q3/utils/.listing [1]
^C

Questions:
1. How can I see which parameters wget is using at run time?
   Maybe add an option to print them.
2. The reject rules need better documentation, with examples!
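
Regarding question 1: as far as I know there is no option that dumps the effective configuration, but the -d (--debug) flag makes wget print detailed output about what it is doing as it runs, which helps confirm which rules are actually in effect:

# rerun with debug output and page through wget's decisions
wget -d -np -r -N -nH -D orion.by http://www.orion.by 2>&1 | less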