recursive download

2006-05-19 Thread Ajar Taknev

Hi,

I am trying to recursively download from an FTP site without success.
I am behind a Squid proxy and I have set up my .wgetrc correctly. When
I run wget -r ftp://ftp.somesite.com/dir it fetches
ftp.somesite.com/dir/index.html and exits; it doesn't do a recursive
download. When I run the same command from a machine which is not
behind a proxy, it does a recursive download. The proxy is
squid-2.5.STABLE9 and the wget version is 1.10.2. Any ideas what the
problem could be?

TIA,
Ajar.


Re: Exclude directories

2006-05-19 Thread Mauro Tortonesi

Antoine Bonnefoy wrote:

Hi,
I found a bug with the -X option in recursive mode. When I use a wildcard
in the exclude string, it only works one directory level deep.
For example, for this directory structure:

server/
  level1/
    Data/
    level2/
      Data/

wget -X '*/Data' -r http://server/level1/
correctly excludes the first Data directory,
but does not exclude the Data directory under level2.

The bug comes from the fnmatch function.
I worked around it locally in utils.c by deactivating the
FNM_PATHNAME flag in the proclist() function.

Is that the right behavior?

I hope this helps.

Excuse my English.
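The FNM_PATHNAME behaviour described above can be sketched outside wget. The following is an illustrative Python translation of the wildcard semantics (not wget's actual C code): with FNM_PATHNAME set, '*' does not match '/', so '*/Data' stops one level deep, which is exactly the effect reported here.

```python
import re

def wildcard_to_regex(pattern, pathname=True):
    # With pathname semantics (fnmatch's FNM_PATHNAME flag), '*' does
    # not cross '/', so '*/Data' only reaches one directory level.
    # Without it, '*' matches anything, including '/'.
    star = r'[^/]*' if pathname else r'.*'
    return re.compile(re.escape(pattern).replace(r'\*', star))

# With FNM_PATHNAME: only the first-level Data directory matches.
strict = wildcard_to_regex('*/Data', pathname=True)
assert strict.fullmatch('level1/Data')
assert not strict.fullmatch('level1/level2/Data')

# With the flag deactivated (Antoine's workaround): both match.
loose = wildcard_to_regex('*/Data', pathname=False)
assert loose.fullmatch('level1/Data')
assert loose.fullmatch('level1/level2/Data')
```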


hi antoine,

could you please tell us which version of wget you are using? after the 
release of 1.10.2 i merged a patch that fixed a few bugs in -X 
support, so you might want to try the current version of wget available 
from our subversion repository:


http://www.gnu.org/software/wget/wgetdev.html#development

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: Redirect makes wget fetch another domain

2006-05-19 Thread Mauro Tortonesi

Equipe web wrote:

I've come across this annoying bug:
Even though wget is told not to span other hosts, it does when
redirected!


This bug has been waiting for a fix for quite a long time:

http://www.mail-archive.com/wget@sunsite.dk/msg01675.html

I don't know how to make things change as I'm not a programmer myself...


hi luc,

thank you very much for your bug report. which version of wget are you 
using? i have recently merged a couple of patches that fixed a few bugs, 
so you might want to try the current version of wget available from our 
subversion repository:


http://www.gnu.org/software/wget/wgetdev.html#development

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]

2006-05-19 Thread Mauro Tortonesi

Noèl Köthe wrote:

Hello,

a forwarded report from http://bugs.debian.org/366434

could this behaviour be added to the doc/manpage?


i wonder if it makes sense to add generic support for multiple headers 
in wget, for instance by extending the --header option like this:


wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl

as an alternative, we could choose to support multiple headers only for 
a few header types, like Pragma. however, i don't really like this 
second choice, as it would require hardcoding the above-mentioned 
header names in the wget sources, which IMVHO is a *VERY* bad practice.


what do you think?

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: -O switch always overwrites output file

2006-05-19 Thread Mauro Tortonesi

Toni Casueps wrote:
I use Wget 1.10 for Linux. If I use -O and there was already a file in 
the current directory with the same name it overwrites it, even if I use 
-nc. Is this a bug or intentional?


IMVHO, this is a bug. if hrvoje does not provide a rationale for this 
behavior, i will fix it before the release of wget 1.11 (which should be 
pretty soon).


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: wrong exit code

2006-05-19 Thread Mauro Tortonesi

Lars Wilke wrote:

Hi,

first, this is not a real bug; it's more of a wishlist item.

So the problem:

When invoking wget to retrieve a file via ftp all is fine
if the file exists and wget is able to retrieve it. The return
code from wget is 0. If the file is not found on the server the
return code is 1. Good.

I expected that wget would behave the same when using file globbing.
If the file can be found via a pattern and can be downloaded
wget returns with 0. But if the file can not be found after
successfully retrieving a directory listing wget returns with 0, too!
IMHO here wget should exit with the same error code (1) as above.

I searched the docs for this behaviour but have not found it mentioned
anywhere, which is why I am sending this email.
Sorry if I missed this detail somewhere.


hi lars,

unfortunately, one of wget's weak points is its lack of consistency in 
the error codes it returns. after the release of wget 1.11, i am planning 
some major architectural changes for wget. that will be the best time to 
redesign the code which handles returned error values.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: Missing K/s rate on download of 13MB file

2006-05-19 Thread Mauro Tortonesi

J. Grant wrote:

Hi,

On 14/05/06 21:26, Hrvoje Niksic wrote:


J. Grant [EMAIL PROTECTED] writes:


Could an extra value be added which lists the average rate, e.g.
"average rate: xx.xx K/s"?


Unfortunately it would have problems fitting on the line.


Perhaps the progress bar would be reduced?


i don't think that would be a good idea.


or the default changed to be the average rate?


i don't think that would be a good idea either. but...

or if neither of those are suitable, could a conf file setting be added 
so we can switch between average rate, and current rate?


...this is an interesting proposal. however, my todo list is already 
*HUGE* and grows larger every day, so i really doubt i will have time to 
implement this feature (at least for the next few months). you're very 
welcome to proceed with the development of configurable average 
calculation code and send me a patch, though.
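The distinction under discussion can be sketched in a few lines of Python. This is purely illustrative (not wget's progress-bar implementation): the average rate is total bytes over total elapsed time, while the "current" rate is computed over a short sliding window of recent samples, so the two diverge when a transfer starts fast and then stalls.

```python
from collections import deque

class RateMeter:
    """Illustrative sketch of average vs. current download rate."""

    def __init__(self, window_seconds=3.0):
        self.window = window_seconds
        self.samples = deque()   # (timestamp, byte_count) pairs
        self.total_bytes = 0
        self.start = None

    def update(self, now, nbytes):
        if self.start is None:
            self.start = now
        self.total_bytes += nbytes
        self.samples.append((now, nbytes))
        # discard samples that have fallen out of the sliding window
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def average_rate(self, now):
        # bytes per second over the whole transfer
        elapsed = now - self.start
        return self.total_bytes / elapsed if elapsed > 0 else 0.0

    def current_rate(self, now):
        # bytes per second over the recent window only
        if not self.samples:
            return 0.0
        recent = sum(n for _, n in self.samples)
        span = max(now - self.samples[0][0], 1e-9)
        return recent / span

# Fast start (1000 B/s for 5 s), then a long stall until t=10.
meter = RateMeter(window_seconds=2.0)
for t in range(5):
    meter.update(float(t), 1000)
meter.update(10.0, 0)
assert meter.average_rate(10.0) == 500.0   # whole-transfer average
assert meter.current_rate(10.0) == 0.0     # window shows the stall
```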


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: recursive download

2006-05-19 Thread Mauro Tortonesi

Ajar Taknev wrote:

Hi,

I am trying to recursively download from an FTP site without success.
I am behind a Squid proxy and I have set up my .wgetrc correctly. When
I run wget -r ftp://ftp.somesite.com/dir it fetches
ftp.somesite.com/dir/index.html and exits; it doesn't do a recursive
download. When I run the same command from a machine which is not
behind a proxy, it does a recursive download. The proxy is
squid-2.5.STABLE9 and the wget version is 1.10.2. Any ideas what the
problem could be?


recursive FTP retrieval through HTTP proxies has been broken for a long 
time. i received a patch that should fix the problem some time ago, but 
i haven't been able to test it yet. however, this is one of the pending 
bugs that will be fixed before the upcoming 1.11 release.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: links to follow

2006-05-19 Thread Mauro Tortonesi

Andrea Rimicci wrote:

Hi all,
I'd like to retrieve a web document where some links are coded as
javascript calls, so I'd like to instruct wget that when something like
JSfunc('my/link/to/follow/') is matched, it should recognize
'my/link/to/follow/' as a link to follow.

Is there any way to accomplish this?
Maybe using regexps to set up which patterns will trigger a link
would be great.

TIA, Andrea

P.S. I don't know if this was already discussed; I've not found any 
previous post with 'follow' in the subject.


hi andrea,

wget does not support parsing of javascript code at the moment, nor 
regexps on downloaded file content. however, we are planning to add 
support for regexps in wget 1.12, and possibly for external url parsers.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: fixed recursive ftp download over proxy and 1.10.3

2006-05-19 Thread Mauro Tortonesi

[EMAIL PROTECTED] wrote:

Hi,


I have been bothered by the FTP-over-HTTP-proxy bug for quite a while: 1.5 years.


I was very happy to learn that someone had developed a patch,
and happier to read that you would merge it shortly.

Do you know when you will be able to publish this 1.10.3 release?


1.10.3 will never be released. the next version of wget will be 1.11, 
and i hope i will be able to release it by the end of june.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]

2006-05-19 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 Noèl Köthe wrote:
 Hello,
 a forwarded report from http://bugs.debian.org/366434
 could this behaviour be added to the doc/manpage?

 i wonder if it makes sense to add generic support for multiple headers
 in wget, for instance by extending the --header option like this:

Or by adding a `--append-header' with that functionality.  Originally
--header always appended, but the problem was that people sometimes
wanted to change the headers issued by Wget.

The reason I didn't introduce (in fact, keep) appending was that HTTP
pretty much disallows duplicate headers.  According to HTTP, a
duplicate header field is equivalent to a single header field with
multiple values joined using the "," separator -- which the bug report
mentions.
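The equivalence Hrvoje describes (RFC 2616, section 4.2) can be shown with a minimal sketch. This is illustrative Python, not wget code: repeated fields with the same name fold into one field whose values are joined with ", ".

```python
def fold_headers(headers):
    """Fold repeated header fields into one comma-joined value,
    as HTTP/1.1 (RFC 2616, sec. 4.2) says intermediaries may do.
    `headers` is a list of (name, value) pairs in send order;
    names are compared case-insensitively.
    """
    folded = {}
    order = []
    for name, value in headers:
        key = name.lower()
        if key in folded:
            folded[key] = folded[key] + ", " + value
        else:
            folded[key] = value
            order.append(key)
    return [(k, folded[k]) for k in order]

# Two Pragma headers are equivalent to one with comma-joined values --
# which is why blindly appending duplicates is questionable.
assert fold_headers([("Pragma", "no-cache"), ("Pragma", "xxx")]) == \
       [("pragma", "no-cache, xxx")]
```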


RE: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]

2006-05-19 Thread Herold Heiko
 From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
 i wonder if it makes sense to add generic support for multiple
 headers in wget, for instance by extending the --header option like
 this:
 
 wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl

That could be a problem if you need to send a really weird custom header
named dontoverride,Pragma. The probability is near nil, but with the whole
big bad internet waiting, maybe separate switches (--header and
--header-add) would be better.

 as an alternative, we could choose to support multiple 
 headers only for 
 a few header types, like Pragma. however, i don't really like this 
 second choise, as it would require to hardcode the above mentioned 
 header names in the wget sources, which IMVHO is a *VERY* bad 
 practice.

Same opinion; hard-coding the header list would be ugly and will bite some
user in the nose some time in the future: if you need to add several XXXY
headers, either patch and recompile or require at least version x.y.

Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 / +39-041-5917073 ph
-- +39-041-5907472 / +39-041-5917472 fax


Re: [Fwd: Bug#366434: wget: Multiple 'Pragma:' headers not supported]

2006-05-19 Thread Mauro Tortonesi

Herold Heiko wrote:

From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
i wonder if it makes sense to add generic support for multiple
headers in wget, for instance by extending the --header option like
this:

wget --header="Pragma: xxx" --header="dontoverride,Pragma: xxx2" someurl



That could be a problem if you need to send a really weird custom header
named dontoverride,Pragma. The probability is near nil, but with the whole
big bad internet waiting, maybe separate switches (--header and
--header-add) would be better.


you're right. in fact, i like hrvoje's --append-header proposal better.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: -O switch always overwrites output file

2006-05-19 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 Toni Casueps wrote:
 I use Wget 1.10 for Linux. If I use -O and there was already a file
 in the current directory with the same name it overwrites it, even
 if I use -nc. Is this a bug or intentional?

 IMVHO, this is a bug. if hrvoje does not provide a rationale for
 this behavior, i will fix it before the release of wget 1.11 (which
 should be pretty soon).

Overwriting the file without -nc (as opposed to using file.1 and so
on) is intentional -- if you specify the file name, that file name
gets used.

As for -nc, its work is based on the URL.  There can be more than one
URL, and it seems useful to only download the stuff that is actually
needed.  If you really need to download some URL iff the specified
output file doesn't exist, you can always do this:

test -f file || wget -O file ...


Re: links to follow

2006-05-19 Thread Andrea Rimicci

At 2006-05-19 16:18, Mauro Tortonesi wrote:
Andrea Rimicci wrote:
 Hi all,
 I'd like to retrieve a web document where some links are coded as
 javascript calls, so I'd like to instruct wget that when something like
 JSfunc('my/link/to/follow/') is matched, it should recognize
 'my/link/to/follow/' as a link to follow.

 Is there any way to accomplish this?
 Maybe using regexps to set up which patterns will trigger a link
 would be great.

 TIA, Andrea

 P.S. I don't know if this was already discussed; I've not found any
 previous post with 'follow' in the subject.

hi andrea,

wget does not support parsing of javascript code at the moment, nor
regexps on downloaded file content. however, we are planning to add
support for regexps in wget 1.12, and possibly for external url parsers.

Thanks for the reply.
If the specs for this feature are still under discussion, I'd like to
suggest something like sed's -f or -e switches, so that I can write an
external text file, handled by wget, with sed-like syntax, and reach my
goal with some lines like:

f/JSfunc('\(.*\)'/\1/g
f/AnotherFunc('\(.*\)'/\1/g

where f means follow, and \1 is the link that will be handled (put in the
download queue, converted to a local path, and so on) by wget.


Then I can call wget with a switch to specify the rules file, plus the
other switches, to get the job accomplished.
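The proposed follow rules could be prototyped outside wget before any wget integration exists. This is a hypothetical Python sketch (wget has no such feature; JSfunc and AnotherFunc are the example names from the post): each rule is a pattern whose first capture group is the link to follow, mirroring a line like f/JSfunc('\(.*\)'/\1/g.

```python
import re

# Each compiled rule mirrors one proposed f/pattern/\1/g line:
# the first capture group is the link text to put in the queue.
FOLLOW_RULES = [
    re.compile(r"JSfunc\('([^']*)'"),
    re.compile(r"AnotherFunc\('([^']*)'"),
]

def extract_links(page_text):
    """Return every captured link, grouped by rule order."""
    links = []
    for rule in FOLLOW_RULES:
        links.extend(rule.findall(page_text))
    return links

html = "<script>JSfunc('my/link/to/follow/'); AnotherFunc('other/page')</script>"
assert extract_links(html) == ['my/link/to/follow/', 'other/page']
```

A real implementation would feed each extracted link back into the download queue and link-conversion pass, as Andrea describes.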


Hope the example is clear enough to show this idea.

Thanks again, Andrea




Re: recursive download

2006-05-19 Thread Steven M. Schweda
From: Mauro Tortonesi

 [...] this is one of the pending
 bugs that will be fixed before the upcoming 1.11 release.

   At the risk of beating a dead horse yet again, is there any chance of
getting the VMS changes into this upcoming 1.11 release?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547