Re: Major, and seemingly random problems with wget 1.8.2

2003-10-11 Thread Patrick Robinson
Hello Hrvoje,

On 07-Oct-03, you wrote:

Is it possible for someone to e-mail me a working wget setup for the Amiga
to my private address?



Thanks


Regards
Patrick Robinson



Re: Major, and seemingly random problems with wget 1.8.2

2003-10-11 Thread Hrvoje Niksic
I don't use an Amiga, nor do I have an idea what you mean by a
working Wget setup.  Have you tried compiling from source?
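
For reference, on a Unix-like system building from source is just the
usual sequence below; the Amiga port may need its own makefile or
compiler setup, so take this only as a sketch:

  ./configure
  make
  make install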



Re: Major, and seemingly random problems with wget 1.8.2

2003-10-07 Thread Hrvoje Niksic
Josh Brooks [EMAIL PROTECTED] writes:

 I have noticed very unpredictable behavior from wget 1.8.2 -
 specifically I have noticed two things:

 a) sometimes it does not follow all of the links it should

 b) sometimes wget will follow links to other sites and URLs - when the
 command line used should not allow it to do that.

Thanks for the report.  Detailed answers to both points follow below:

 First, sometimes when you attempt to download a site with -k -m
 (--convert-links and --mirror) wget will not follow all of the links and
 will skip some of the files!

 I have no idea why it does this with some sites and doesn't do it with
 other sites.  Here is an example that I have reproduced on several systems
 - all with 1.8.2:

Links are missed on some sites because those sites use malformed HTML
comments.  That has been fixed for Wget 1.9, where more relaxed
comment parsing is the default.  But that is not what is happening
with www.zorg.org/vsound/.

www.zorg.org/vsound/ contains this markup:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

That explicitly tells robots, such as Wget, not to follow the links in
the page.  Wget respects this and does not follow the links.  You can
tell Wget to ignore the robot directives.  For me, this works as
expected:

wget -km -e robots=off http://www.zorg.org/vsound/

You can put `robots=off' in your .wgetrc and this problem will not
bother you again.
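
For example, a minimal ~/.wgetrc carrying just that setting looks like
this (any options you already have there can stay alongside it):

  # ~/.wgetrc -- ignore robots.txt and <META NAME="ROBOTS"> directives
  robots = off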

 The second problem, and I cannot currently give you an example to try
 yourself but _it does happen_, is if you use this command line:

 wget --tries=inf -nH --no-parent \
   --directory-prefix=/usr/data/www.explodingdog.com --random-wait \
   -r -l inf --convert-links --html-extension \
   --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; AOL 7.0; Windows NT 5.1)" \
   www.example.com

 At first it will act normally, just going over the site in question, but
 sometimes, you will come back to the terminal and see it grabbing all
 sorts of pages from totally different sites (!)

The only way I've seen it happen is when it follows a redirection to a
different site.  The redirection is followed because it's considered
to be part of the same download.  However, further links on the
redirected site are not (supposed to be) followed.

If you have a repeatable example, please mail it here so we can
examine it in more detail.


Re: Major, and seemingly random problems with wget 1.8.2

2003-10-07 Thread Josh Brooks

Thank you for the great response.  It is much appreciated - see below...

On Tue, 7 Oct 2003, Hrvoje Niksic wrote:

 www.zorg.org/vsound/ contains this markup:

 <META NAME="ROBOTS" CONTENT="NOFOLLOW">

 That explicitly tells robots, such as Wget, not to follow the links in
 the page.  Wget respects this and does not follow the links.  You can
 tell Wget to ignore the robot directives.  For me, this works as
 expected:

 wget -km -e robots=off http://www.zorg.org/vsound/

Perfect - thank you.


  At first it will act normally, just going over the site in question, but
  sometimes, you will come back to the terminal and see it grabbing all
  sorts of pages from totally different sites (!)

 The only way I've seen it happen is when it follows a redirection to a
 different site.  The redirection is followed because it's considered
 to be part of the same download.  However, further links on the
 redirected site are not (supposed to be) followed.

OK, is there a way to tell wget not to follow redirects, so it will
never do that at all?  Basically I am looking for a way to tell wget
"don't ever get anything with a different FQDN than the one I started
you with."

thanks.
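
One possibility worth trying, though whether wget 1.8.2 applies it to
redirect targets is an open question, is the -D/--domains accept list;
a sketch, using the same hypothetical host as the earlier command:

  # Sketch: restrict the crawl to the starting host's domain.
  # Whether -D also filters redirected URLs in 1.8.2 is an assumption here.
  wget -r -l inf -nH --no-parent -D www.example.com http://www.example.com/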