I am seeing some anomalous behavior in wget when mirroring (-m) a site
and trying to keep that mirror local to the source domain.  A couple of
CGI scripts on the site inevitably get called and end up issuing
redirects off-site.  These redirects are followed even though
--span-hosts is not supplied, and even if the destination domains are
listed via the --exclude-domains option.
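
To be concrete about the behavior I expected, here is a minimal Python
sketch of the host check that I assumed would apply to redirect targets
as well as to ordinary links.  This is not wget's actual code: the
function should_follow() and its parameters are hypothetical, and the
suffix matching on excluded domains is my assumption about how
--exclude-domains is meant to work.

```python
from urllib.parse import urlsplit


def should_follow(start_url, target_url, span_hosts=False, exclude_domains=()):
    """Hypothetical policy: decide whether a link OR a redirect target is fetched."""
    start_host = (urlsplit(start_url).hostname or "").lower()
    target_host = (urlsplit(target_url).hostname or "").lower()

    # Excluded domains are rejected no matter how the URL was reached.
    for domain in exclude_domains:
        domain = domain.lower().lstrip(".")
        if target_host == domain or target_host.endswith("." + domain):
            return False

    # Without host spanning, only the starting host is allowed.
    if not span_hosts and target_host != start_host:
        return False
    return True


start = "http://fastolfe.net/misc/wget-bug/redirected"

# A local link stays on the starting host, so it is followed.
print(should_follow(start, "http://fastolfe.net/misc/wget-bug/normal"))

# An off-site redirect target is refused when host spanning is off.
print(should_follow(start, "http://www.example.com/"))

# Even with host spanning on, an excluded domain is refused.
print(should_follow(start, "http://www.example.com/",
                    span_hosts=True, exclude_domains=["example.com"]))
```

Under this sketch the three calls print True, False and False.  What I
observe instead is that the redirect target is fetched in both of the
last two situations.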

A test case is up at http://fastolfe.net/misc/wget-bug/.
Spidering http://fastolfe.net/misc/wget-bug/normal will
correctly ignore the *link* to www.example.com, but spidering
http://fastolfe.net/misc/wget-bug/redirected follows a local link
whose response is a redirect off-site.  That redirect is followed
unconditionally.

In this case, www.example.com doesn't exist, but if it were a real,
reachable domain, wget would still fetch the page and store it locally
(creating a www.example.com directory, etc.).

I am using GNU Wget 1.7 installed via RPM as wget-1.7-3mdk on Linux
2.4.12 i686.

Thanks!

-- 
 == David Nesting WL7RO Fastolfe [EMAIL PROTECTED] http://fastolfe.net/ ==
 fastolfe.net/me/pgp-key A054 47B1 6D4C E97A D882  C41F 3065 57D9 832F AB01
