Can wget be asked not to retrieve *anything* - not even .html pages -
from a given directory and its subdirectories?
This is relevant in situations where one wants to mirror a site with
many links to a restricted part of the site which requires
authorization but is otherwise of no interest. With wget-1.9.1 my log
file contains many "Authorization failure" messages.
For example:

    wget -nv -w1 -kpE -m -X "/restricted" http://www.example.com/

will still attempt to download URLs like

    http://www.example.com/restricted/index.html
    http://www.example.com/restricted/subdir/rubbish.html

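
To make the behaviour I was expecting concrete, here is a minimal shell sketch of the check I assumed -X would apply to every URL, .html or not. The function name is_excluded is hypothetical and is not part of wget; it only illustrates the intent and says nothing about how wget is actually implemented.

```shell
# Hypothetical helper (not part of wget): succeeds if the URL's path
# lies in the given directory or any of its subdirectories.
is_excluded() {
    url=$1
    dir=$2
    # Strip "scheme://host/" and put the leading slash back on the path.
    path=/${url#*://*/}
    case "$path" in
        "$dir"|"$dir"/*) return 0 ;;   # the directory itself or anything below it
        *)               return 1 ;;   # everything else should be retrieved
    esac
}
```

With this check, both example URLs above would be skipped for "/restricted", while a sibling directory such as /restricted2/ would still be retrieved.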
Looking through the source of wget-1.10, it appears that wget retrieves
.html pages even if they are in the exclude-directories list, so
presumably wget-1.10 will behave the same way.
I am not sure if this is related, but something similar is logged in
Bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=124867
Can anyone confirm the behaviour I have seen, or suggest a work-around?
Many thanks in advance,
Johann