Can wget be asked not to retrieve *anything* - not even .html pages - from a given directory and its subdirectories?

This matters when mirroring a site that has many links into a restricted area which requires authorization but is otherwise of no interest. With wget-1.9.1, my log file contains many "Authorization failure" messages.

For example:

    wget -nv -w1 -kpE -m -X "/restricted" http://www.example.com/

will still attempt to download URLs like http://www.example.com/restricted/index.html and http://www.example.com/restricted/subdir/rubbish.html
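
To make the expectation concrete, here is a minimal shell sketch of the matching I would expect -X to apply to every URL, .html or not. The helper name is_excluded is hypothetical and this is not wget's code, only an illustration of the desired rule: a path is skipped if it is the excluded directory itself or lies anywhere beneath it.

```shell
# Hypothetical helper, not part of wget: succeeds (returns 0) when a URL
# path is the excluded directory or lies in one of its subdirectories.
is_excluded() {
    path=$1
    excluded=$2
    case "$path" in
        "$excluded"|"$excluded"/*) return 0 ;;   # exact match or anything below it
        *) return 1 ;;
    esac
}

# Classify a few sample paths against the excluded directory /restricted.
for p in /restricted/index.html /restricted/subdir/rubbish.html \
         /index.html /restricted-not/page.html; do
    if is_excluded "$p" /restricted; then
        echo "skip  $p"
    else
        echo "fetch $p"
    fi
done
```

Under this rule both example URLs above would be skipped, whereas a sibling directory whose name merely begins with "restricted" would still be fetched.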

Looking through the source of wget-1.10, it appears that wget retrieves .html pages even if they are in the exclude-directories list, so presumably wget-1.10 will behave the same way.

I am not sure if this is related, but something similar is logged in
Bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=124867

Can anyone confirm the behaviour I have seen, or suggest a work-around?

Many thanks in advance,
Johann
