It may be useful to add a paragraph to the manual letting users know they can use the --debug option to see why certain URLs are not followed (rejected) by wget. It would be especially useful to mention this in "9.1 Robot Exclusion". Something like this:

If you wish to see which URLs are blocked by robots.txt while wget is crawling, use the --debug option. You will see two lines describing why each URL is rejected:

Rejecting path /abc/bar.html because of rule `/abc'.
Not following http://foo.org/abc/bar.html because robots.txt forbids it.
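For instance, one way to pick out just these lines from a crawl's debug output might look like this (a sketch: the log file name, URL, and sample contents are illustrative, and the exact message wording can vary between wget versions):

```shell
# In practice the debug log would come from a real crawl, e.g.:
#   wget --debug -o wget.log -r http://foo.org/
# Here we fake a log file containing the two rejection lines so the
# filtering step below is self-contained.
cat > wget.log <<'EOF'
Rejecting path /abc/bar.html because of rule `/abc'.
Not following http://foo.org/abc/bar.html because robots.txt forbids it.
EOF

# Show only the robots.txt-related rejection messages
grep -E 'Rejecting path|robots.txt forbids' wget.log
```

Grepping the saved log rather than the live output makes it easier to review all rejections after a long crawl finishes.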

Thanks,
Frank