Micah Cowan wrote:

> The manpage doesn't need to give as detailed explanations as the info
> manual (though, as it's auto-generated from the info manual, this could
> be hard to avoid); but it should fully describe essential features.
I can't see any good reason for one set of documentation to be different from another. Let the user choose whichever is comfortable. Some users may not even know they have a choice between man and info.

> While we're on the subject: should we explicitly warn about using such
> features as robots=off, and --user-agent? And what should those warnings
> be? Something like, "Use of this feature may help you download files
> from which wget would otherwise be blocked, but it's kind of sneaky, and
> web site administrators may get upset and block your IP address if they
> discover you using it"?

No, I don't think we should, nor do I think use of those features is "sneaky".

With regard to robots.txt, people use it when they don't want *automated* spiders crawling through their sites. A well-crafted wget command that downloads selected information from a site without regard to the robots.txt restrictions is a very different situation. It's true that someone could --mirror the site while ignoring robots.txt, but even that is legitimate in many cases.

With regard to user agent, many websites customize their output based on the browser that is displaying the page. If one does not set the user agent to match their browser, the retrieved content may be very different from what was displayed in the browser.

All that being said, it wouldn't hurt to have a section in the documentation on wget etiquette: think carefully about ignoring robots.txt, use --wait to throttle the download if it will be lengthy, etc. Perhaps we could even add a --be-nice option, similar to --mirror, that adjusts options to match the etiquette suggestions.

Tony
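For the archive, here is a sketch of what a "polite" invocation following those etiquette suggestions might look like. The URL and the specific flag values (2 seconds, 100k) are illustrative placeholders, not recommendations from this thread; the options themselves (--mirror, --wait, --random-wait, --limit-rate) are standard GNU Wget download options:

```shell
# Hypothetical "polite" mirror of a site, roughly what a --be-nice
# preset might expand to. Values are illustrative only.
#
#   --wait=2          pause between successive retrievals
#   --random-wait     vary that pause so the load is less machine-like
#   --limit-rate=100k cap the bandwidth taken from the server
#
# example.com is a placeholder URL.
wget --mirror --wait=2 --random-wait --limit-rate=100k https://example.com/
```

By contrast, the options being debated above are spelled `-e robots=off` (ignore robots.txt for this run) and `--user-agent="..."` (send a browser's identification string); combining those with a throttled download is the kind of trade-off an etiquette section could spell out.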
