Dear Max Bowsher,

Thank you for the answer. Yes, it works for some sites. However, I have encountered another site where I cannot retrieve the contents of the links.
The site is http://www.nt.ntnu.no/~skoge/book/ and I tried

    wget -r -np -e robots=off http://www.nt.ntnu.no/~skoge/book/

and

    wget -r -np http://www.nt.ntnu.no/~skoge/book/

Both of them failed. Reading the robots.txt there, I found it contains a comment line:

    User-agent: *        # directed to all spiders, not just Scooter
    Disallow: /RCS
    Disallow: /cards
    Disallow: /doc
    Disallow: /fag
    Disallow: /fakultet
    Disallow: /foot.shtml
    Disallow: /head.html
    Disallow: /index.shtml
    Disallow: /indexe.shtml
    Disallow: /info
    Disallow: /inst
    Disallow: /ntnubilder
    Disallow: /robots.txt
    Disallow: /usage
    Disallow: /userlist.shtml
    Disallow: /users

Could you help me solve the problem? Thank you in advance.

Mo Yun

Max Bowsher wrote:
> Yun MO wrote:
>
>> Dear Ma'am/sir,
>>
>> I could not get all files with the "wget -r" command for the following
>> address. Would you help me?
>> Thank you in advance.
>>
>> M.Y.
>> -----------------------
>>
>> <meta NAME="robots" CONTENT="noindex,nofollow">
>
> Wget is obeying the robots instruction.
>
>     wget -e robots=off ...
>
> will override it.
>
> Max.

--
Yun Mo, Ph.D.
Technology Development Center, Tokyo Electron Ltd.
650 Mitsuzawa, Hosaka-cho, Nirasaki-shi, Yamanashi 407-0192, Japan
Phone: +81-551-23-4303  Fax: +81-551-23-4454
E-mail: [EMAIL PROTECTED] / [EMAIL PROTECTED]
