Dear Max Bowsher,

Thank you for the answer. Yes, it works for some sites. However, I have
encountered another site where I could not download the linked contents.

The site is
   http://www.nt.ntnu.no/~skoge/book/
and I tried
   wget -r -np -e robots=off http://www.nt.ntnu.no/~skoge/book/
and
   wget -r -np http://www.nt.ntnu.no/~skoge/book/

Both of them failed. Reading the robots.txt on that server, I found the
following (note the comment line):

User-agent: *           # directed to all spiders, not just Scooter
Disallow: /RCS
Disallow: /cards
Disallow: /doc
Disallow: /fag
Disallow: /fakultet
Disallow: /foot.shtml
Disallow: /head.html
Disallow: /index.shtml
Disallow: /indexe.shtml
Disallow: /info
Disallow: /inst
Disallow: /ntnubilder
Disallow: /robots.txt
Disallow: /usage
Disallow: /userlist.shtml
Disallow: /users
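
As far as I can tell, /~skoge/book/ itself is not among the disallowed
paths, so I am not sure robots.txt is really the cause. On the guess
that the server might instead be rejecting wget's default User-Agent
string (only an assumption on my part), I plan to also try overriding
it:

   wget -r -np -e robots=off --user-agent="Mozilla/5.0" http://www.nt.ntnu.no/~skoge/book/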

Could you help me solve the problem?

Thank you in advance.

Mo Yun


Max Bowsher wrote:
> Yun MO wrote:
> 
>>Dear Ma'am/sir,
>>
>>I could not get all files with "wget -r" command for following
>>address. Would you help me?
>>Thank you in advance.
>>
>>M.Y.
>>-----------------------
>>
>><meta NAME="robots" CONTENT="noindex,nofollow">
> 
> 
> Wget is obeying the robots instruction.
> 
> wget -e robots=off ...
> 
> will override.
> 
> Max.
> 


-- 
Yun Mo, Ph.D.

Technology Development Center, Tokyo Electron Ltd.
650 Mitsuzawa, Hosaka-cho, Nirasaki-shi, Yamanashi 407-0192, Japan

Phone: +81-551-23-4303   Fax: +81-551-23-4454
E-mail: [EMAIL PROTECTED] / [EMAIL PROTECTED]
