Hi,

I use Nutch 0.7.1 to crawl a few intranet servers. Yesterday I tried to exclude some directories with a robots.txt file, but nothing changed. I copied this robots.txt to the server:

User-agent: NutchCVS
Disallow: /cgi-bin/
Disallow: /manuals/

The User-agent "NutchCVS" and the robots agent name in nutch-default are the same. Can anyone help me with this problem? I'm crawling with this command:

bin/nutch crawl urls -dir crawl060621 -depth 15 &> crawl060621.log &

Greets,
David

==========================================================
David Wojciechowski
Universitätsklinikum Freiburg
Klinikrechenzentrum
Agnesenstrasse 6-8
D-79106 Freiburg
Telefon : 0761 / 270 - 1842
Fax: 0761 / 270 - 2276
E-Mail : [EMAIL PROTECTED]
==========================================================
