Sorry about that; we have just changed the agent name and the contact email address. The crawl was run on URLs taken from some users' browser histories, with the download speed kept as low as possible. I hope only the anti-bot system was annoyed.
Fabio

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andy Hedges
> Sent: Thursday, September 30, 2004 10:50 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [Nutch-dev] [Fwd: [Nutch-admin] Re: Auto-response for your message to [EMAIL PROTECTED], [EMAIL PROTECTED]
>
> Do you mean robots.txt? Nutch supports robots.txt and there are no known
> bugs in its handling of this file. It is entirely possible to change the
> nutch code to ignore robots.txt - though highly frowned upon. To find
> out more about robots exclusion please check here
> http://www.robotstxt.org.
>
> You should probably contact xerox directly if they are annoying you.
>
> Andy
>
> Doug Cutting wrote:
>
> > Does anyone on the list know whether Nutch 0.5 can be configured to
> > ignore hosts.txt, or whether there is a bug in hosts.txt handling?
> >
> > In particular, are the folks behind 13.1.101.37
> > (jumanji.parc.xerox.com) reading this list? Can they look into this?
> > Their Nutch-based crawl is annoying someone! They should also
> > consider changing the agent name and contact address in their Nutch
> > configuration so that folks contact them directly in the future.
> >
> > Thanks,
> >
> > Doug
> >
> > -------- Original Message --------
> > Subject: [Nutch-admin] Re: Auto-response for your message to
> > [EMAIL PROTECTED], [EMAIL PROTECTED]
> > Date: Wed, 29 Sep 2004 13:07:52 -0700 (PDT)
> > From: John Young <[EMAIL PROTECTED]>
> >
> >> Your message to the Nutch fetcher agent has been received.
> >>
> >> The Nutch fetcher obeys the robots exclusion standard, so if you wish
> >> to alter how Nutch accesses your site, please visit
> >> http://www.robotstxt.org/.
> >>
> >> For more information about the Nutch project, please visit
> >> http://www.nutch.org/.
> >>
> >> Thanks!
> >>
> >> Nutch
> >
> > Wrong answer. Your bot is fetching pages from a subdirectory on our
> > site which is listed in our robots.txt. Other bots do not fetch pages
> > from that directory.
> >
> > I am trying to help. If you disregard help for robots.txt violations,
> > sites will block you. I am not blocking you, yet.
> >
> > Perhaps you should reevaluate your auto-responders rule set to avoid
> > sending out messages like the one above.
> >
> > Again, from robots.txt:
> >
> > User-agent: *
> > Disallow: /games/F
> > Disallow: /games/O
> > Disallow: /games/Q
> > Disallow: /games/special
> > Disallow: /store/F
> > Disallow: /store/O
> > Disallow: /store/Q
> >
> > A sample of your bot's recent activity:
> >
> > 13.1.101.37 - - [29/Sep/2004:03:30:10 -0700] "GET /store/O/cart.html?ax=refresh&oi=1032542 HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
> > 13.1.101.37 - - [29/Sep/2004:03:30:11 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
> > 13.1.101.37 - - [29/Sep/2004:03:30:12 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
> > 13.1.101.37 - - [29/Sep/2004:03:30:13 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: IT Product Guide on
> > ITManagersJournal Use IT products in your business? Tell us what you
> > think of them. Give us Your Opinions, Get Free ThinkGeek Gift
> > Certificates!
> > Click to find out more
> > http://productguide.itmanagersjournal.com/guidepromo.tmpl
> > _______________________________________________
> > Nutch-admin mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/nutch-admin
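For anyone wanting to check the complaint against the quoted rules, here is a minimal sketch that feeds the robots.txt above into Python's standard-library `urllib.robotparser` and tests the paths from the access log. This parser is a stand-in, not Nutch's own Java robots.txt handling, so edge cases may differ. The agent string `NutchCVS` is taken from the log; the `/store/index.html` control path is a hypothetical example not mentioned in the thread.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted in the complaint.
ROBOTS_TXT = """\
User-agent: *
Disallow: /games/F
Disallow: /games/O
Disallow: /games/Q
Disallow: /games/special
Disallow: /store/F
Disallow: /store/O
Disallow: /store/Q
"""

# Paths requested by 13.1.101.37 according to the quoted access log.
LOGGED_PATHS = [
    "/store/O/cart.html?ax=refresh&oi=1032542",
    "/store/O/cart.html",
]

# Agent name from the log's User-Agent header; the "*" group applies to any agent.
AGENT = "NutchCVS"

# Hypothetical control path that no Disallow line covers.
CONTROL_PATH = "/store/index.html"


def disallowed_paths(robots_txt: str, agent: str, paths: List[str]) -> List[str]:
    """Return the subset of paths that robots.txt forbids for the given agent."""
    parser = RobotFileParser()
    # Feed the rules directly instead of fetching them over HTTP with read().
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    candidates = LOGGED_PATHS + [CONTROL_PATH]
    blocked = set(disallowed_paths(ROBOTS_TXT, AGENT, candidates))
    for path in candidates:
        print("disallowed" if path in blocked else "allowed   ", path)
```

Because the robots exclusion standard treats each `Disallow` value as a plain path prefix, `Disallow: /store/O` covers both logged requests (the query string on the first one does not change the outcome) and would also cover a path such as `/store/Outlet.html`.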
