Do you mean robots.txt? Nutch supports robots.txt, and there are no known bugs in its handling of that file. It is entirely possible to change the Nutch code to ignore robots.txt, though doing so is highly frowned upon. To find out more about robots exclusion, please see http://www.robotstxt.org/.

You should probably contact Xerox directly if they are annoying you.

Andy

Doug Cutting wrote:

Does anyone on the list know whether Nutch 0.5 can be configured to ignore hosts.txt, or whether there is a bug in hosts.txt handling?

In particular, are the folks behind 13.1.101.37 (jumanji.parc.xerox.com) reading this list? Can they look into this? Their Nutch-based crawl is annoying someone! They should also consider changing the agent name and contact address in their Nutch configuration so that folks contact them directly in the future.
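Both the agent name and the contact address end up in the User-Agent header of every request the crawler makes, which is why a site owner writes to whatever address that header names. The Python sketch below is illustrative only: `build_user_agent` is a hypothetical helper that reproduces the shape of the agent string visible in the access log later in this thread (`name/version (description; url; email)`). It is not Nutch's own code, and the crawl-specific values are placeholders.

```python
def build_user_agent(name, version, description, url, email):
    """Hypothetical helper, not Nutch code: assemble an agent string in the
    shape seen in the access log, "name/version (description; url; email)".
    Empty parts are skipped."""
    details = "; ".join(part for part in (description, url, email) if part)
    return f"{name}/{version} ({details})"


# Reproduces the default identity that appears in the log
# (the e-mail address is redacted in the list archive).
default_agent = build_user_agent(
    "NutchCVS",
    "0.05-dev",
    "Nutch",
    "http://www.nutch.org/docs/en/bot.html",
    "[EMAIL PROTECTED]",
)

# A crawl-specific identity; every value here is a hypothetical placeholder.
custom_agent = build_user_agent(
    "ExampleCrawler",
    "0.1",
    "research crawl",
    "http://example.org/crawler.html",
    "crawler-admin@example.org",
)

print(default_agent)
print(custom_agent)
```

With crawl-specific values, a webmaster reading the log sees the operator's own page and address instead of the Nutch project's.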

Thanks,

Doug

-------- Original Message --------
Subject: [Nutch-admin] Re: Auto-response for your message to [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: Wed, 29 Sep 2004 13:07:52 -0700 (PDT)
From: John Young <[EMAIL PROTECTED]>


Your message to the Nutch fetcher agent has been received.

The Nutch fetcher obeys the robots exclusion standard, so if you wish
to alter how Nutch accesses your site, please visit
http://www.robotstxt.org/.

For more information about the Nutch project, please visit
http://www.nutch.org/.

Thanks!

Nutch



Wrong answer. Your bot is fetching pages from a subdirectory on our site which is listed in our robots.txt. Other bots do not fetch pages from that directory.

I am trying to help. If you disregard reports of robots.txt
violations, sites will block you. I am not blocking you, yet.

Perhaps you should reevaluate your auto-responder's rule set
to avoid sending out messages like the one above.

Again, from robots.txt:

User-agent: *
Disallow: /games/F
Disallow: /games/O
Disallow: /games/Q
Disallow: /games/special
Disallow: /store/F
Disallow: /store/O
Disallow: /store/Q

A sample of your bot's recent activity:

13.1.101.37 - - [29/Sep/2004:03:30:10 -0700] "GET /store/O/cart.html?ax=refresh&oi=1032542 HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
13.1.101.37 - - [29/Sep/2004:03:30:11 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
13.1.101.37 - - [29/Sep/2004:03:30:12 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
13.1.101.37 - - [29/Sep/2004:03:30:13 -0700] "GET /store/O/cart.html HTTP/1.0" 302 119 "-" "NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
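The complaint can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` module as a stand-in robots.txt parser (it is not Nutch's implementation, which is written in Java) to test the request paths from the log against the `Disallow` rules quoted above. Under the robots exclusion standard each `Disallow` value is a path prefix, so `Disallow: /store/O` covers `/store/O/cart.html` and any other path beginning with `/store/O`. The path `/store/index.html` is a hypothetical control that no rule matches.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted from the site's robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /games/F
Disallow: /games/O
Disallow: /games/Q
Disallow: /games/special
Disallow: /store/F
Disallow: /store/O
Disallow: /store/Q
"""

parser = RobotFileParser()
# parse() loads the rules and marks them as fetched, so can_fetch() uses them.
parser.parse(ROBOTS_TXT.splitlines())

# Agent name from the log; the "User-agent: *" record applies to it.
AGENT = "NutchCVS"

paths = [
    "/store/O/cart.html?ax=refresh&oi=1032542",  # from the log
    "/store/O/cart.html",                        # from the log
    "/store/index.html",                         # hypothetical control path
]

for path in paths:
    verdict = "allowed" if parser.can_fetch(AGENT, path) else "DISALLOWED"
    print(f"{verdict:10} {path}")
```

Both logged paths come back disallowed, so a fetcher honoring these rules should not have requested them; only the control path is allowed.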






-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
Nutch-admin mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-admin




_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers



