I was going to suggest the same approach. It seems simple enough and would force whoever runs the crawler to edit the config. What they enter in place of EDITME is another story, but maybe some code can enforce some rules on that, too.
Otis

----- Original Message ----
From: Teruhiko Kurosaka <[EMAIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Friday, June 16, 2006 2:05:41 PM
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

How about introducing these changes in an effort to force the nutch admins to properly edit the bot identity strings?

1. Add the http.agent.* entries to nutch-site.xml with the value being "EDITME". The description should clearly state that these values *must* be edited to reflect the true identity of the site.
2. Add a piece of code to the HTTP crawler that checks the configuration. If any of the http.agent.* entries are EDITME, the code would log the error and exit.

-kuro

p.s. I'm subscribing to the digest version of the ML. If the same or better idea has been raised already, please ignore this.
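
For illustration, the startup check proposed in step 2 might look roughly like the following. This is only a sketch under stated assumptions: the configuration is taken to be already loaded into a plain Map (Nutch's real configuration class is not used here), and the class name AgentConfigCheck and the method findUnedited are hypothetical names invented for this example. The property names in main are sample keys under the http.agent.* prefix, and the URL is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: finds http.agent.* entries
// that still carry the "EDITME" placeholder.
public class AgentConfigCheck {
    static final String PLACEHOLDER = "EDITME";
    static final String PREFIX = "http.agent.";

    // Returns the names of all http.agent.* properties whose value is
    // still the placeholder, in sorted order.
    static List<String> findUnedited(Map<String, String> conf) {
        List<String> unedited = new ArrayList<>();
        // Copy into a TreeMap so the result order is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
            String value = e.getValue();
            if (e.getKey().startsWith(PREFIX)
                    && value != null
                    && PLACEHOLDER.equals(value.trim())) {
                unedited.add(e.getKey());
            }
        }
        return unedited;
    }

    public static void main(String[] args) {
        // Stand-in for the values that would be read from nutch-site.xml.
        Map<String, String> conf = new TreeMap<>();
        conf.put("http.agent.name", "EDITME");
        conf.put("http.agent.url", "http://example.com/bot.html");

        List<String> unedited = findUnedited(conf);
        if (!unedited.isEmpty()) {
            // Log the error and exit, as the proposal describes.
            System.err.println("Refusing to crawl; still set to EDITME: " + unedited);
            System.exit(1);
        }
        System.out.println("Agent identity looks edited; starting crawl.");
    }
}
```

A check like this only catches the untouched placeholder. Enforcing rules on what replaces EDITME would need additional validation that is not sketched here.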