I was going to suggest the same approach.  Seems simple enough and would force 
the person to edit the config.  What is entered in place of EDITME is another 
story, but maybe some code can enforce some rules on that, too.
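
To make "enforce some rules" a little more concrete, here is a minimal sketch of what such checks might look like. The class name, method names, and the rules themselves (a non-blank value, an http(s) URL with a host, a roughly email-shaped address) are assumptions for illustration only. They are not existing Nutch code and were not agreed on in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: sanity checks for agent values.
class AgentValueRules {

    // True if the value is present and is not the EDITME placeholder.
    static boolean isFilledIn(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        return !v.isEmpty() && !"EDITME".equalsIgnoreCase(v);
    }

    // True if the value parses as an http or https URL with a host part.
    static boolean looksLikeHttpUrl(String value) {
        if (!isFilledIn(value)) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Very loose shape check: exactly one '@', and a '.' later in the
    // domain part that is not the last character. This does not prove
    // that the address exists or belongs to the crawler's operator.
    static boolean looksLikeEmail(String value) {
        if (!isFilledIn(value)) {
            return false;
        }
        String v = value.trim();
        int at = v.indexOf('@');
        return at > 0
                && at == v.lastIndexOf('@')
                && v.indexOf('.', at + 2) > at
                && !v.endsWith(".");
    }
}
```

Checks like these only catch obviously bogus input. Someone determined to hide their identity can still enter a plausible-looking fake, so this complements the EDITME check rather than replacing it.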

Otis

----- Original Message ----
From: Teruhiko Kurosaka <[EMAIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Friday, June 16, 2006 2:05:41 PM
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

How about introducing these changes in an effort to force the Nutch admins
to properly edit the bot identity strings?
1. Add the http.agent.* entries to nutch-site.xml with the value being
   "EDITME". The description should clearly state that these values *must*
   be edited to reflect the true identity of the site.
2. Add a piece of code to the HTTP crawler that checks the configuration.
   If any of the http.agent.* entries are EDITME, the code would log the
   error and exit.
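
The check in step 2 could be sketched roughly as follows. This is a self-contained illustration that reads the settings from a plain Map rather than Nutch's real configuration object. The class and method names are hypothetical, and the specific key names used in the usage note are assumptions standing in for whatever http.agent.* entries actually exist.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical startup check, not existing Nutch code.
class AgentConfigCheck {
    static final String PLACEHOLDER = "EDITME";
    static final String PREFIX = "http.agent.";

    // Returns the http.agent.* keys whose value is still the placeholder,
    // sorted so that the error message is stable.
    static List<String> findUneditedKeys(Map<String, String> conf) {
        List<String> unedited = new ArrayList<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String value = entry.getValue();
            if (entry.getKey().startsWith(PREFIX)
                    && value != null
                    && PLACEHOLDER.equals(value.trim())) {
                unedited.add(entry.getKey());
            }
        }
        Collections.sort(unedited);
        return unedited;
    }

    // Throws if any http.agent.* entry was left as EDITME. The caller is
    // expected to log the message and stop the crawl.
    static void requireEdited(Map<String, String> conf) {
        List<String> unedited = findUneditedKeys(conf);
        if (!unedited.isEmpty()) {
            throw new IllegalStateException(
                    "Edit these entries in nutch-site.xml before crawling: "
                            + unedited);
        }
    }
}
```

A crawler would call AgentConfigCheck.requireEdited(conf) once at startup, before fetching anything. With a map containing, say, http.agent.name set to EDITME, the call throws; the caller can catch the exception, log it, and exit, which matches the "log the error and exit" behaviour described above.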

-kuro
p.s. I'm subscribing to the digest version of the ML.  If the same or a
better idea has been raised already, please ignore this.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
