You need to set the following properties in 'conf/nutch-site.xml'.
The fetcher fails with "Agent name not configured!" when
http.agent.name is not set. In the example below I have left the
agent description, agent URL, and agent email empty, but ideally you
should set them so that the owner of a website can find out who is
crawling the site and how to reach them.

<property>
  <name>http.agent.name</name>
  <value>MySearch</value>
  <description>My Search Engine</description>
</property>

<property>
  <name>http.agent.description</name>
  <value></value>
  <description>Further description of our bot. This text is used in
  the User-Agent header. It appears in parentheses after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value></value>
  <description>A URL to advertise in the User-Agent header. This will
   appear in parentheses after the agent name. Custom dictates that this
   should be a URL of a page explaining the purpose and behavior of this
   crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value></value>
  <description>An email address to advertise in the HTTP 'From' request
   header and User-Agent header. A good practice is to mangle this
   address (e.g. 'info at example dot com') to avoid spamming.
  </description>
</property>
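
For reference, here is a rough sketch of what a filled-in
'conf/nutch-site.xml' might look like. It assumes the usual layout in
which all properties sit inside a single <configuration> root element.
The description, URL, and email values are hypothetical placeholders,
not values from any real deployment, so replace them with your own.

```xml
<?xml version="1.0"?>
<configuration>

  <!-- Required: the fetcher refuses to run while this is empty. -->
  <property>
    <name>http.agent.name</name>
    <value>MySearch</value>
  </property>

  <!-- Hypothetical placeholder: short description of the bot. -->
  <property>
    <name>http.agent.description</name>
    <value>Crawler for an intranet search engine</value>
  </property>

  <!-- Hypothetical placeholder: page explaining the crawler. -->
  <property>
    <name>http.agent.url</name>
    <value>http://www.example.com/crawler.html</value>
  </property>

  <!-- Hypothetical placeholder: mangled contact address. -->
  <property>
    <name>http.agent.email</name>
    <value>info at example dot com</value>
  </property>

</configuration>
```

The <description> elements are left out here only to keep the sketch
short; they are documentation for whoever reads the file.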

Regards,
Susam Pal
http://susam.in/

On 8/21/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am new to Nutch. While trying to create indexes, I am getting the
> following errors/exceptions:
> .
> .
> .
> fetching http://192.168.36.199/
> fetch of http://192.168.36.199/ failed with: java.lang.RuntimeException:
> Agent name not configured!
> Fetcher: done
> .
> .
> .
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: crawl.iiit/indexes
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
>
> Where do we have to configure this agent name? (I suppose in
> conf/nutch-site.xml, but what should be configured?)
>
> Thanks in advance.
>
> Regards,
> Sachin.
>
>
>
