You need to set the following properties in 'conf/nutch-site.xml'. In the example below I have left the agent description, agent URL, etc. empty, but ideally you should set them so that the owner of a website can find out who is crawling the site and how to reach you.
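The fetcher fails with "Agent name not configured!" when http.agent.name is missing or empty, as in the log quoted below. If you want to check your site file before starting a crawl, a small script like the following can help. It is a hypothetical helper, not part of Nutch, and it only looks at the file you give it. Nutch itself loads nutch-default.xml first and lets nutch-site.xml override it, so values inherited from the defaults are not considered here.

```python
import xml.etree.ElementTree as ET

# Optional identification settings that site owners rely on to contact you.
RECOMMENDED_KEYS = ("http.agent.description", "http.agent.url", "http.agent.email")


def read_properties(xml_bytes):
    """Map each <property>'s <name> to its (stripped) <value> text."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_agent_config(xml_bytes):
    """Return a list of problems with the http.agent.* settings; empty means OK."""
    props = read_properties(xml_bytes)
    problems = []
    # A missing or blank agent name is the situation behind
    # "java.lang.RuntimeException: Agent name not configured!".
    if not props.get("http.agent.name"):
        problems.append("http.agent.name is missing or empty")
    # Not required by this check, but recommended so your crawler can be identified.
    for key in RECOMMENDED_KEYS:
        if not props.get(key):
            problems.append(f"{key} is empty (recommended)")
    return problems


def check_file(path="conf/nutch-site.xml"):
    """Read a site config file as bytes and check it."""
    with open(path, "rb") as f:
        return check_agent_config(f.read())
```

For example, run `for p in check_file(): print(p)` from your Nutch directory. An empty result means the agent name is set and the optional identification fields are filled in.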
<property>
  <name>http.agent.name</name>
  <value>MySearch</value>
  <description>My Search Engine</description>
</property>

<property>
  <name>http.agent.description</name>
  <value></value>
  <description>Further description of our bot - this text is used in
  the User-Agent header. It appears in parentheses after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value></value>
  <description>A URL to advertise in the User-Agent header. This will
  appear in parentheses after the agent name. Custom dictates that this
  should be a URL of a page explaining the purpose and behavior of this
  crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value></value>
  <description>An email address to advertise in the HTTP 'From' request
  header and User-Agent header. A good practice is to mangle this
  address (e.g. 'info at example dot com') to avoid spamming.
  </description>
</property>

Regards,
Susam Pal
http://susam.in/

On 8/21/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am new to Nutch. While trying to create indexes, i am getting following
> errors/exceptions:
> .
> .
> .
> fetching http://192.168.36.199/
> fetch of http://192.168.36.199/ failed with: java.lang.RuntimeException: Agent name not configured!
> Fetcher: done
> .
> .
> .
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: crawl.iiit/indexes
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>     at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>
> Where we have to configure this Agent Name.(I suppose in
> conf/nutch-site.xml but wat to configure).
>
> Thanks in advance.
>
> Regards,
> Sachin.
