Yes, thanks, that solved my problem. However, while checking the integrity of the indexes, I executed the following command:
bin/nutch org.apache.nutch.searcher.NutchBean apache

but it returns 0 hits. Can you please tell me what I am missing?

Thanks in advance.

Regards,
Sachin.

> You need to set the following properties in 'conf/nutch-site.xml'.
> In the example below I have left the agent description, agent URL,
> etc. empty, but ideally you should set them so that the owner of a
> website can find out who is crawling the site and how to reach them.
>
> <property>
>   <name>http.agent.name</name>
>   <value>MySearch</value>
>   <description>My Search Engine</description>
> </property>
>
> <property>
>   <name>http.agent.description</name>
>   <value></value>
>   <description>Further description of our bot- this text is used in
>   the User-Agent header. It appears in parenthesis after the agent name.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.url</name>
>   <value></value>
>   <description>A URL to advertise in the User-Agent header. This will
>   appear in parenthesis after the agent name. Custom dictates that this
>   should be a URL of a page explaining the purpose and behavior of this
>   crawler.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.email</name>
>   <value></value>
>   <description>An email address to advertise in the HTTP 'From' request
>   header and User-Agent header. A good practice is to mangle this
>   address (e.g. 'info at example dot com') to avoid spamming.
>   </description>
> </property>
>
> Regards,
> Susam Pal
> http://susam.in/
>
> On 8/21/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> Hi all,
>> I am new to Nutch. While trying to create indexes, I am getting the
>> following errors/exceptions:
>> .
>> .
>> .
>> fetching http://192.168.36.199/
>> fetch of http://192.168.36.199/ failed with: java.lang.RuntimeException: Agent name not configured!
>> Fetcher: done
>> .
>> .
>> .
>> Indexer: done
>> Dedup: starting
>> Dedup: adding indexes in: crawl.iiit/indexes
>> Exception in thread "main" java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>     at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>
>> Where do we have to configure this agent name? (I suppose in
>> conf/nutch-site.xml, but what should I configure?)
>>
>> Thanks in advance.
>>
>> Regards,
>> Sachin.
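
As a side note, the agent properties quoted above can be sanity-checked before starting a crawl. What follows is only a rough sketch, not part of Nutch: the helper names (read_properties, empty_agent_properties) and the inline SAMPLE text are made up for illustration, and it assumes the <property> elements sit inside the usual <configuration> root element of 'conf/nutch-site.xml'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the properties quoted in this thread.
# The <configuration> wrapper is assumed; the thread shows only the
# <property> elements themselves.
SAMPLE = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MySearch</value>
  </property>
  <property>
    <name>http.agent.description</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value></value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
  </property>
</configuration>
"""

# The four agent properties mentioned in the reply above.
AGENT_PROPERTIES = [
    "http.agent.name",
    "http.agent.description",
    "http.agent.url",
    "http.agent.email",
]


def read_properties(xml_text):
    """Return a dict mapping each property name to its stripped value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def empty_agent_properties(xml_text):
    """List the agent properties that are missing or have an empty value."""
    props = read_properties(xml_text)
    return [name for name in AGENT_PROPERTIES if not props.get(name)]


if __name__ == "__main__":
    for name in empty_agent_properties(SAMPLE):
        print("not set:", name)
```

To check a real file, read its contents and pass the text to empty_agent_properties in place of SAMPLE. If http.agent.name shows up in the result, the fetcher will presumably fail again with "Agent name not configured!".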
