Thanks, that solved my problem. However, while checking the
integrity of the indexes, I executed the following command:

bin/nutch org.apache.nutch.searcher.NutchBean apache

but it returns 0 hits. Can you please tell me what I am missing?

Thanks in advance.

Regards,
Sachin.

> You need to set the following properties in 'conf/nutch-site.xml'. In
> the example below I have left the agent description, agent URL, etc.
> empty, but ideally you should set them so that the owner of a website
> can find out who is crawling the site and how to reach you.
>
> <property>
>   <name>http.agent.name</name>
>   <value>MySearch</value>
>   <description>My Search Engine</description>
> </property>
>
> <property>
>   <name>http.agent.description</name>
>   <value></value>
>   <description>Further description of our bot; this text is used in
>   the User-Agent header.  It appears in parentheses after the agent name.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.url</name>
>   <value></value>
>   <description>A URL to advertise in the User-Agent header.  This will
>    appear in parentheses after the agent name. Custom dictates that this
>    should be a URL of a page explaining the purpose and behavior of this
>    crawler.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.email</name>
>   <value></value>
>   <description>An email address to advertise in the HTTP 'From' request
>    header and User-Agent header. A good practice is to mangle this
>    address (e.g. 'info at example dot com') to avoid being spammed.
>   </description>
> </property>
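>
> Note that all of these <property> elements must sit inside the
> <configuration> root element of 'conf/nutch-site.xml'. A minimal
> file, assuming only the agent name is set, would look like this
> ('MySearch' is just a placeholder value):
>
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>http.agent.name</name>
>     <value>MySearch</value>
>   </property>
> </configuration>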
>
> Regards,
> Susam Pal
> http://susam.in/
>
> On 8/21/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> wrote:
>> Hi all,
>> I am new to Nutch. While trying to create indexes, I am getting the
>> following errors/exceptions:
>> .
>> .
>> .
>> fetching http://192.168.36.199/
>> fetch of http://192.168.36.199/ failed with: java.lang.RuntimeException:
>> Agent name not configured!
>> Fetcher: done
>> .
>> .
>> .
>> Indexer: done
>> Dedup: starting
>> Dedup: adding indexes in: crawl.iiit/indexes
>> Exception in thread "main" java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>         at
>> org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>
>>
>> Where do we have to configure this agent name? (I suppose in
>> conf/nutch-site.xml, but what do we configure?)
>>
>> Thanks in advance.
>>
>> Regards,
>> Sachin.
>>
>>
>>
>
>
