Hi,

In the script bin/crawl (or a copy of it):
- comment out or remove the line
    $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
- remove
    -linkdb $CRAWL_PATH/linkdb
  from the line
    $bin/nutch index ...
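
The two edits above could also be applied to a copy of the script with sed. This is only a rough sketch: the function name patch_crawl_script and the output file name are made up for illustration, and the patterns assume the lines look like the ones quoted above, which may differ between Nutch versions. Please check the result by hand before using it.

```shell
# Hypothetical helper (not part of Nutch): write a patched copy of a crawl
# script, leaving the original file untouched.
# Usage: patch_crawl_script <source-script> <patched-copy>
patch_crawl_script() {
  src="$1"
  dst="$2"
  # 1st expression: comment out the line that runs "nutch invertlinks".
  # 2nd expression: drop " -linkdb $CRAWL_PATH/linkdb" from lines that call
  #                 a nutch index command (assumed to be on a single line).
  sed -e '/nutch invertlinks /s/^/#/' \
      -e '/nutch .*index/s| -linkdb \$CRAWL_PATH/linkdb||' \
      "$src" > "$dst"
}
```

For example, `patch_crawl_script bin/crawl bin/crawl-nolinkdb` followed by `diff bin/crawl bin/crawl-nolinkdb` shows exactly which lines were changed. The copy is not marked executable automatically, so run `chmod +x` on it if needed.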

Sebastian

On 03/17/2014 03:43 PM, S.L wrote:
> Hi,
> 
> I am building a search engine for Chinese medicine, and I know the list of
> websites that I need to crawl. We can think of them as isolated islands
> with no inter-connectivity between them, which makes every page in the
> websites of interest equally important.
> 
> Nutch has a MapReduce phase called LinkInversion, which calculates the
> importance of a given page by calculating the inlinks for that page.
> In my case there are no inter-site inlinks, which means I should not
> even attempt to do LinkInversion.
> 
> Can someone please suggest how to disable the LinkInversion phase in
> Apache Nutch 1.7?
> 
> Thanks.
> 
