Hi,

in the script bin/crawl (or a copy of it):

- comment out or remove the line
    $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
- remove
    -linkdb $CRAWL_PATH/linkdb
  from the line
    $bin/nutch index ...

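If you would rather not edit by hand, the two changes to bin/crawl could be scripted roughly like this. It is only a sketch: strip_linkdb and the output file name bin/crawl.nolinkdb are names made up here, not part of Nutch, and it assumes the invertlinks call sits on one line and the -linkdb option is spelled exactly as "-linkdb $CRAWL_PATH/linkdb". Check the result against your copy of the script before using it.

```shell
#!/usr/bin/env bash
# Sketch only: strip_linkdb is a hypothetical helper, not part of Nutch.
# It reads a crawl script on stdin and writes the edited script to stdout.
strip_linkdb() {
  # 1st expression: comment out any line that calls "nutch invertlinks".
  # 2nd expression: drop the " -linkdb $CRAWL_PATH/linkdb" option wherever
  #                 it appears (assumed spelling; \$ matches a literal $).
  sed -e '/nutch invertlinks/ s/^/#/' \
      -e 's| -linkdb \$CRAWL_PATH/linkdb||'
}

# Usage (the output file name is an assumption):
#   strip_linkdb < bin/crawl > bin/crawl.nolinkdb
```

Writing to a separate file keeps the original bin/crawl untouched, so the two can be compared with diff.
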
Sebastian

On 03/17/2014 03:43 PM, S.L wrote:
> Hi,
>
> I am building a search engine for Chinese medicine and I know the list of
> websites that I need to crawl, which we can think of as isolated islands
> with no inter-connectivity between them, which makes every page in the
> websites of my interest equally important.
>
> Now Nutch has a MapReduce phase called LinkInversion which calculates the
> importance of a given page by calculating the InLinks for a given page.
> In my case there are no inter-site inlinks, which means I should not
> even attempt to do LinkInversion.
>
> Can someone please suggest how to disable the LinkInversion phase in
> Apache Nutch 1.7?
>
> Thanks.
>

