Thanks, Sebastian! I am actually running it as a MapReduce job on Hadoop. How would I disable it in this case?
On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> in the script bin/crawl (or a copy of it):
> - comment/remove the line
>   $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
> - remove
>   -linkdb $CRAWL_PATH/linkdb
>   from line
>   $bin/nutch index ...
>
> Sebastian
>
> On 03/17/2014 03:43 PM, S.L wrote:
> > Hi ,
> >
> > I am building a search engine for Chinese medicine and I know the list of
> > websites that I need to crawl , which we can think of as isolated islands
> > with no inter-connectivity between them, which makes every page in the
> > websites of my interest equally important.
> >
> > Now Nutch has a MapReduce phase called LinkInversion which calculates the
> > importance of a given page by calculating the InLinks for a given page ,
> > now in my case there are no inter-site inlinks which means I should not
> > even attempt to do LinkInversion.
> >
> > Can some one please suggest how to disable the LinkInversion phase in
> > Apache Nutch 1.7 ?
> >
> > Thanks.

