Thanks Sebastian, I will comment out the LinkDb.invert() call to stop this map-reduce phase from executing, and I will keep you posted on any implications it might have.
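For the record, below is a minimal sketch of what my custom driver will look like with that call removed. It is modeled on the deprecated o.a.n.crawl.Crawl driver; the paths, round count, topN, and thread count are illustrative, it assumes fetcher.parse=false (hence the separate parse step), and passing null for the linkdb assumes the 1.7 indexer tolerates a missing linkdb, as implied by the optional -linkdb flag:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.nutch.crawl.CrawlDb;
import org.apache.nutch.crawl.Generator;
import org.apache.nutch.crawl.Injector;
import org.apache.nutch.fetcher.Fetcher;
import org.apache.nutch.indexer.IndexingJob;
import org.apache.nutch.parse.ParseSegment;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlWithoutLinkDb {

  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    Path crawlDb  = new Path("crawl/crawldb");   // illustrative paths
    Path segments = new Path("crawl/segments");
    Path seedDir  = new Path("urls");

    new Injector(conf).inject(crawlDb, seedDir);

    Generator generator = new Generator(conf);
    Fetcher fetcher = new Fetcher(conf);
    ParseSegment parser = new ParseSegment(conf);
    CrawlDb crawlDbTool = new CrawlDb(conf);
    List<Path> fetchedSegments = new ArrayList<Path>();

    for (int round = 0; round < 3; round++) {    // 3 rounds, illustrative
      Path[] segs = generator.generate(crawlDb, segments, -1, 1000L,
          System.currentTimeMillis());
      if (segs == null) break;                   // nothing left to fetch
      fetcher.fetch(segs[0], 10);                // 10 fetcher threads
      parser.parse(segs[0]);
      crawlDbTool.update(crawlDb, segs, true, true);
      fetchedSegments.addAll(Arrays.asList(segs));
      // LinkDb.invert(...) intentionally omitted: with no inter-site
      // inlinks the linkdb would carry no useful scoring information.
    }

    // Index without a linkdb; null here mirrors leaving out -linkdb
    // on the command line (optional in 1.7).
    new IndexingJob(conf).index(crawlDb, null, fetchedSegments);
  }
}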
Thanks.

On Wed, Mar 19, 2014 at 4:55 PM, Sebastian Nagel <[email protected]> wrote:
> Hi,
>
> you remove it the same way, no matter whether the crawl is run locally
> or in a cluster: you have to remove the command invertlinks
> (or LinkDb.invert(...) when called from Java). Consequently,
> there will be no linkdb and you cannot use it when indexing.
>
> The concrete steps depend on how the crawler is launched:
> - bin/crawl
> - custom script
> - o.a.n.crawl.Crawler (deprecated, removed in 1.8)
> - custom Java code
>
> Sebastian
>
> On 03/17/2014 08:51 PM, S.L wrote:
> > Thanks Sebastian! I am actually running it as a MapReduce job on Hadoop;
> > how would I disable it in this case?
> >
> > On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <[email protected]> wrote:
> >> Hi,
> >>
> >> in the script bin/crawl (or a copy of it):
> >> - comment out/remove the line
> >>     $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
> >> - remove
> >>     -linkdb $CRAWL_PATH/linkdb
> >>   from the line
> >>     $bin/nutch index ...
> >>
> >> Sebastian
> >>
> >> On 03/17/2014 03:43 PM, S.L wrote:
> >>> Hi,
> >>>
> >>> I am building a search engine for Chinese medicine, and I know the list
> >>> of websites that I need to crawl. We can think of them as isolated
> >>> islands with no interconnectivity between them, which makes every page
> >>> in the websites of interest equally important.
> >>>
> >>> Nutch has a MapReduce phase called LinkInversion, which calculates the
> >>> importance of a given page by counting its inlinks. In my case there
> >>> are no inter-site inlinks, so I should not even attempt link inversion.
> >>>
> >>> Can someone please suggest how to disable the LinkInversion phase in
> >>> Apache Nutch 1.7?
> >>>
> >>> Thanks.

