Anyone, is this possible? Thanks
On Mon, Mar 17, 2014 at 3:51 PM, S.L <[email protected]> wrote:

> Thanks Sebastian! I am actually running it as a MapReduce job on Hadoop;
> how would I disable it in this case?
>
> On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <[email protected]> wrote:
>
>> Hi,
>>
>> in the script bin/crawl (or a copy of it):
>> - comment out/remove the line
>>     $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
>> - remove
>>     -linkdb $CRAWL_PATH/linkdb
>>   from the line
>>     $bin/nutch index ...
>>
>> Sebastian
>>
>> On 03/17/2014 03:43 PM, S.L wrote:
>>
>>> Hi,
>>>
>>> I am building a search engine for Chinese medicine, and I know the list
>>> of websites that I need to crawl. We can think of these as isolated
>>> islands with no inter-connectivity between them, which makes every page
>>> in the websites of my interest equally important.
>>>
>>> Nutch has a MapReduce phase called LinkInversion, which estimates the
>>> importance of a given page by computing its inlinks. In my case there
>>> are no inter-site inlinks, which means I should not even attempt to do
>>> LinkInversion.
>>>
>>> Can someone please suggest how to disable the LinkInversion phase in
>>> Apache Nutch 1.7?
>>>
>>> Thanks.
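For reference, Sebastian's two edits applied to a stock Nutch 1.7 bin/crawl would look roughly like this (a sketch; the exact surrounding lines may differ slightly between Nutch releases):

```shell
# In bin/crawl (or your copy of it), inside the per-segment loop:

# 1. Comment out (or delete) the invertlinks step so no linkdb is built:
# $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT

# 2. In the indexing step, drop the "-linkdb $CRAWL_PATH/linkdb" argument.
# Before (original line, approximately):
#   $bin/nutch index $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
# After:
$bin/nutch index $CRAWL_PATH/crawldb $CRAWL_PATH/segments/$SEGMENT
```

Since bin/crawl drives the same MapReduce jobs whether they run locally or on a Hadoop cluster, removing these lines should skip the LinkInversion phase in the Hadoop case as well.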

