Thanks Sebastian! I am actually running it as a MapReduce job on Hadoop.
How would I disable it in this case?


On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> in the script bin/crawl (or a copy of it):
> - comment/remove the line
>   $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
> - remove
>    -linkdb $CRAWL_PATH/linkdb
>   from line
>    $bin/nutch index ...
>
> Sebastian
>
> On 03/17/2014 03:43 PM, S.L wrote:
> > Hi,
> >
> > I am building a search engine for Chinese medicine, and I know the list of
> > websites that I need to crawl. These can be thought of as isolated islands
> > with no links between them, which makes every page on the websites I am
> > interested in equally important.
> >
> > Nutch has a MapReduce phase called LinkInversion, which calculates the
> > importance of a given page by computing its inlinks. In my case there are
> > no inter-site inlinks, which means I should not even attempt to do
> > LinkInversion.
> >
> > Can someone please suggest how to disable the LinkInversion phase in
> > Apache Nutch 1.7?
> >
> > Thanks.
> >
>
>
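
For reference, the edit Sebastian describes above could be applied to a copy
of the script roughly like this. It is only a sketch: the function name
strip_linkdb and the file names are made up, and the "nutch index" line in
the demo uses the placeholders OTHER_ARGS and MORE_ARGS because the real
arguments are not shown in this thread. It assumes each command sits on a
single line of the script.

```shell
#!/usr/bin/env bash
# Sketch: patch a COPY of bin/crawl so the link inversion step is skipped.
# strip_linkdb and all file names here are hypothetical.

# strip_linkdb SRC DST writes SRC to DST with
#   1. every line that calls "nutch invertlinks" commented out, and
#   2. the " -linkdb $CRAWL_PATH/linkdb" argument removed from
#      lines that call "nutch index".
strip_linkdb() {
  sed -e '/nutch invertlinks/ s/^/#/' \
      -e '/nutch index/ s| -linkdb \$CRAWL_PATH/linkdb||' \
      "$1" > "$2"
}

# Demo on a stand-in file. OTHER_ARGS and MORE_ARGS are placeholders,
# not real Nutch arguments.
tmp=$(mktemp -d)
cat > "$tmp/crawl.orig" <<'EOF'
$bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
$bin/nutch index OTHER_ARGS -linkdb $CRAWL_PATH/linkdb MORE_ARGS
EOF

strip_linkdb "$tmp/crawl.orig" "$tmp/crawl.nolinkdb"
cat "$tmp/crawl.nolinkdb"
```

Writing to a second file keeps the original script untouched, so the two
versions can be compared with diff before the patched copy is used.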
