Thanks Sebastian, I will comment out the LinkDb.invert() statement to stop
this map-reduce phase from executing. I will keep you posted on any
implications it might have.
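
For anyone on the bin/crawl route described further down the thread, the two edits Sebastian lists can be sketched as a small shell helper. This is only a rough sketch: disable_invertlinks is a made-up name, not part of Nutch, and it assumes the script contains single-line commands shaped like the ones quoted below (a "nutch invertlinks" line and a "nutch index" line carrying "-linkdb <path>"). It writes a patched copy and leaves the original alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): write a copy of a crawl script
# with the link inversion step disabled.
# Assumption: the commands sit on single lines, as in the quoted advice.
disable_invertlinks() {
  src=$1   # path to the original script, e.g. a copy of bin/crawl
  dst=$2   # path for the patched copy
  # 1) comment out every line that runs "nutch invertlinks"
  # 2) drop the "-linkdb <path>" argument from "nutch index" lines
  sed -e '/nutch invertlinks/ s/^/#/' \
      -e '/nutch index/ s/ -linkdb [^ ]*//' \
      "$src" > "$dst"
}
```

Something like `disable_invertlinks bin/crawl bin/crawl-nolinkdb` followed by `diff bin/crawl bin/crawl-nolinkdb` should show exactly those two changes; if the diff shows anything else, the script layout differs from what is assumed here and is better edited by hand.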

Thanks.


On Wed, Mar 19, 2014 at 4:55 PM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> you remove it the same way, no matter whether the crawl is run locally
> or in a cluster: you have to remove the command invertlinks
> (or LinkDb.invert(...) when called from Java). Consequently,
> there will be no linkdb and you cannot use it when indexing.
>
> The concrete steps depend on how the crawler is launched:
>  - bin/crawl
>  - custom script
>  - o.a.n.crawl.Crawler (deprecated, removed in 1.8)
>  - custom Java code
>
> Sebastian
>
> On 03/17/2014 08:51 PM, S.L wrote:
> > Thanks Sebastian! I am actually running it as a MapReduce Job on Hadoop,
> > how would I disable it in this case ?
> >
> >
> > On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> in the script bin/crawl (or a copy of it):
> >> - comment/remove the line
> >>   $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
> >> - remove
> >>    -linkdb $CRAWL_PATH/linkdb
> >>   from line
> >>    $bin/nutch index ...
> >>
> >> Sebastian
> >>
> >> On 03/17/2014 03:43 PM, S.L wrote:
> >>> Hi,
> >>>
> >>> I am building a search engine for Chinese medicine and I know the list of
> >>> websites that I need to crawl, which we can think of as isolated islands
> >>> with no inter-connectivity between them, which makes every page in the
> >>> websites of my interest equally important.
> >>>
> >>> Now Nutch has a MapReduce phase called LinkInversion which calculates the
> >>> importance of a given page by calculating the InLinks for a given page.
> >>> In my case there are no inter-site inlinks, which means I should not
> >>> even attempt to do LinkInversion.
> >>>
> >>> Can someone please suggest how to disable the LinkInversion phase in
> >>> Apache Nutch 1.7?
> >>>
> >>> Thanks.
> >>>
> >>
> >>
> >
>
>
