Anyone, is this possible? Thanks

On Mon, Mar 17, 2014 at 3:51 PM, S.L <[email protected]> wrote:

> Thanks, Sebastian! I am actually running it as a MapReduce job on Hadoop;
> how would I disable it in that case?
>
>
> On Mon, Mar 17, 2014 at 3:39 PM, Sebastian Nagel <
> [email protected]> wrote:
>
>> Hi,
>>
>> in the script bin/crawl (or a copy of it):
>> - comment/remove the line
>>   $bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
>> - remove
>>    -linkdb $CRAWL_PATH/linkdb
>>   from line
>>    $bin/nutch index ...
>>
>> Sebastian
>>
>> On 03/17/2014 03:43 PM, S.L wrote:
>> > Hi,
>> >
>> > I am building a search engine for Chinese medicine, and I know the list
>> > of websites I need to crawl. They can be thought of as isolated islands
>> > with no inter-connectivity between them, which makes every page on the
>> > websites of interest equally important.
>> >
>> > Nutch has a MapReduce phase called LinkInversion, which calculates the
>> > importance of a given page from its inlinks. In my case there are no
>> > inter-site inlinks, so I should not even run LinkInversion.
>> >
>> > Can someone please suggest how to disable the LinkInversion phase in
>> > Apache Nutch 1.7?
>> >
>> > Thanks.
>> >
>>
>>
>
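The two edits Sebastian describes can be scripted against a copy of bin/crawl. The sketch below runs against a minimal stand-in file (the exact index command line in a real Nutch 1.7 bin/crawl carries more arguments; the two relevant fragments are the ones shown), so the commands are illustrative, not a drop-in patch:

```shell
# Stand-in for the two relevant lines of a copy of bin/crawl
# (quoted heredoc keeps $bin, $CRAWL_PATH, $SEGMENT literal).
cat > crawl-snippet <<'EOF'
$bin/nutch invertlinks $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
$bin/nutch index $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
EOF

# Edit 1: comment out the invertlinks step.
sed -i 's|^\(\$bin/nutch invertlinks .*\)$|#\1|' crawl-snippet

# Edit 2: drop "-linkdb $CRAWL_PATH/linkdb" from the index step.
sed -i 's| -linkdb \$CRAWL_PATH/linkdb||' crawl-snippet

cat crawl-snippet
```

Since bin/crawl is what submits the MapReduce jobs to Hadoop, removing the invertlinks invocation there also prevents the LinkInversion job from being launched on the cluster.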
