Nutch Hadoop question

Eran Zinman Wed, 11 Nov 2009 02:19:47 -0800

Hi All,

I'm using Nutch with Hadoop with great pleasure. It's working great and has
really increased crawling performance across multiple machines.


I have two strong machines and two older machines which I would like to use.

So far I've been using only the two strong machines with Hadoop.

Now I would like to add the two less powerful machines to do some processing
as well.

My question is this: right now HDFS is shared between the two powerful
computers. I don't want the two other computers to store any content, since
they have slow and unreliable hard disks. I just want the two other machines
to do processing (i.e. MapReduce) and not store any content on them.

Is that possible - or do I have to use HDFS on all machines that do
processing?

If it's possible to use a machine only for MapReduce, how is this done?

Thank you for your help,
Eran
