Re: Nutch Hadoop question

2009-11-13 Thread Eran Zinman
Hi All,

Don't want to bother you guys too much... I've tried searching for this
topic and doing some testing myself, but so far I've been quite unsuccessful.

Basically - I wish to use some computers only for MapReduce processing and
not for HDFS storage. Does anyone know how this can be done?

Thanks,
Eran

On Wed, Nov 11, 2009 at 12:19 PM, Eran Zinman zze...@gmail.com wrote:

 Hi All,

 I'm using Nutch with Hadoop with great pleasure - it works great and really
 increases crawling performance across multiple machines.

 I have two strong machines and two older machines which I would like to
 use.

 So far I've been using only the two strong machines with Hadoop.

 Now I would like to add the two less powerful machines to do some
 processing as well.

 My question is - right now the HDFS is shared between the two powerful
 computers. I don't want the two other computers to store any content, as
 they have slow and unreliable hard disks. I just want those two machines
 to do processing (i.e. MapReduce) and not store any content.

 Is that possible - or do I have to use HDFS on all machines that do
 processing?

 If it's possible to use a machine only for MapReduce - how is this done?

 Thank you for your help,
 Eran



Re: Nutch Hadoop question

2009-11-13 Thread TuxRacer69

Hi Eran,

MapReduce has to store its data on an HDFS filesystem.
But if you want to separate the two groups of servers, you could build
two separate HDFS filesystems. To separate the two setups, you will need
to make sure there is no cross-communication between the two parts.
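
For illustration, a minimal sketch of how the two setups could be kept
apart - assuming a Hadoop 0.19-style conf/hadoop-site.xml as shipped with
Nutch, and made-up hostnames. Each group points at its own namenode and
jobtracker, so the clusters never talk to each other:

    <!-- conf/hadoop-site.xml on the strong pair (hostnames are examples) -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://strong-master:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>strong-master:9001</value>
    </property>

    <!-- conf/hadoop-site.xml on the older pair -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://old-master:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>old-master:9001</value>
    </property>

Each group's conf/slaves file should also list only its own machines, so
that start-dfs.sh and start-mapred.sh never start daemons across the
boundary.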


Cheers,
Alex


Re: Nutch Hadoop question

2009-11-13 Thread Andrzej Bialecki

TuxRacer69 wrote:

 Hi Eran,

 MapReduce has to store its data on an HDFS filesystem.


More specifically, it needs read/write access to a shared filesystem. If 
you are brave enough you can use NFS, too, or any other type of 
filesystem that can be mounted locally on each node (e.g. a NetApp).
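
As a sketch (untested; the mount point is a made-up example), you could
skip HDFS entirely and point Hadoop at such a mount through a file: URI,
provided every node mounts the same export at the same path:

    <!-- conf/hadoop-site.xml - all nodes mount the share at /mnt/nfs/hadoop -->
    <property>
      <name>fs.default.name</name>
      <value>file:///mnt/nfs/hadoop</value>
    </property>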


 But if you want to separate the two groups of servers, you could build
 two separate HDFS filesystems. To separate the two setups, you will need
 to make sure there is no cross-communication between the two parts.


You can run two separate clusters even on the same set of machines; just
configure them to use different ports AND different local paths.
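
A sketch of the overrides for the second cluster's conf/hadoop-site.xml
(ports and paths here are arbitrary examples; the first cluster stays on
the defaults):

    <!-- different ports... -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9100</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:9101</value>
    </property>
    <!-- ...AND different local paths -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data/cluster2/tmp</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/data/cluster2/dfs/data</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/data/cluster2/mapred/local</value>
    </property>

The web UI and datanode ports (dfs.http.address, dfs.datanode.address,
mapred.job.tracker.http.address, etc.) clash as well, so one of the two
clusters has to override those too.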


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Nutch Hadoop question

2009-11-13 Thread Eran Zinman
Thanks for the help guys.
