Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman

Hey Arijit,

We use all internal SATA drives in our cluster, which is about 110TB
today; if we grow it to our planned 350TB, it will be a healthy mix of
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached
vaults, and fibre channel vaults.


Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs, and from the Hadoop primer I found the
following specs:

* 5 x dual core CPUs
* RAM - 4-8GB; ECC preferred, though more expensive
* 2 x 250GB SATA drives (on each of the 5 nodes)
* 1-5 TB external storage

I'm curious to find out what sort of specs people normally use. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need external storage in a Hadoop
cluster? How can I find out what other projects on Hadoop are using?
Cheers
Arijit


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com





Re: Hadoop hardware specs

2008-11-04 Thread Allen Wittenauer



On 11/4/08 2:16 AM, Arijit Mukherjee [EMAIL PROTECTED] wrote:

 * 1-5 TB external storage
 
 I'm curious to find out what sort of specs people normally use. Is
 the external storage essential or will the individual disks on each node
 be sufficient? Why would you need external storage in a Hadoop
 cluster?

The big reason for the external storage is twofold:

A) It provides a shared home directory (especially for the HDFS user, so
that it is easy to use the start scripts that call ssh).

B) It holds an off-machine copy of the fsimage and edits files used by the
name node.  This way, if the name node goes belly up, you'll have an always
up-to-date backup to recover from.
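As a rough sketch of (B) - the paths here are made up - you can list both
a local directory and an NFS mount from the external storage in
dfs.name.dir in hadoop-site.xml; the name node then writes its fsimage
and edits to every directory in the list:

    <!-- hadoop-site.xml: the name node keeps a full copy of its
         metadata in each comma-separated directory below.
         /mnt/nfs/namenode is a hypothetical NFS mount on the
         external storage. -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/namenode,/mnt/nfs/namenode</value>
    </property>

If the name node machine dies, the copy on the NFS mount is current up
to the last edit.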

 How can I find out what other projects on Hadoop are using?

Slide 12 of the ApacheCon presentation I did earlier this year talks
about what Yahoo!'s typical node looks like.  For a small 5-node cluster,
your hardware specs seem fine to me.

An 8GB name node for 4 data nodes (or maybe even running the name node on
the same machine as a data node, if the memory size of jobs is kept in
check) should be a-ok, even if you double the storage.  You're likely
going to run out of disk space before the name node starts swapping.



RE: Hadoop hardware specs

2008-11-04 Thread Zhang, Roger
Brian,

You seem to have a pretty large cluster. What do you think about the
overall performance?
Is your implementation on OpenSSH or SSH2?

I'm new to this and am trying to set up a 20-node cluster, but our Linux
boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does not
support right now.

Does anyone have an idea of a workaround?


Thanks and best Rgds. 
Roger Zhang 

-Original Message-
From: Brian Bockelman [mailto:[EMAIL PROTECTED] 
Sent: November 4, 2008 21:36
To: core-user@hadoop.apache.org
Subject: Re: Hadoop hardware specs

Hey Arijit,

We use all internal SATA drives in our cluster, which is about 110TB
today; if we grow it to our planned 350TB, it will be a healthy mix of
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached
vaults, and fibre channel vaults.

Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:

 Hi All

 We're thinking of setting up a Hadoop cluster which will be used to
 create a prototype system for analyzing telecom data. The wiki page on
 machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
 overview of the node specs, and from the Hadoop primer I found the
 following specs:

 * 5 x dual core CPUs
 * RAM - 4-8GB; ECC preferred, though more expensive
 * 2 x 250GB SATA drives (on each of the 5 nodes)
 * 1-5 TB external storage

 I'm curious to find out what sort of specs people normally use. Is
 the external storage essential or will the individual disks on each node
 be sufficient? Why would you need external storage in a Hadoop
 cluster? How can I find out what other projects on Hadoop are using?
 Cheers
 Arijit


 Dr. Arijit Mukherjee
 Principal Member of Technical Staff, Level-II
 Connectiva Systems (I) Pvt. Ltd.
 J-2, Block GP, Sector V, Salt Lake
 Kolkata 700 091, India
 Phone: +91 (0)33 23577531/32 x 107
 http://www.connectivasystems.com




Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs, and from the Hadoop primer I found the
following specs:

* 5 x dual core CPUs 
* RAM - 4-8GB; ECC preferred, though more expensive 
* 2 x 250GB SATA drives (on each of the 5 nodes) 
* 1-5 TB external storage 

I'm curious to find out what sort of specs people normally use. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need external storage in a Hadoop
cluster? How can I find out what other projects on Hadoop are using?
Cheers 
Arijit 


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com




RE: Hadoop hardware specs

2008-11-04 Thread Arijit Mukherjee
One correction - the number 5 in the mail below is my estimate of the
number of nodes we might need. Could this be too small a cluster?

Arijit

Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com


-Original Message-
From: Arijit Mukherjee [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 04, 2008 3:47 PM
To: core-user@hadoop.apache.org
Subject: Hadoop hardware specs


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs, and from the Hadoop primer I found the
following specs:

* 5 x dual core CPUs 
* RAM - 4-8GB; ECC preferred, though more expensive 
* 2 x 250GB SATA drives (on each of the 5 nodes) 
* 1-5 TB external storage 

I'm curious to find out what sort of specs people normally use. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need external storage in a Hadoop
cluster? How can I find out what other projects on Hadoop are using?
Cheers 
Arijit 


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com




Re: Hadoop hardware specs

2008-11-04 Thread Brian Bockelman

Hey Roger,

SSH is only needed to start and stop the daemons - it's not really needed
for running Hadoop itself.  Currently, we do this through custom site
mechanisms, not through SSH.
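
For example (a sketch, assuming HADOOP_HOME points at your 0.18 install
on every node), you can skip the ssh-based start-all.sh entirely and
start each daemon locally with hadoop-daemon.sh:

    # On the name node host:
    $HADOOP_HOME/bin/hadoop-daemon.sh start namenode

    # On each data node host, run locally via whatever site mechanism
    # you already have (cron, pdsh, init scripts, etc.):
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode

    # The MapReduce daemons work the same way:
    $HADOOP_HOME/bin/hadoop-daemon.sh start jobtracker   # on the master
    $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker  # on each worker

Since each command runs on the node itself, no SSH implementation is
involved at all, which sidesteps the F-Secure SSH2 issue.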


Brian

On Nov 4, 2008, at 10:36 AM, Zhang, Roger wrote:


Brian,

You seem to have a pretty large cluster. What do you think about the
overall performance?

Is your implementation on OpenSSH or SSH2?

I'm new to this and am trying to set up a 20-node cluster, but our Linux
boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does not
support right now.

Does anyone have an idea of a workaround?


Thanks and best Rgds.
   Roger Zhang

-Original Message-
From: Brian Bockelman [mailto:[EMAIL PROTECTED]
Sent: November 4, 2008 21:36
To: core-user@hadoop.apache.org
Subject: Re: Hadoop hardware specs

Hey Arijit,

We use all internal SATA drives in our cluster, which is about 110TB
today; if we grow it to our planned 350TB, it will be a healthy mix of
worker nodes w/ SATA, large internal chassis (12 - 48TB), SCSI-attached
vaults, and fibre channel vaults.

Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:


Hi All

We're thinking of setting up a Hadoop cluster which will be used to
create a prototype system for analyzing telecom data. The wiki page on
machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an
overview of the node specs, and from the Hadoop primer I found the
following specs:

* 5 x dual core CPUs
* RAM - 4-8GB; ECC preferred, though more expensive
* 2 x 250GB SATA drives (on each of the 5 nodes)
* 1-5 TB external storage

I'm curious to find out what sort of specs people normally use. Is
the external storage essential or will the individual disks on each node
be sufficient? Why would you need external storage in a Hadoop
cluster? How can I find out what other projects on Hadoop are using?
Cheers
Arijit


Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com