Re: Hadoop hardware specs
Hey Arijit,

We use all internal SATA drives in our cluster, which is about 110TB today. If we grow it to our planned 350TB, it will be a healthy mix of worker nodes with SATA, large internal chassis (12-48TB), SCSI-attached vaults, and fibre channel vaults.

Brian

On Nov 4, 2008, at 4:16 AM, Arijit Mukherjee wrote:
(original "Hadoop hardware specs" message, quoted in full elsewhere in this thread)
Re: Hadoop hardware specs
On 11/4/08 2:16 AM, Arijit Mukherjee [EMAIL PROTECTED] wrote:
> * 1-5 TB external storage
> I'm curious to find out what sort of specs people use normally. Is the external storage essential or will the individual disks on each node be sufficient? Why would you need external storage in a hadoop cluster?

The big reason for the external storage is twofold:

A) It provides a shared home directory (especially for the HDFS user, so that it is easy to use the start scripts that call ssh).

B) It holds an off-machine copy of the fsimage and edits files used by the namenode. That way, if the namenode goes belly up, you'll have an always up-to-date backup to recover from.

> How can I find out what other projects on hadoop are using?

Slide 12 of the ApacheCon presentation I did earlier this year describes what Yahoo!'s typical node looks like.

For a small 5-node cluster, your hardware specs seem fine to me. An 8GB namenode for 4 datanodes (or maybe even running the namenode on the same machine as a datanode, if the memory size of jobs is kept in check) should be a-ok, even if you double the storage. You're likely to run out of disk space before the namenode starts swapping.
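For the off-machine fsimage/edits copy described in B), Hadoop-0.18-era deployments typically list an NFS-mounted directory in dfs.name.dir, which accepts a comma-separated list of directories that the namenode writes to in parallel. A minimal config sketch; the mount point /mnt/nfs/namenode is a hypothetical path chosen for illustration:

```xml
<!-- Fragment of hadoop-site.xml (Hadoop 0.18-era configuration).
     dfs.name.dir takes a comma-separated list; the namenode writes its
     fsimage and edits to every directory listed, so including an NFS
     mount yields the off-machine backup described above.
     /mnt/nfs/namenode is a hypothetical mount point. -->
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/name,/mnt/nfs/namenode</value>
</property>
```

If the namenode host dies, the copy on the NFS server can seed a replacement namenode.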
RE: Hadoop hardware specs
Brian,

You seem to have a pretty large cluster. What do you think of its overall performance? Is your setup running on OpenSSH or SSH2?

I'm new to this and am trying to set up a 20-node cluster, but our Linux boxes already enforce F-Secure SSH2, which I found HDFS 0.18 does not support right now. Does anyone have an idea for a workaround?

Thanks and best regards,
Roger Zhang

-----Original Message-----
From: Brian Bockelman [mailto:[EMAIL PROTECTED]]
Sent: November 4, 2008, 21:36
To: core-user@hadoop.apache.org
Subject: Re: Hadoop hardware specs

(Brian's reply and the quoted original message trimmed; both appear in full elsewhere in this thread.)
Hadoop hardware specs
Hi All,

We're thinking of setting up a Hadoop cluster which will be used to create a prototype system for analyzing telecom data. The wiki page on machine scaling (http://wiki.apache.org/hadoop/MachineScaling) gives an overview of the node specs, and from the Hadoop primer I found the following specs:

* 5 x dual-core CPUs
* RAM: 4-8GB; ECC preferred, though more expensive
* 2 x 250GB SATA drives (on each of the 5 nodes)
* 1-5 TB external storage

I'm curious to find out what sort of specs people normally use. Is the external storage essential, or will the individual disks on each node be sufficient? Why would you need external storage in a hadoop cluster? How can I find out what other projects on hadoop are using?

Cheers
Arijit

Dr. Arijit Mukherjee
Principal Member of Technical Staff, Level-II
Connectiva Systems (I) Pvt. Ltd.
J-2, Block GP, Sector V, Salt Lake
Kolkata 700 091, India
Phone: +91 (0)33 23577531/32 x 107
http://www.connectivasystems.com
RE: Hadoop hardware specs
One correction - the number 5 in the mail below is my estimate of the number of nodes we might need. Could this be too small a cluster?

Arijit

-----Original Message-----
From: Arijit Mukherjee [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 04, 2008 3:47 PM
To: core-user@hadoop.apache.org
Subject: Hadoop hardware specs

(original "Hadoop hardware specs" message, quoted in full elsewhere in this thread)
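For a rough sense of whether five nodes is enough, the disk numbers in the quoted spec can be turned into a back-of-the-envelope capacity estimate. A sketch, assuming HDFS's default replication factor of 3 and ignoring the space consumed by the OS, logs, and intermediate MapReduce data:

```python
# Back-of-the-envelope HDFS sizing for the 5-node spec quoted above.
# Assumes the default replication factor of 3; real usable space will
# be lower once OS, log, and map-output spill space are accounted for.
nodes = 5
disks_per_node = 2
disk_gb = 250
replication = 3

raw_gb = nodes * disks_per_node * disk_gb          # total raw disk
usable_gb = raw_gb / replication                   # usable HDFS space

print(raw_gb)            # 2500 (GB raw)
print(round(usable_gb))  # 833 (GB usable, roughly)
```

So the proposed cluster holds under 1TB of replicated data; whether that is "too small" depends on how much telecom data the prototype must keep online.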
Re: Hadoop hardware specs
Hey Roger,

SSH is only needed to start and stop the daemons - it's not really needed for running Hadoop itself. Currently we do this through custom site mechanisms, not through SSH.

Brian

On Nov 4, 2008, at 10:36 AM, Zhang, Roger wrote:
(Roger's message and the quoted thread trimmed; both appear in full elsewhere in this thread.)
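To make Brian's point concrete: only the cluster-wide wrappers (start-dfs.sh and friends) shell out over ssh to the hosts in the slaves file. The per-node daemon script can be invoked locally instead, e.g. from init scripts or whatever site mechanism you already have, which sidesteps the SSH2 issue entirely. A command sketch, assuming a stock Hadoop 0.18 tarball layout:

```shell
# Sketch: starting HDFS daemons per node, without the ssh-based
# start-dfs.sh wrapper. Paths assume a standard Hadoop 0.18 install.

# On the namenode host:
bin/hadoop-daemon.sh start namenode

# On each datanode host (run locally, e.g. from an init script):
bin/hadoop-daemon.sh start datanode
```

The same pattern (`stop` instead of `start`) shuts the daemons down.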