Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
Hello all,

I am looking to build a 5-node Hadoop cluster with the following
configuration per machine:

1. Intel Xeon E5-2609 (2.40 GHz, 4-core)
2. 32 GB RAM (8 GB 1Rx4 PC3)
3. 5 x 900 GB 6G SAS 10K hard disks (4.5 TB total storage/machine)
4. 1 GbE Ethernet connection

I would like the experts to please review it and share whether this sounds
like an optimal Hadoop hardware configuration or not. I know that without
the actual use case it is hard to comment, but I would still like general
views. Please also point out anything I am missing.
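
For a rough sense of the usable space this gives, here is a
back-of-the-envelope sketch in Python (a sketch only: replication is
HDFS's default of 3, and the 25% non-HDFS reserve is my assumption):

nodes = 5
raw_tb_per_node = 4.5          # 5 x 900 GB SAS per machine
replication = 3                # HDFS default block replication
non_hdfs_fraction = 0.25       # assumed reserve for OS, logs, map spill

raw_total_tb = nodes * raw_tb_per_node                    # 22.5 TB raw
usable_tb = raw_total_tb * (1 - non_hdfs_fraction) / replication
print(f"~{usable_tb:.1f} TB usable HDFS capacity")        # ~5.6 TB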

Regards,
Praveenesh


Re: Hadoop cluster hardware configuration

2012-06-04 Thread Nitin Pawar
If you tell us the purpose of this cluster, it would be easier to say
exactly how good it is.

-- 
Nitin Pawar


Re: Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
At a very high level... we would be using the cluster not only for Hadoop
but also for other I/O-bound and in-memory workloads. That is the reason we
are going for SAS hard disks. We also need to run a lot of computational
tasks, which is why we have kept the RAM at 32 GB (it can be increased). So,
at a high level, I just wanted to know: do these hardware specs make sense?

Regards,
Praveenesh



Re: Hadoop cluster hardware configuration

2012-06-04 Thread Nitin Pawar
If you are doing computations using Hadoop on a small scale, yes, this
hardware is good enough.

Normally Hadoop clusters are kept busy with heavy loads, so they are not
shared with other workloads unless your Hadoop utilization is on the lower
side and you want to reuse the hardware.
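
If you do end up sharing the boxes, one common approach is to cap the task
slots so Hadoop only takes a fixed share of each machine. A minimal sketch,
assuming Hadoop 1.x-style slots and an arbitrary 50% share:

cores = 4                      # E5-2609 has 4 cores
hadoop_share = 0.5             # assumed fraction of the box given to Hadoop

map_slots = max(1, int(cores * hadoop_share))     # -> 2
reduce_slots = max(1, map_slots // 2)             # -> 1

# In Hadoop 1.x these values would be set in mapred-site.xml via
# mapred.tasktracker.map.tasks.maximum and
# mapred.tasktracker.reduce.tasks.maximum.
print(map_slots, reduce_slots)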



-- 
Nitin Pawar


Hadoop and hardware

2011-12-16 Thread Cussol


In my company, we intend to set up a Hadoop cluster to run analytics
applications. This cluster would have about 120 data nodes on dual-socket
servers with a gigabit interconnect. We are also exploring a solution with
60 quad-socket servers. How do quad-socket and dual-socket servers compare
in a Hadoop cluster?

Any help?

pierre
-- 
View this message in context: 
http://old.nabble.com/Hadoop-and-hardware-tp32987374p32987374.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Hadoop and hardware

2011-12-16 Thread J. Rottinghuis
Pierre,

As discussed in recent other threads, it depends.
The most sensible thing for Hadoop nodes is to find a sweet spot for
price/performance. In general that will mean keeping a balance between
compute power, disks, and network bandwidth, and factoring in racks, space,
operating costs, etc.

How much storage capacity are you thinking of when you target about 120
data nodes?

If you had, for example, 60 quad-core nodes with 12 x 2 TB disks (or more),
I would suspect you would be bottlenecked on your 1 Gb network connections.
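
A rough sketch of why (the per-disk throughput is an assumed ballpark, not
a measurement):

disks_per_node = 12
disk_mb_per_s = 75             # assumed sequential throughput per disk
nic_mb_per_s = 1000 / 8        # a 1 Gb/s link is ~125 MB/s

aggregate_disk = disks_per_node * disk_mb_per_s   # 900 MB/s of disk bandwidth
print(f"disks: {aggregate_disk} MB/s vs NIC: {nic_mb_per_s:.0f} MB/s "
      f"(~{aggregate_disk / nic_mb_per_s:.0f}x mismatch)")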

Another thing to consider is how many nodes per rack. If these 60 nodes
were 2U and you fit 20 nodes in a rack, then losing one top-of-rack switch
means losing 1/3 of the capacity of your cluster.

Yet another consideration is how easily you want to be able to expand your
cluster incrementally. Until you run Hadoop 0.23, you probably want all
your nodes to be roughly similar in capacity.

Cheers,

Joep





Re: Hadoop cluster hardware details for big data

2011-07-07 Thread Karthik Kumar
Hi,

Thanks a lot for your timely help. Your valuable answers helped us to
understand what kind of hardware to use when it comes to huge data.

With Regards,
Karthik



-- 
With Regards,
Karthik


Hadoop cluster hardware details for big data

2011-07-06 Thread Karthik Kumar
Hi,

Has anyone here used Hadoop to process more than 3 TB of data? If so, we
would like to know how many machines you used in your cluster and about
the hardware configuration. The objective is to learn how to handle huge
data volumes in a Hadoop cluster.

-- 
With Regards,
Karthik


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Harsh J
Have you taken a look at http://wiki.apache.org/hadoop/PoweredBy? It
contains information relevant to your question, if not a detailed
answer.





-- 
Harsh J


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Karthik Kumar
Hi,

I wanted to know the time required to process huge datasets and the number
of machines used for them.




-- 
With Regards,
Karthik


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Harsh J
Karthik,

That's a highly process-dependent question, I think -- what you would do
with the data determines the time it takes. No two applications are the
same, in my view.





-- 
Harsh J


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran




This is too vague a question. What do you mean by "process"? Scan through 
some logs looking for values? You could do that on a single machine if 
you weren't in a rush and had enough disks; you'd just be very I/O 
bound, and to be honest HDFS needs a minimum number of machines to 
become fault tolerant. Do complex matrix operations that use lots of RAM 
and CPU? You'll need more machines.


If your cluster has a block size of 512 MB, then a 3 TB file fits into 
(3*1024*1024)/512 = 6144 blocks, so you can't usefully employ more than 
6144 machines on it anyway -- that's your theoretical maximum, even if 
your name is Facebook or Yahoo!
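
Spelled out (a sketch, assuming the usual one map task per block):

file_tb = 3
block_mb = 512

blocks = file_tb * 1024 * 1024 // block_mb   # 3 TB in MB / 512 MB blocks
print(blocks)                                # 6144 -- one map task per block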


What you are looking for is something in between 10 and 6144, the exact 
number driven by (a toy estimator follows the list):

 -how much compute you need to do, and how fast you want it done 
  (controls # of CPUs, RAM)
 -how much total HDD storage you anticipate needing
 -whether you want to do leading-edge GPU work (good performance on 
  some tasks, but limited work per machine)
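
As a toy illustration of how those factors combine (every input below is a
made-up placeholder; substitute your own numbers):

import math

hdfs_needed_tb     = 100     # usable HDFS space you anticipate needing
usable_tb_per_node = 8       # per-node space after replication and overhead
task_hours_per_day = 500     # total compute, in task-hours per day
slots_per_node     = 8
hours_per_day      = 24

by_storage = math.ceil(hdfs_needed_tb / usable_tb_per_node)      # 13 nodes
by_compute = math.ceil(task_hours_per_day /
                       (slots_per_node * hours_per_day))         # 3 nodes
print(max(by_storage, by_compute, 10))   # keep >= ~10 for fault tolerance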


You can use benchmarking tools like gridmix3 to get some more data on 
the characteristics of your workload, which you can then take to your 
server supplier to say "this is what we need, what can you offer?" 
Otherwise everyone is just guessing.


Remember also that you can add more racks later, but you will need to 
plan ahead on datacentre space, power and -- very importantly -- how you 
are going to expand the networking. Life is simplest if everything fits 
into one rack, but if you plan to expand you need a roadmap of how to 
connect that rack to some new ones, which means adding fast interconnect 
between the different top-of-rack switches. You also need to worry about 
how to get data in and out fast.



-Steve


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran




Actually, I've just thought of a simpler answer: 40. It's completely 
random, but if said with confidence it's as valid as any other answer to 
your current question.


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran

On 06/07/11 13:18, Michel Segel wrote:

Wasn't the answer 42?  ;-P



42 = 40 + NN + 2ary NN, assuming the JT runs on the 2ary or on one of the 
worker nodes



Looking at your calc...
You forgot to factor in the number of slots per node.
So the number is only a fraction. Assume 10 slots per node. (10 because it 
makes the math easier.)


I thought something was wrong. Then I thought of the server revenue and 
decided not to look that hard.


Re: Thoughts about Hadoop cluster hardware

2010-07-17 Thread U235Sentinel
Awesome!  I appreciate it.  I'm off on training right now so I'm just
starting to catch up.  I'll check out those servers and see how they
compare.

Thanks a bunch!



Thoughts about Hadoop cluster hardware

2010-07-13 Thread u235sentinel
So we're talking to Dell about their new PowerEdge C2100 servers for a 
Hadoop cluster, but I'm wondering: isn't this still a little overboard 
for nodes in a cluster? I'm wondering if we should buy, say, 100 PowerEdge 
2750's instead of just 50 C2100's. The price would be about the same 
for the configuration we're talking about, and we would get twice as many 
nodes.


I'm curious if any others are running Dell PowerEdge servers with Hadoop.

We've also been kicking the idea around of going with blade servers 
(Dell and/or HP).


Just curious

Thanks!!


Re: Thoughts about Hadoop cluster hardware

2010-07-13 Thread Allen Wittenauer

On Jul 13, 2010, at 5:00 PM, u235sentinel wrote:

 So we're talking to Dell about their new PowerEdge C2100 servers for a Hadoop 
 cluster, but I'm wondering: isn't this still a little overboard for nodes in 
 a cluster? I'm wondering if we should buy, say, 100 PowerEdge 2750's instead 
 of just 50 C2100's. The price would be about the same for the configuration 
 we're talking about, and we would get twice as many nodes.

Ultimately, it depends upon your job flow and how much data you have.  

FWIW we're currently using a Sun equivalent of the C2100s w/ 8 of the 12 drive 
slots filled.  You need a *LOT* of IOPS to make it worthwhile.  [From what 
I've seen, even people who think they have a lot of IOPS generally have other 
problems with their code/tuning that are causing the IOPS.   So even if you 
think you have a lot, you may not.]

 I'm curious if any others are running Dell PowerEdge servers with Hadoop.
 
 We've also been kicking the idea around of going with blade servers (Dell 
 and/or HP).

If you are thinking of a traditional blade setup where storage comes mainly 
from NAS or SAN, you are going to be very, very unhappy unless your data set 
is very, very tiny.

Check out the PoweredBy page on the wiki.  Quite a few folks list their gear. 
FWIW, we're currently evaluating HP SLs and should be getting some Dell C6100s 
in soon, assuming Dell can deliver the eval unit on time.