You could see it the other way around: it enables everyone to solve 
problems that are too complex for one server.

Another way to look at it is that it reduces costs because scaling out is much 
cheaper than scaling up.
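As a back-of-the-envelope sketch of that cost argument (the prices and the cost exponent below are made-up assumptions for illustration, not real hardware quotes):

```python
# Illustrative only: hypothetical prices to show why scale-out tends to win.
# Commodity hardware cost grows roughly linearly with capacity, while
# high-end single-server capacity gets disproportionately expensive.

def scale_out_cost(units, price_per_node=2_000):
    """Cost of N commodity nodes, one capacity unit each."""
    return units * price_per_node

def scale_up_cost(units, base_price=2_000, premium=1.5):
    """Cost of one big server whose price grows super-linearly with capacity."""
    return base_price * units ** premium

for units in (1, 4, 16, 64):
    print(units, scale_out_cost(units), round(scale_up_cost(units)))
# At 64 capacity units the single big server costs 1,024,000 vs 128,000
# for commodity nodes (under these assumed numbers).
```

The exact figures do not matter; the point is that a super-linear price curve for one big box loses to a linear price curve for many small ones.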

You can (and usually have to) be pretty ingenious, and you have to be a good 
software developer, to get something done in Hadoop. If you do a poor 
programming job you will not get the expected benefits.

My 2c.

Matieu

From: Deepak Goel [mailto:[email protected]]
Sent: May 27, 2016 14:45
To: Arun Natva <[email protected]>
Cc: user <[email protected]>
Subject: Re: Performance Benchmarks on "Number of Machines"

Here is what I think, and I am sorry if I am wrong :-(

In the cluster you are not only adding hardware (CPU, memory, disk); you are 
also adding separate software (OS, JVM, application) on each machine. So the 
reason the cluster scales linearly is not the hardware alone, but the separate 
software stack on each machine. Compare this with a single machine, where you 
scale up (keep adding CPU, memory, and disk to the same machine) but the 
software remains the same (one OS, one JVM, one application). So scale-out (a 
cluster) has an advantage over scale-up (a single machine with more hardware): 
the software stack is separate for each machine.

So Hadoop is making us bad programmers overall, by giving us the facility to 
replicate the software across multiple machines, and of course by providing 
reliability :)

Hey

Namaskara~Nalama~Guten Tag~Bonjour


   --
Keigu

Deepak
73500 12833
www.simtree.net, [email protected]
[email protected]

LinkedIn: www.linkedin.com/in/deicool
Skype: thumsupdeicool
Google talk: deicool
Blog: http://loveandfearless.wordpress.com
Facebook: http://www.facebook.com/deicool

"Contribute to the world, environment and more : http://www.gridrepublic.org
"

On Fri, May 27, 2016 at 11:40 PM, Arun Natva <[email protected]> wrote:
Deepak,
I believe Yahoo and Facebook have the largest clusters, at over 4-5 thousand 
nodes in size.
If you add a new server to the cluster, you are simply adding to the CPU, 
memory, and disk space of the cluster. So the capacity grows linearly as you 
add nodes, except that network bandwidth is shared.
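That caveat can be sketched with a toy model (my own illustration with made-up units, not an actual Hadoop measurement): per-node disk and CPU capacity adds up linearly, while shuffle-heavy work is capped by the shared network:

```python
# Toy model (illustrative units): aggregate throughput grows linearly with
# node count until jobs that move data between nodes saturate the shared
# network, after which adding nodes no longer helps that kind of job.

def effective_throughput(nodes, per_node=100, shared_network=2_000):
    """Sustainable MB/s for a shuffle-heavy job; numbers are made up."""
    linear = nodes * per_node           # disk/CPU capacity scales with nodes
    return min(linear, shared_network)  # the shared network caps it

for n in (1, 10, 20, 50):
    print(n, effective_throughput(n))
# Growth is linear up to 20 nodes, then flat at 2000 in this toy model.
```

Embarrassingly parallel jobs that read and write locally would not hit this cap, which is one reason scaling behavior depends on the workload.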

I didn't understand your last question on scaling...


Sent from my iPhone

On May 27, 2016, at 11:51 AM, Deepak Goel <[email protected]> wrote:

Hey

Namaskara~Nalama~Guten Tag~Bonjour

Are there any performance benchmarks for how many machines Hadoop can scale 
to? Is the growth linear (for 1 machine, growth x; for 2 machines, 2x growth; 
for 10000 machines, 10000x growth)?

Also, does the scaling depend on the type of jobs and the amount of data, or 
is it independent of them?
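One classical way to reason about this question is Amdahl's law: speedup on n machines is bounded by the serial fraction of the job, so growth is linear only for jobs that are almost perfectly parallel. A minimal sketch:

```python
# Amdahl's law: with serial fraction s, speedup on n machines is
#   1 / (s + (1 - s) / n)
# so whether growth is linear depends on the job, not just the cluster.

def amdahl_speedup(n, serial_fraction):
    """Ideal speedup on n machines for a job with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

print(round(amdahl_speedup(10_000, 0.0)))   # fully parallel: 10000x
print(round(amdahl_speedup(10_000, 0.05)))  # 5% serial work caps it near 20x
```

This ignores real-world overheads like shuffle traffic and coordination, so it is an upper bound, but it shows why the answer depends on the type of job.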

Thank You
Deepak
