Good afternoon,

I implemented k-means with Hadoop in R using RHadoop on a cluster of 3 machines 
(3 virtual machines hosted on a single physical machine; each VM has one core 
and 1.5 GB of RAM).
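
For context, a k-means iteration in RHadoop can look roughly like the sketch 
below. This is a simplified, illustrative version assuming the rmr2 package; 
the variable names, the 2-D data, and the number of clusters are placeholders, 
not my actual settings.

library(rmr2)

# One simplified k-means iteration: each map task assigns its block of points
# to the nearest centroid, and the reduce task averages the points of each
# cluster to produce the new centroids.
kmeans.iter <- function(points.dfs, centroids) {
  assign.map <- function(k, v) {
    # v is a matrix of points; squared distance from every point to every centroid
    d <- apply(centroids, 1, function(ctr)
      rowSums((v - matrix(ctr, nrow(v), ncol(v), byrow = TRUE))^2))
    keyval(max.col(-d), v)              # key = index of the nearest centroid
  }
  update.reduce <- function(k, v) {
    keyval(k, t(colMeans(v)))           # new centroid = mean of assigned points
  }
  from.dfs(mapreduce(input = points.dfs,
                     map = assign.map,
                     reduce = update.reduce))
}

The driver writes the points to HDFS once with to.dfs() and then calls this in 
a loop, feeding the centroids returned by each iteration into the next one.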

The goal is to compare the k-means computation time on the Hadoop cluster 
against a single local machine (without Hadoop).


I simulated data from a Gaussian distribution. With 2 million data points, the 
computation time with Hadoop is still much higher than the time taken without 
Hadoop. Can the computation time with Hadoop ever become lower than the time 
without Hadoop?

If so, how can I achieve that? Since I am working on a single physical machine 
with 3 VMs, I am wondering whether it is even possible to see the advantages of 
running the computation with Hadoop in this setup.
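
For reference, the simulated data and the local (non-Hadoop) baseline are along 
the lines of the sketch below; the dimensionality and the number of clusters 
here are only illustrative.

set.seed(1)
n <- 2e6                                    # 2 million simulated points
x <- matrix(rnorm(2 * n), ncol = 2)         # 2-D Gaussian data (illustrative)
system.time(fit <- kmeans(x, centers = 3))  # local run, no Hadoop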


Thank you.

Jeremy
