Running on EC2 has been discussed on the list quite a bit in the past, so you might want to do some searches on the archives. Here are a few threads I pulled up:
http://search-hadoop.com/m/paQmKTxSgj
http://search-hadoop.com/m/7E9PaA6U1V
http://search-hadoop.com/m/sGXTATdlIg2

For instance types, it appears that only c1.xlarge, m2.4xlarge and cc1.4xlarge instances will get you a physical server for each instance, so you will pay the least IO virtualization "tax" using these with instance storage. But even with that, expect reduced IO performance vs. physical hardware.

For the node layout, I'd suggest something like:

1 - NameNode, JobTracker, ZooKeeper, HMaster
1 - SecondaryNameNode, HMaster
3 - DataNode, TaskTracker, RegionServer

You could run more ZK instances on smaller instance types (m1.medium?), but beware that these could be more subject to erratic IO throughput due to other instances running on the same physical server, which could negatively impact ZooKeeper performance and overall cluster stability. So for a cluster this small, I don't think I would bother.

As for which of these instance types to pick, it'll depend on your workload and memory requirements. I usually use c1.xlarge for HBase testing, but those have somewhat limited memory, so you'll be constrained on the number of MR tasks you can run without overcommitting memory (you want to avoid swapping at all costs). I would say to do some testing with your workload and see what instance types give you the best performance at an acceptable price.

--gh

On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin <[email protected]> wrote:
> Hi,
>
> I am wondering if someone can recommend on the best practice with selecting
> the right AMAZON EC2 instances combination for the following
> implementation:
>
> Cloudera Hadoop HDFS and MapReduce:
>
> - 1 NameNode + JobTracker servers.
> - 1 SecondaryNameNode server.
> - 3 DataNodes + TaskTrackers.
>
>
> Cloudera HBase:
>
> - 2 HMaster servers
> - 3 ZooKeeper Servers
> - 2 Region Servers.
>
>
> From your own experience what AMAZON EC2 instances should I choose?
> How would you combine and place the above implementation across the
> instances?
> Should I place datanode & task tracker with HRegionServer on the same
> instance?
>
> Thanks !
>
> --
> *Ronen.*
>
> <http://www.taykey.com/>
>
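P.S. To put rough numbers on the memory constraint mentioned in the reply, here is a minimal sketch of the arithmetic for a combined DataNode/TaskTracker/RegionServer node. The helper name max_task_slots and every heap size in it are assumptions made up for illustration, not recommendations, and the 7 GB figure is what c1.xlarge is listed with as far as I know. Plug in your own values.

```python
# Rough memory budget for one worker node running DataNode, TaskTracker
# and RegionServer together. All sizes are illustrative assumptions.

def max_task_slots(total_mb, os_reserve_mb, daemon_heaps_mb, task_heap_mb):
    """Return how many MR task JVMs fit in RAM without overcommitting."""
    available = total_mb - os_reserve_mb - sum(daemon_heaps_mb.values())
    if available <= 0:
        return 0  # the daemons alone already use up (or exceed) the RAM
    return available // task_heap_mb


# Hypothetical heap sizes in MB; tune these for your own workload.
daemons = {"DataNode": 1024, "TaskTracker": 1024, "RegionServer": 3072}

slots = max_task_slots(
    total_mb=7 * 1024,   # assumed c1.xlarge memory (about 7 GB)
    os_reserve_mb=512,   # assumed headroom for the OS and page cache
    daemon_heaps_mb=daemons,
    task_heap_mb=512,    # assumed heap per map/reduce child JVM
)
print(slots)
```

Whatever number comes out is an upper bound on map plus reduce slots for that node. Going above it is what pushes the box into swap.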
