Thanks Gary!!
On Thu, Sep 15, 2011 at 10:34 PM, Gary Helmling <[email protected]> wrote:

> Running on EC2 has been discussed on the list quite a bit in the past, so
> you might want to do some searches on the archives. Here are a few threads
> I pulled up:
>
> http://search-hadoop.com/m/paQmKTxSgj
>
> http://search-hadoop.com/m/7E9PaA6U1V
>
> http://search-hadoop.com/m/sGXTATdlIg2
>
> For instance types, it appears that only c1.xlarge, m2.4xlarge and
> cc1.4xlarge instances will get you a physical server for each instance, so
> you will pay the least IO virtualization "tax" using these with instance
> storage. But even with that, expect reduced IO performance vs physical
> hardware.
>
> For the node layout, I'd suggest something like:
>
> 1 - NameNode, JobTracker, ZooKeeper, HMaster
> 1 - SecondaryNameNode, HMaster
> 3 - DataNode, TaskTracker, RegionServer
>
> You could run more ZK instances on smaller instance types (m1.medium?), but
> beware that these could be more subject to erratic IO throughput due to
> other instances running on the same physical server, which could negatively
> impact ZooKeeper performance and overall cluster stability. So for a
> cluster this small, I don't think I would bother.
>
> For instance types, it'll depend on your workload and memory requirements.
> I usually use c1.xlarge for HBase testing, but those have somewhat limited
> memory, so you'll be constrained on the number of MR tasks you can run
> without overcommitting memory (you want to avoid swapping at all costs).
>
> I would say to do some testing with your workload and see what instance
> types give you the best performance at an acceptable price.
>
> --gh
>
>
> On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin <[email protected]> wrote:
>
> > Hi,
> >
> > I am wondering if someone can recommend best practices for selecting
> > the right combination of Amazon EC2 instances for the following
> > implementation:
> >
> > Cloudera Hadoop HDFS and MapReduce:
> >
> > - 1 NameNode + JobTracker server.
> > - 1 SecondaryNameNode server.
> > - 3 DataNodes + TaskTrackers.
> >
> >
> > Cloudera HBase:
> >
> > - 2 HMaster servers
> > - 3 ZooKeeper servers
> > - 2 Region Servers.
> >
> >
> > From your own experience, what Amazon EC2 instances should I choose?
> > How would you combine and place the above implementation across the
> > instances?
> > Should I place the DataNode and TaskTracker with the HRegionServer on
> > the same instance?
> >
> > Thanks!
> >
> > --
> > Ronen.
> >
> > <http://www.taykey.com/>
> >
>

--
Ronen Itkin
Taykey | www.taykey.com
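
A rough sketch of the memory constraint mentioned above for a combined
DataNode / TaskTracker / RegionServer node. The 7 GB figure is the RAM of a
c1.xlarge; every heap size below is an assumed value for illustration, not a
recommendation from this thread, and max_task_slots is a hypothetical helper
name. Substitute the heaps you actually configure.

```python
def max_task_slots(total_ram_mb, reserved_mb, task_heap_mb):
    """Return how many MR task JVMs fit in RAM after fixed reservations.

    total_ram_mb : physical memory of the instance, in MB
    reserved_mb  : dict of name -> MB set aside for the OS and daemons
    task_heap_mb : heap given to each map/reduce child JVM, in MB
    """
    if task_heap_mb <= 0:
        raise ValueError("task_heap_mb must be positive")
    free_mb = total_ram_mb - sum(reserved_mb.values())
    # Never report a negative slot count when daemons alone overcommit RAM.
    return max(0, free_mb // task_heap_mb)


# c1.xlarge has 7 GB of RAM; the reservations are assumptions.
C1_XLARGE_RAM_MB = 7 * 1024
reserved = {
    "os": 512,             # assumed headroom for the OS and page cache
    "datanode": 1024,      # assumed DataNode heap
    "tasktracker": 1024,   # assumed TaskTracker heap
    "regionserver": 3072,  # assumed HBase RegionServer heap
}

print(max_task_slots(C1_XLARGE_RAM_MB, reserved, task_heap_mb=512))
```

With these assumed numbers only a handful of task slots fit before the node
would start swapping, which is the trade-off to check against each candidate
instance type.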
