I recommend deploying the master daemons of HDFS, MapReduce, and HBase on different servers, which should give better performance. An example scenario:
1. Deploy ZooKeeper on every server, or only on server1, server2, and server3, so that the ensemble has an odd number of members.
2. Deploy the HDFS NameNode, the backup NameNode, the MR JobTracker, the HBase Master, and the backup HBase Master each on a different one of the five servers.
3. Run a DataNode and a TaskTracker on every server.

I based this on the deployment of the Facebook Messages system:
http://www.slideshare.net/parallellabs/sigmod-realtime-hadooppresentation

2012/11/13 Dalia Sobhy <[email protected]>

> I do advise you to use Cloudera Manager; it's a very simple and open-source
> cluster configuration software.
>
> A good design is to run ZooKeeper on node1, node2, and another node alone.
>
> Sent from my iPhone
>
> On 2012-11-13, at 2:04 AM, "Hakan Bogay" <[email protected]> wrote:
>
> > Hi,
> >
> > I am a newbie to Hadoop, HBase and Hive. I installed Hadoop, HBase and
> > Hive in pseudo-distributed mode and everything works fine. Now I am
> > planning to set up a simple Hadoop cluster (5 nodes) with Hive, HBase and
> > ZooKeeper. I've read several documents and instructions, but I could not
> > find a good explanation for my question. I'm not sure where to run all
> > the daemons. This is what I am considering:
> >
> > *Node_1* (Master)
> >
> > - NameNode
> > - JobTracker
> > - HBase Master
> > - ZooKeeper (standalone node; managed by HBase)
> >
> > *Node_2* (Backup_Master)
> >
> > - SecondaryNameNode
> >
> > *Node_3* (Slave1)
> >
> > - DataNode1
> > - TaskTracker1
> > - RegionServer1
> >
> > *Node_4* (Slave2)
> >
> > - DataNode2
> > - TaskTracker2
> > - RegionServer2
> >
> > *Node_5* (Slave3)
> >
> > - DataNode3
> > - TaskTracker3
> > - RegionServer3
> >
> > I know that in production it is recommended to run a ZooKeeper ensemble
> > on an odd number of nodes (as a separate cluster). But for a simple
> > cluster, is it OK to set up a standalone ZooKeeper node that runs on the
> > master node?
> >
> > Another question is regarding Hive: I know that Hive is a Hadoop client.
> > Should I also install Hive on the master node? Does it make sense?
> >
> > Thanks for all tips and comments!
> >
> > Hakan
> >
> > Note: I have just 5 machines to simulate a cluster.
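
P.S. To make the layout suggested at the top of this reply more concrete, here is a rough Python sketch, not a definitive plan. The host names server1 to server5, the placement of each master daemon on a particular host, and the helper names build_layout and tolerated_failures are all assumptions made for illustration. The second helper shows why an odd-sized ZooKeeper ensemble is usually preferred: ZooKeeper needs a strict majority of members to be up, so a fourth member tolerates no more failures than three do.

```python
# Hypothetical layout for five servers. Host names and the choice of which
# master daemon goes on which host are assumptions for illustration only.
MASTERS = {
    "server1": "NameNode",
    "server2": "backup NameNode",
    "server3": "JobTracker",
    "server4": "HBase Master",
    "server5": "backup HBase Master",
}
ZOOKEEPER_HOSTS = ["server1", "server2", "server3"]  # odd-sized ensemble
WORKER_DAEMONS = ["DataNode", "TaskTracker"]         # run on every server


def build_layout(masters, zookeeper_hosts, worker_daemons):
    """Return {host: [daemons]} with one master daemon per host."""
    layout = {}
    for host, master in masters.items():
        daemons = [master] + list(worker_daemons)
        if host in zookeeper_hosts:
            daemons.append("ZooKeeper")
        layout[host] = daemons
    return layout


def tolerated_failures(ensemble_size):
    """A strict majority must stay up, so n members survive (n - 1) // 2 failures."""
    return (ensemble_size - 1) // 2


layout = build_layout(MASTERS, ZOOKEEPER_HOSTS, WORKER_DAEMONS)
for host in sorted(layout):
    print(host, "->", ", ".join(layout[host]))

for n in (1, 3, 4, 5):
    print(n, "ZooKeeper member(s) tolerate", tolerated_failures(n), "failure(s)")
```

A standalone ZooKeeper node (ensemble size 1) tolerates no failures at all, which is acceptable for a test cluster but is the reason three or five members are suggested for anything more serious.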
