Hi, I am new to Hadoop, HBase and Hive. I installed all three in pseudo-distributed mode and everything works fine. Now I am planning to set up a simple Hadoop cluster (5 nodes) with Hive, HBase and ZooKeeper. I have read several guides and sets of instructions, but I could not find a good answer to my question: I'm not sure where to run each of the daemons. This is my plan:

*Node_1* (Master)
- NameNode
- JobTracker
- HBase Master
- ZooKeeper (standalone node; managed by HBase)

*Node_2* (Backup_Master)
- SecondaryNameNode

*Node_3* (Slave1)
- DataNode1
- TaskTracker1
- RegionServer1

*Node_4* (Slave2)
- DataNode2
- TaskTracker2
- RegionServer2

*Node_5* (Slave3)
- DataNode3
- TaskTracker3
- RegionServer3

I know that in production it is recommended to run a ZooKeeper ensemble on an odd number of nodes (as a separate cluster). But for a simple cluster, is it OK to set up a standalone ZooKeeper node that runs on the master node?

Another question concerns Hive: I know that Hive is a Hadoop client. Should I also install Hive on the master node? Does that make sense?

Thanks for all tips and comments!

Hakan

Note: I have just 5 machines to simulate a cluster.
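
To make the planned layout easier to discuss, here is a rough Python sketch that records which daemon runs where and derives the host lists that the start scripts would read. It assumes Hadoop 1.x-style scripts (where `conf/masters` lists the SecondaryNameNode hosts and `conf/slaves` lists the DataNode/TaskTracker hosts) and HBase's `conf/regionservers` file. The hostnames `node1` to `node5` and the helper names are made up for illustration; they are not from any real setup.

```python
# Planned daemon placement. Hostnames node1..node5 are hypothetical;
# the plan above only names roles, not actual machines.
LAYOUT = {
    "node1": ["NameNode", "JobTracker", "HBase Master", "ZooKeeper"],
    "node2": ["SecondaryNameNode"],
    "node3": ["DataNode", "TaskTracker", "RegionServer"],
    "node4": ["DataNode", "TaskTracker", "RegionServer"],
    "node5": ["DataNode", "TaskTracker", "RegionServer"],
}


def hosts_running(daemon, layout=LAYOUT):
    """Return the sorted list of hosts that run the given daemon."""
    return sorted(host for host, daemons in layout.items() if daemon in daemons)


def host_files(layout=LAYOUT):
    """Map each host-list file to the hosts it would contain.

    Assumption: Hadoop 1.x start scripts, where conf/masters holds the
    SecondaryNameNode hosts and conf/slaves holds the worker hosts.
    The directory prefixes are illustrative only.
    """
    return {
        "hadoop/conf/masters": hosts_running("SecondaryNameNode", layout),
        "hadoop/conf/slaves": hosts_running("DataNode", layout),
        "hbase/conf/regionservers": hosts_running("RegionServer", layout),
    }


if __name__ == "__main__":
    for path, hosts in host_files().items():
        print(path)
        for host in hosts:
            print("  " + host)
```

In this sketch, `hosts_running("ZooKeeper")` returns a single host, which corresponds to the standalone ZooKeeper node asked about above.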
