Greetings. I have an announcement that should be of great interest to the HBase/Hive communities.

tl;dr :: You can build a large, realistically configured HBase cluster from nothing in a few hours using Chef or Ansible (Puppet support is in the works). The tool also builds large high-availability MySQL clusters, and support for other DBMSs is in the works. The project is open source.

Project :: The Palomino Cluster Tool ( https://github.com/time-palominodb/PalominoClusterTool )

License :: Apache.

Why this is interesting :: The Chef Cookbook and Ansible Playbooks are known to generate a fully distributed HBase cluster (with Zookeeper, separate NameNode, JobTracker, HMaster, RegionServer, etc.) with far fewer bugs and limitations than any previously known Cookbook or Playbook. They are also not specific to one company's environment, and should be suitable for YOUR environment.

I searched extensively for configuration management scripts that would do this hard work for me and came up empty-handed, so I rolled my own and am open-sourcing it so that no one else has to feel my pain. This code represents hundreds of hours of research, iteration, web searches, experience, and tuning.

Many of the common gotchas that... got me... are covered. Xcievers? Check. Data files in /tmp? Nope. Ulimits? Check. Init scripts that work with Chef? Check. The documentation covers other typical gotchas: NameNode formatted? hdfs://users/mapred exists and is owned properly? hdfs://hbase exists and is owned properly? Too many to list. Look at the code yourself, and feel free to write to the project mailing list with any gotchas or tunings you're aware of that aren't covered in the code.

Interesting Entrance Points ::
HDFS+Hive+HBase on CentOS via Chef ( https://github.com/time-palominodb/PalominoClusterTool/tree/master/ChefCookbooks/CentOS/cloudera ).
Multiple distributed DBMSs, including HDFS+HBase on Ubuntu via Ansible ( https://github.com/time-palominodb/PalominoClusterTool/tree/master/AnsiblePlaybooks/Ubuntu-12.04 ).

Feedback welcome. Pull requests more than welcome.

--
Tim Ellis | Fifth Sigma, Inc.
Excellence in Multimedia and Technology
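
P.S. To give a feel for the kind of gotchas mentioned above (Xcievers, data files in /tmp, ulimits), here is a minimal Python sketch of a pre-flight check. It is not taken from the Palomino Cluster Tool. The function name `find_gotchas`, the thresholds, and the assumed default of 256 are illustrative assumptions. The property names are the Hadoop 1.x-era ones (`dfs.datanode.max.xcievers`, `hadoop.tmp.dir`, `dfs.name.dir`, `dfs.data.dir`), so adjust them for your distribution.

```python
# Hypothetical thresholds for illustration only; tune them for your cluster.
MIN_XCIEVERS = 4096
MIN_NOFILE = 32768

# Properties whose values are (comma-separated) local directory paths.
DIR_PROPERTIES = ("hadoop.tmp.dir", "dfs.name.dir", "dfs.data.dir")


def find_gotchas(hdfs_conf, nofile_limit):
    """Return a list of warnings for a few common HDFS/HBase misconfigurations.

    hdfs_conf    -- dict of property name -> string value (already parsed from XML)
    nofile_limit -- open-file ulimit of the user running the daemons
    """
    problems = []

    # Xcievers: assume a low default (256) when the property is not set.
    xcievers = int(hdfs_conf.get("dfs.datanode.max.xcievers", 256))
    if xcievers < MIN_XCIEVERS:
        problems.append("dfs.datanode.max.xcievers is %d, want >= %d"
                        % (xcievers, MIN_XCIEVERS))

    # Data files in /tmp: only explicitly configured paths are checked here.
    for key in DIR_PROPERTIES:
        for path in hdfs_conf.get(key, "").split(","):
            path = path.strip()
            if path == "/tmp" or path.startswith("/tmp/"):
                problems.append("%s points into /tmp: %s" % (key, path))

    # Ulimits: too few file descriptors for the daemon user.
    if nofile_limit < MIN_NOFILE:
        problems.append("open-file ulimit is %d, want >= %d"
                        % (nofile_limit, MIN_NOFILE))

    return problems


if __name__ == "__main__":
    example = {
        "dfs.datanode.max.xcievers": "256",
        "dfs.data.dir": "/tmp/dfs/data,/data/1/dfs",
    }
    for warning in find_gotchas(example, 1024):
        print("WARNING:", warning)
```

On a live node, the open-file limit could be read with `resource.getrlimit(resource.RLIMIT_NOFILE)` from the Python standard library. The sketch takes it as a parameter so the check stays easy to test.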
