Hi! We are in the process of planning a new Hadoop 2.0 cluster. Ambari looks really great for this job, but while looking through the documentation we came across a few questions:
1. Stateless images and Ambari

We are thinking about booting all machines in the cluster using PXE + stateless images. This means the OS image will only be in memory, and changes to /etc/ or other files will vanish after a reboot. Is it possible to use Ambari in such a setup? In theory it should be enough to start the ambari-agent after booting the image, and the agent will ensure that the configuration is correct. The idea is to use all the HDDs in the machines for HDFS storage and to avoid the maintenance burden of separate OS installs. Provisioning the OS via an automated install on the HDD is another option if stateless imaging is not compatible with Ambari. Can anyone here tell us what they are using? What are the best practices? We will have around 140 machines.

2. Existing Icinga/Nagios and Ganglia

Is it possible to use an existing install of Ganglia and Nagios for Ambari? We already have a smaller Hadoop cluster and have Ganglia and Icinga checks in place. We would like to avoid duplicate infrastructure and, if possible, run only one Icinga/Nagios server and only one Ganglia instance for everything.

3. Existing Hadoop

Is it possible to migrate an existing HDFS to Ambari? We have 150 TB of data in one HDFS and would like to migrate that to Ambari, but due to the automated nature of the installation I'd like to ask if it is safe to do so. Does Ambari format the disks on the nodes while installing? Or will the NameNode be formatted during installation?

4. Ubuntu 14.04 support

We plan on using Ubuntu 14.04 LTS for the new cluster, as we are only using Ubuntu in the department here. Is this a bad idea? Will there be support in the future? From looking through the requirements it shouldn't be a major problem, as Ambari is mostly Python and Java - but if it is not and will not be supported, we will probably have to change the OS.

Thanks for any help! If you are already running a bigger Hadoop cluster, I'd love to hear some advice and best practices for managing the system.
At the moment we plan on using xCAT for provisioning the machines, SaltStack for configuration management, and Ambari for managing the Hadoop configuration.

Regards,
Martin Tippmann
