Hi!

We are in the process of planning a new Hadoop 2.0 cluster. Ambari looks
really great for this job, but while looking through the documentation
we stumbled upon a few questions:

1. Stateless images and Ambari

We are thinking about booting all machines in the cluster using PXE +
stateless images. This means the OS image will live only in memory, and
changes to /etc/ or other files will vanish after a reboot. Is it
possible to use Ambari in such a setup? In theory it should be enough to
start the ambari-agent after booting the image, and the agent will
ensure that the configuration is correct.
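To make that concrete, here is a rough sketch of the boot-time hook we have in mind. The server name is a placeholder, and the ini keys and ports are what we believe a stock ambari-agent install uses; the sketch writes to a local file so it can be tried without root (the real target would be /etc/ambari-agent/conf/ambari-agent.ini):

```shell
#!/bin/sh
# Rough sketch: point a freshly booted, stateless node at the Ambari
# server, then start the agent. AMBARI_SERVER is a placeholder.
AMBARI_SERVER="${AMBARI_SERVER:-ambari-master.example.com}"
# Real path would be /etc/ambari-agent/conf/ambari-agent.ini.
CONF="${AMBARI_AGENT_CONF:-./ambari-agent.ini}"

mkdir -p "$(dirname "$CONF")"
cat > "$CONF" <<EOF
[server]
hostname=$AMBARI_SERVER
url_port=8440
secured_url_port=8441
EOF

# Then start the agent so it re-registers after every reboot:
# ambari-agent start
echo "wrote $CONF pointing at $AMBARI_SERVER"
```

Everything else (Hadoop packages and configs) would then be pushed by the Ambari server after the agent registers - at least that is our understanding.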

The idea is to use all the HDDs in the machines for HDFS storage and
to avoid the maintenance burden of separate OS installs. Provisioning
the OS via automated install on the HDDs is another option if stateless
imaging is not compatible with Ambari.

Can anyone here tell us what they are using? What are the best
practices? We will have around 140 machines.


2. Existing Icinga/Nagios and Ganglia

Is it possible to use an existing install of Ganglia and Nagios with
Ambari? We already run a smaller Hadoop cluster and have Ganglia and
Icinga checks in place. We would like to avoid duplicate infrastructure
and, if possible, run only one Icinga/Nagios server and one Ganglia
instance for everything.

3. Existing Hadoop

Is it possible to migrate an existing HDFS to Ambari? We have 150TB of
data in one HDFS and would like to migrate it to Ambari, but given the
automated nature of the installation I'd like to ask whether it is safe
to do so. Does Ambari format the disks on the nodes during
installation? Will the NameNode be formatted?

4. Ubuntu 14.04 support

We plan on using Ubuntu 14.04 LTS for the new cluster, as we only use
Ubuntu in the department here. Is this a bad idea? Will it be supported
in the future? Judging from the requirements it shouldn't be a major
problem, as Ambari is mostly Python and Java - but if it is not and
will not be supported, we will probably have to change the OS.


Thanks for any help!

If you are already running a bigger Hadoop cluster, I'd love to hear
some advice and best practices for managing the system. At the moment
we plan on using xCAT for provisioning the machines, Saltstack for
configuration management, and Ambari for managing the Hadoop
configuration.

regards
Martin Tippmann
