Season's Greetings All,

I’m doing a bit of a write-up on the various Hadoop distributions and would like to understand exactly what packages are installed by the Apache version of Ambari. It’s an exciting place to be working (big data & Hadoop), but the lines are blurred in many ways. The way I see the open source landscape now is something like this (from a management/installation/configuration perspective):

  BigTop -> RPM-like packaging for Hadoop
  Ambari -> GUI management/monitoring/provisioning

Looking at it from a vendor perspective, we’ve got (I know there are others, this is just for discussion):

  BigTop (packaging), CDH, HDP
  Apache Bigtop
  Cloudera, Cloudera Manager (closed source, commercial)
  Hortonworks / Apache Ambari (open source)

The CDH, BigTop and HDP (I assume) base distributions require a lot of manual configuration, so the best way to spin up a cluster with a reasonable set of applications (say HDFS, YARN, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, Sqoop) is to use CDH + CM or Ambari + HDP. Is there an equivalent for Apache? If I use the kit found at ambari.apache.org to spin up a cluster, do I get Apache components, or the HDP distribution?

I’m trying to define the ‘Apache distribution’ in my mind, if there is one, and understand exactly what its capabilities are. Cluster management is rather fundamental, since not many folks have the luxury of spending time climbing the long, steep learning curve of Hadoop ecosystem configuration.

Cheers,
- SteveN
