One key point is that the components running out of the box are mostly in a single-node configuration, or use an embedded database as a backend. Practically all of these systems require some manual configuration before they are production-ready. Neither packages nor Puppet can solve that entirely; we would really need something that can orchestrate the different roles in the cluster when bringing up the services. Even then, I suspect such a system would still require some manual input describing what you want, because there are so many different ways you might want to deploy all this.
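This is roughly the kind of manual input I have in mind. Even the puppet recipes we ship in bigtop-deploy want the cluster described up front in a small site file before they can wire the roles together. From memory (so the exact keys may be off), it looks something like this, with purely illustrative values:

    # bigtop-deploy/puppet/config/site.csv (illustrative, from memory)
    hadoop_head_node,head.example.com
    hadoop_storage_dirs,/data/1,/data/2
    bigtop_yumrepo_uri,http://mirror.example.com/bigtop/0.7.0/repos/centos6/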
- Hadoop zkfc: This is for high availability in HDFS. I don't know the specifics, but I would not expect this to be running out of the box; it only makes sense once NameNode HA has been configured (there is a rough sketch of that at the end of this message).
- I don't have a ton of experience with the other Hadoop daemons, but I know the NodeManager usually works for me. I'd be curious to know what problem you ran into here.
- We could probably make an "hbase-conf-pseudo" package that installs a working single-node configuration, but again, it would rarely be used that way in practice. I thought the master operated in "stand-alone" mode by default, and that by enabling "distributed mode" in the configuration you could then run a region server on the same node (also sketched at the end of this message). See http://hbase.apache.org/book/standalone_dist.html.
- The Hive Metastore needs an external RDBMS to be configured (sketch at the end of this message). Some services come with a default "embedded" database, but these are never suitable for production and usually cause more trouble than they are worth, IMHO. I love the sound of "everything working out of the box", but I think this is one case where we need to help the user understand what external infrastructure is required to make the system work properly.
- I'm not familiar with Spark, but I believe we stopped shipping Scala embedded in Spark, so a user would need to have it installed beforehand, just as with Java? I may well be wrong here - just a hint.

Thanks for sharing your emails with the list. As Jay Vyas mentioned, a lot of the contributors can get busy at times, but it would be great to start collecting this information into a better "User Manual".

For anyone who wants to dig in, I've appended a few rough configuration sketches for the points above, below Steven's quoted message.

On Wed, Nov 20, 2013 at 6:32 PM, Steven Núñez <[email protected]> wrote:

> Gents,
>
> Below is a summary of the results of an out-of-the-box CentOS/EC2 Bigtop
> 0.7.0 install. It lists all the components I need for the project I'm
> writing about. What would be useful somewhere on the wiki is a list of
> known issues and pointers to some possible resolutions. This could be as
> easy as taking this list and adding a third column, "workaround", linking
> to a page on how to fix each item. It could also be used as a QA page of
> sorts, on the assumption that all of the components are supposed to work
> out of the box (it looks like some of the init.d scripts aren't quite
> right either, judging by the errors below).
>
> Cheers,
> - SteveN
>
> Hadoop datanode is running                        [ OK ]
> Hadoop journalnode is running                     [ OK ]
> Hadoop namenode is running                        [ OK ]
> Hadoop secondarynamenode is running               [ OK ]
> Hadoop zkfc is dead and pid file exists           [FAILED]
> Hadoop httpfs is running                          [ OK ]
> Hadoop historyserver is dead and pid file exists  [FAILED]
> Hadoop nodemanager is dead and pid file exists    [FAILED]
> Hadoop proxyserver is dead and pid file exists    [FAILED]
> Hadoop resourcemanager is running                 [ OK ]
> hald (pid 1041) is running...
> HBase master daemon is dead and pid file exists   [FAILED]
> hbase-regionserver is not running.
> HBase rest daemon is running                      [ OK ]
> HBase thrift daemon is running                    [ OK ]
> HCatalog server is running                        [ OK ]
> Hive Metastore is dead and pid file exists        [FAILED]
> Hive Server is running                            [ OK ]
> Hive Server2 is dead and pid file exists          [FAILED]
> not running but /var/run/oozie/oozie.pid exists.
> Spark master is not running                       [FAILED]
> Spark worker is not running                       [FAILED]
> spice-vdagentd is stopped
> Sqoop Server is running                           [ OK ]
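Appendix, as promised: a few rough configuration sketches. These are from memory and untested, so please treat them as hints rather than recipes.

For zkfc: automatic failover only works once NameNode HA itself is in place (two NameNodes, shared edits, and a ZooKeeper quorum). The nameservice and host names below are made up, and several required properties (NameNode RPC/HTTP addresses, the shared edits dir, fencing) are omitted for brevity:

    <!-- hdfs-site.xml: automatic failover for a hypothetical
         nameservice "mycluster" -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>

    <!-- core-site.xml: the ZooKeeper ensemble zkfc coordinates through -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
    </property>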
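For HBase: if I read the standalone/distributed page correctly, leaving stand-alone mode is mostly a matter of flipping hbase.cluster.distributed and pointing hbase.rootdir at HDFS. The values below are illustrative for a single node:

    <!-- hbase-site.xml: pseudo-distributed mode on a single node -->
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:8020/hbase</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>localhost</value>
    </property>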
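For the Hive Metastore: pointing it at an external RDBMS comes down to the JDO connection properties in hive-site.xml. The MySQL host, database, and credentials below are placeholders, and the JDBC driver jar still needs to be installed where Hive can find it:

    <!-- hive-site.xml: metastore backed by an external MySQL
         (placeholder host/database/credentials) -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://db.example.com:3306/metastore</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>change_me</value>
    </property>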
