The 'hadoop' package delivers only the Hadoop common bits: no init scripts to start services, and no convenience artifacts that deploy configuration for, say, starting a pseudo-distributed Hadoop cluster. For all practical purposes, you are going to need the hadoop-hdfs and hadoop-mapreduce packages, which deliver the bits for HDFS and MR. Even that may not be enough, though: you will likely also need the init scripts for starting and stopping the HDFS and MR services. So, depending on whether you are installing Hadoop on a fully distributed or a pseudo-distributed cluster, you may need to install one or more services (and hence packages), such as the resource manager, node manager, namenode and datanode, on the node(s).

Then you will have to deploy the configuration yourself. The packages install a default configuration, but you definitely need to add some entries to make it work for a fully distributed cluster, e.g. adding the name of the namenode host to the configuration of the datanodes. If you only want a pseudo-distributed cluster, you can install the pseudo-distributed configuration package (it pulls in all the necessary dependencies, so installing it alone should be enough) and you will get an out-of-the-box experience.
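As a rough sketch of the role-to-package mapping described above (not an official recipe): the role names "pseudo", "master" and "worker" and the helper packages_for_role are made up for illustration, and the package names are my assumption of what the Bigtop repo ships, so please confirm them with yum list 'hadoop*' before relying on them. The script only prints the install command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: map a node role to the packages that role likely needs.
# Package names are assumptions; verify them with: yum list 'hadoop*'
packages_for_role() {
  case "$1" in
    # Pseudo-distributed: the conf package is expected to pull in the rest.
    pseudo) echo "hadoop-conf-pseudo" ;;
    # Fully distributed head node: namenode plus resource manager.
    master) echo "hadoop-hdfs-namenode hadoop-yarn-resourcemanager" ;;
    # Fully distributed worker node: datanode plus node manager.
    worker) echo "hadoop-hdfs-datanode hadoop-yarn-nodemanager" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the command for the chosen role (defaults to "pseudo").
role="${1:-pseudo}"
if pkgs="$(packages_for_role "$role")"; then
  echo "sudo yum install -y $pkgs"
fi
```

After installing for real, the init scripts should show up under /etc/init.d. On a fully distributed cluster you would still have to edit the configuration by hand, for example to point the datanodes at the namenode host.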
FYI, if you run yum list 'hadoop*' you will see a list of all the Hadoop-related packages that are available to be installed.

On Wed, Jul 16, 2014 at 9:39 AM, David Fryer <[email protected]> wrote:

> Is it necessary to install the whole hadoop stack?
>
>
> On Wed, Jul 16, 2014 at 12:37 PM, David Fryer <[email protected]> wrote:
>
>> The only output from that is:
>> hadoop-2.0.5.1-1.el6.x86_64
>>
>> -David
>>
>>
>> On Wed, Jul 16, 2014 at 12:34 PM, Mark Grover <[email protected]> wrote:
>>
>>> Possibly. Can you check what packages you have installed related to
>>> hadoop?
>>>
>>> rpm -qa | grep hadoop
>>>
>>>
>>> On Wed, Jul 16, 2014 at 9:28 AM, David Fryer <[email protected]> wrote:
>>>
>>>> Hi Mark,
>>>> I'm trying to follow those instructions on a CentOS 6 machine, and
>>>> after running "yum install hadoop\*", I can't find anything related to
>>>> hadoop in /etc/init.d. Is there something I'm missing?
>>>>
>>>> -David
>>>>
>>>>
>>>> On Wed, Jul 16, 2014 at 11:34 AM, Mark Grover <[email protected]> wrote:
>>>>
>>>>> Welcome, David.
>>>>>
>>>>> For physical machines, I personally always use instructions like these:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0
>>>>>
>>>>> These are for Bigtop 0.6.0. The latest Bigtop release is 0.7.0, but we
>>>>> don't have a page for that unfortunately (we should, and if you could help
>>>>> with that, that'd be much appreciated!). We are tying up loose ends for
>>>>> Bigtop 0.8, so we hope to release it soon.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On Wed, Jul 16, 2014 at 8:20 AM, jay vyas <[email protected]> wrote:
>>>>>
>>>>>> One more note: by "look at the csv file" above I meant "edit it so
>>>>>> that it reflects your environment".
>>>>>>
>>>>>> Make sure to read the puppet README file as well, under
>>>>>> bigtop-deploy/puppet.
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 16, 2014 at 11:15 AM, jay vyas <[email protected]> wrote:
>>>>>>
>>>>>>> Hi David.
>>>>>>>
>>>>>>> Glad to hear the vagrant stuff worked for you. Now, the next step
>>>>>>> will be to port it to bare metal, like you say.
>>>>>>>
>>>>>>> The Vagrantfile does two things:
>>>>>>>
>>>>>>> 1) It creates a shared folder for all machines.
>>>>>>> 2) It spins up centos boxes.
>>>>>>>
>>>>>>> So in the "real world" you will obviously need to set up ssh between
>>>>>>> machines to start.
>>>>>>> After that, roughly, you will need to do the following:
>>>>>>>
>>>>>>> - clone bigtop onto each of your machines
>>>>>>> - install puppet 2.x on each of the machines
>>>>>>> - look at the csv file created in the vagrant provisioner, and read
>>>>>>>   the puppet README file (in bigtop-deploy)
>>>>>>> - run puppet apply on the head node
>>>>>>> Once that works:
>>>>>>> - run puppet apply on each slave.
>>>>>>> Now, on any node that you use as a client (I just use the master
>>>>>>> usually), you can yum install your favorite ecosystem components:
>>>>>>> yum install -y pig mahout
>>>>>>>
>>>>>>> And you have a working hadoop cluster.
>>>>>>>
>>>>>>> One idea, as I know you're on the east coast: if your company is
>>>>>>> interested in hosting/sponsoring a bigtop meetup, we could possibly
>>>>>>> bring some folks from the boston / nyc area together to walk through
>>>>>>> building a bigtop cluster on bare metal. Let us know if you have any
>>>>>>> other questions. These directions are admittedly a little bit rough.
>>>>>>>
>>>>>>> Also, once you get this working, you can help us to update the wiki
>>>>>>> pages.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 16, 2014 at 10:39 AM, David Fryer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Bigtop!
>>>>>>>>
>>>>>>>> I'm looking to use bigtop to help set up a small hadoop cluster.
>>>>>>>> I'm currently messing about with the hadoop tarball and all of the
>>>>>>>> associated xml files, and I don't really have the time or expertise to
>>>>>>>> get it up and working.
>>>>>>>>
>>>>>>>> Jay suggested that bigtop may be a good solution, so I've decided
>>>>>>>> to give it a shot. Unfortunately, documentation is fairly sparse and
>>>>>>>> I'm not quite sure where to start. I've cloned the github repo and used
>>>>>>>> the startup.sh script found in bigtop/bigtop-deploy/vm/vagrant-puppet to
>>>>>>>> set up a virtual cluster, but I am unsure how to apply this to physical
>>>>>>>> machines.
>>>>>>>> I'm also not quite sure how to get hadoop and hdfs up and working.
>>>>>>>>
>>>>>>>> Any help would be appreciated!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David Fryer
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> jay vyas
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> jay vyas
>>>>>>
