Hi Sean,

I now have each machine running in pseudo-distributed mode, but when I try to run in distributed mode I get an exception saying that there are 0 datanodes running. Any suggestions? I've modified core-site.xml to reflect what the cluster is supposed to look like.

-David
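A side note for anyone hitting the same symptom: a common cause of the "0 datanodes" error is that the workers' configuration still points at localhost, so the datanodes never register with the real namenode. A minimal core-site.xml sketch for every node follows; the hostname and port are placeholders for your own cluster:

    <!-- /etc/hadoop/conf/core-site.xml, same on every node (sketch; hostname is a placeholder) -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>

After restarting the hadoop-hdfs-datanode service on each worker, running "sudo -u hdfs hdfs dfsadmin -report" on the namenode should list the datanodes that have registered.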
On Wed, Jul 16, 2014 at 1:17 PM, Sean Mackrory <[email protected]> wrote:

It might be easiest to get it working on a single node, and then once you're familiar with the Bigtop packages and related files, try it on a cluster. On a single node, you can do "yum install hadoop-conf-pseudo", then format the namenode with "service hadoop-hdfs-namenode init", and then start all of Hadoop: "for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do service $service start; done". That should give you an idea of how Bigtop deploys stuff and what packages you need. hadoop-conf-pseudo will install all the packages that provide the init scripts and libraries required for every role, plus a working single-node configuration. For a cluster, you would want to install those roles on different machines (e.g. NameNode and ResourceManager on one, DataNode and NodeManager on all the others), and then edit the configuration files in /etc/hadoop/conf on each node accordingly so the datanodes know which namenode to connect to, etc.

On Wed, Jul 16, 2014 at 10:56 AM, Mark Grover <[email protected]> wrote:

The 'hadoop' package just delivers the Hadoop common bits, but no init scripts to start the services and no convenience artifacts that deploy configuration for, say, starting a pseudo-distributed Hadoop cluster. For all practical purposes, you are going to need the hadoop-hdfs and hadoop-mapreduce packages, which deliver the bits for HDFS and MR. However, even that may not be enough; you likely need init scripts installed for starting and stopping the services related to HDFS and MR. So, depending on whether you are installing Hadoop on a fully-distributed cluster or a pseudo-distributed cluster, you may need to install one or more services (and hence packages) like resourcemanager, nodemanager, namenode, and datanode on the node(s). Then you will have to deploy the configuration yourself. We have default configuration installed by the packages, but you definitely need to add some entries to make it work for a fully-distributed cluster, e.g. adding the name of the namenode host to the configuration of the datanodes. If you are just using a pseudo-distributed setup, you can install the pseudo-distributed configuration package (which has all the necessary dependencies, so installing that and nothing else should be enough) and you will get an out-of-the-box experience.

FYI, if you do

    yum list 'hadoop*'

you will see a list of all Hadoop-related packages that are available to be installed.

On Wed, Jul 16, 2014 at 9:39 AM, David Fryer <[email protected]> wrote:

Is it necessary to install the whole Hadoop stack?

On Wed, Jul 16, 2014 at 12:37 PM, David Fryer <[email protected]> wrote:

The only output from that is:

    hadoop-2.0.5.1-1.el6.x86_64

-David

On Wed, Jul 16, 2014 at 12:34 PM, Mark Grover <[email protected]> wrote:

Possibly. Can you check what packages you have installed related to Hadoop?

    rpm -qa | grep hadoop

On Wed, Jul 16, 2014 at 9:28 AM, David Fryer <[email protected]> wrote:

Hi Mark,
I'm trying to follow those instructions on a CentOS 6 machine, and after running "yum install hadoop\*", I can't find anything related to hadoop in /etc/init.d. Is there something I'm missing?

-David
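To make the role/package split concrete, here is a rough sketch of a manual layout for a small fully-distributed cluster. It reuses the service names Sean lists above; Bigtop package names generally match the service names, but verify with "yum list 'hadoop*'" on your system:

    # master node (sketch): HDFS namenode + YARN resourcemanager
    yum install -y hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-yarn-resourcemanager
    service hadoop-hdfs-namenode init        # format HDFS, first time only
    for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-yarn-resourcemanager; do
        service $service start
    done

    # each worker node (sketch): HDFS datanode + YARN nodemanager
    yum install -y hadoop-hdfs-datanode hadoop-yarn-nodemanager
    # point /etc/hadoop/conf at the master first (see the core-site.xml note above), then:
    for service in hadoop-hdfs-datanode hadoop-yarn-nodemanager; do
        service $service start
    done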
On Wed, Jul 16, 2014 at 11:34 AM, Mark Grover <[email protected]> wrote:

Welcome, David.

For physical machines, I personally always use instructions like these:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

These are for Bigtop 0.6.0; the latest Bigtop release is 0.7.0, but unfortunately we don't have a page for that (we should, and if you could help with that, it'd be much appreciated!). We are tying up loose ends for Bigtop 0.8, so we hope to release it soon.

Mark

On Wed, Jul 16, 2014 at 8:20 AM, jay vyas <[email protected]> wrote:

One more note: by "look at the csv file" above I meant "edit it so that it reflects your environment".

Make sure to read the puppet README file as well, under bigtop-deploy/puppet.

On Wed, Jul 16, 2014 at 11:15 AM, jay vyas <[email protected]> wrote:

Hi David,

Glad to hear the vagrant stuff worked for you. Now the next step will be to port it to bare metal, like you say.

The Vagrantfile does two things:

1) It creates a shared folder for all machines.
2) It spins up CentOS boxes.

So in the "real world" you will obviously need to set up ssh between the machines to start. After that, roughly, you will need to do the following (see the shell sketch below):

- clone bigtop onto each of your machines
- install puppet 2.x on each of the machines
- look at the csv file created in the vagrant provisioner, and read the puppet README file (in bigtop-deploy)
- run puppet apply on the head node
- once that works, run puppet apply on each slave

Now, on any node that you use as a client (I usually just use the master), you can yum install your favorite ecosystem components:

    yum install -y pig mahout

And you have a working hadoop cluster.

One idea, as I know you're on the east coast: if your company is interested in hosting/sponsoring a Bigtop meetup, we could possibly bring some folks from the Boston / NYC area together to walk through building a Bigtop cluster on bare metal. Let us know if you have any other questions. These directions are admittedly a little bit rough.

Also, once you get this working, you can help us to update the wiki pages.
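A rough shell sketch of those steps, run on each node. The repository URL and Puppet paths here are assumptions about the Bigtop layout of that era, so treat the README under bigtop-deploy/puppet as the authority:

    # install git and puppet, then fetch Bigtop (sketch; package names may differ on your distro)
    yum install -y git puppet
    git clone https://github.com/apache/bigtop.git
    cd bigtop

    # edit the site configuration the README describes (the vagrant provisioner's csv,
    # e.g. bigtop-deploy/puppet/config/site.csv) so it names your head node and components

    # apply the manifests on the head node first, then repeat on each worker
    puppet apply -d --modulepath=bigtop-deploy/puppet/modules bigtop-deploy/puppet/manifests/site.pp

    # finally, on whichever node you use as a client:
    yum install -y pig mahout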
On Wed, Jul 16, 2014 at 10:39 AM, David Fryer <[email protected]> wrote:

Hi Bigtop!

I'm looking to use Bigtop to help set up a small Hadoop cluster. I'm currently messing about with the Hadoop tarball and all of the associated xml files, and I don't really have the time or expertise to get it up and working.

Jay suggested that Bigtop may be a good solution, so I've decided to give it a shot. Unfortunately, documentation is fairly sparse and I'm not quite sure where to start. I've cloned the github repo and used the startup.sh script found in bigtop/bigtop-deploy/vm/vagrant-puppet to set up a virtual cluster, but I am unsure how to apply this to physical machines. I'm also not quite sure how to get Hadoop and HDFS up and working.

Any help would be appreciated!

Thanks,
David Fryer
