You can find details of this problem here, with one solution highlighted: https://issues.apache.org/jira/browse/HDFS-107. In a Bigtop deployment you will find that file under /var/lib/hadoop-hdfs/.
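A minimal recovery sketch along those lines, assuming Bigtop's default DataNode data directory; check dfs.datanode.data.dir (or dfs.data.dir on older releases) in /etc/hadoop/conf/hdfs-site.xml for the real path, and note that this discards the blocks stored on the node:

    # Stop the DataNode before touching its on-disk state:
    service hadoop-hdfs-datanode stop

    # The stale namespace/cluster ID lives in a VERSION file under the data
    # directory; the exact path is an assumption based on Bigtop defaults:
    cat /var/lib/hadoop-hdfs/cache/hdfs/dfs/data/current/VERSION

    # Remove the old state so the DataNode re-registers with the new
    # NameNode on startup (this deletes the blocks held on this node):
    rm -rf /var/lib/hadoop-hdfs/cache/hdfs/dfs/data/current

    service hadoop-hdfs-datanode start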
On Wed, Jul 16, 2014 at 2:51 PM, Sean Mackrory <[email protected]> wrote:

I suspect your problem is that all your DataNodes are already initialized with the namespace ID of the pseudo-distributed instances they originally connected to. When a DataNode first connects to its NameNode, it gets this ID, and if you ever re-format the NameNode or just create a new NameNode, the DataNode won't play nice with the new one. This will cause you to lose all your data (but if you lose all your NameNodes permanently, you've pretty much lost it anyway); the solution is to delete the old namespace ID so that when you restart the DataNode, it will connect to the new NameNode as part of a new cluster / filesystem. IIRC, you can do this by simply deleting the file containing this ID and then running 'service hadoop-hdfs-datanode restart'. Let me look up which file that is...

On Wed, Jul 16, 2014 at 2:35 PM, David Fryer <[email protected]> wrote:

Hi Sean,
I now have each machine running in pseudo-distributed mode, but when I try to run in distributed mode, I get an exception saying that there are 0 datanodes running. Any suggestions? I've modified core-site.xml to reflect what the cluster is supposed to look like.

-David

On Wed, Jul 16, 2014 at 1:17 PM, Sean Mackrory <[email protected]> wrote:

It might be easiest to get it working on a single node, and then once you're familiar with the Bigtop packages and related files, try a cluster. On a single node, you can do "yum install hadoop-conf-pseudo", then format the namenode with "service hadoop-hdfs-namenode init", and then start all of Hadoop:

    for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode \
        hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
      service $service start
    done

That should give you an idea of how Bigtop deploys stuff and what packages you need. hadoop-conf-pseudo will install all the packages that provide the init scripts and libraries required for every role, plus a working single-node configuration. You would want to install those roles on different machines (e.g. NameNode and ResourceManager on one, DataNode and NodeManager on all the others), and then edit the configuration files in /etc/hadoop/conf on each node accordingly so the datanodes know which namenode to connect to, etc.

On Wed, Jul 16, 2014 at 10:56 AM, Mark Grover <[email protected]> wrote:

The 'hadoop' package just delivers the hadoop common bits, but no init scripts to start the service and no convenience artifacts that deploy configuration for, say, starting a hadoop pseudo-distributed cluster. For all practical purposes, you are going to need the hadoop-hdfs and hadoop-mapreduce packages, which deliver the bits for HDFS and MR. However, even that may not be enough: you likely need init scripts to be installed for starting and stopping the services related to HDFS and MR. So, depending on whether you are installing Hadoop on a fully-distributed cluster or a pseudo-distributed cluster, you may need to install one or more services (and hence packages) like resource manager, node manager, namenode and datanode on the node(s). Then, you will have to deploy the configuration yourself. We have a default configuration installed by the packages, but you definitely need to add some entries to make it work for a fully-distributed cluster, e.g. adding the name of the namenode host to the configuration of the datanodes. If you are using just a pseudo-distributed cluster, you can install the pseudo-distributed configuration package (which has all the necessary dependencies, so installing that and nothing else should be enough) and you will get an out-of-the-box experience.

FYI, if you do

    yum list 'hadoop*'

you will find a list of all hadoop-related packages that are available to be installed.
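To make that concrete, here is a minimal sketch of the fully-distributed case. The package names below match the service names used earlier in the thread; the hostname namenode.example.com and port 8020 are placeholders, and a real core-site.xml will usually carry more properties than the one shown.

    # On the master node (NameNode + ResourceManager):
    yum install -y hadoop-hdfs-namenode hadoop-yarn-resourcemanager

    # On every worker node (DataNode + NodeManager):
    yum install -y hadoop-hdfs-datanode hadoop-yarn-nodemanager

    # On every node, tell HDFS where the NameNode lives by setting
    # fs.defaultFS in /etc/hadoop/conf/core-site.xml, e.g.:
    #
    #   <property>
    #     <name>fs.defaultFS</name>
    #     <value>hdfs://namenode.example.com:8020</value>
    #   </property>
    #
    # Then restart the role services on each node so they pick up the change.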
On Wed, Jul 16, 2014 at 9:39 AM, David Fryer <[email protected]> wrote:

Is it necessary to install the whole hadoop stack?

On Wed, Jul 16, 2014 at 12:37 PM, David Fryer <[email protected]> wrote:

The only output from that is:
hadoop-2.0.5.1-1.el6.x86_64

-David

On Wed, Jul 16, 2014 at 12:34 PM, Mark Grover <[email protected]> wrote:

Possibly. Can you check what packages you have installed related to hadoop?

    rpm -qa | grep hadoop

On Wed, Jul 16, 2014 at 9:28 AM, David Fryer <[email protected]> wrote:

Hi Mark,
I'm trying to follow those instructions on a CentOS 6 machine, and after running "yum install hadoop\*", I can't find anything related to hadoop in /etc/init.d. Is there something I'm missing?

-David

On Wed, Jul 16, 2014 at 11:34 AM, Mark Grover <[email protected]> wrote:

Welcome, David.

For physical machines, I personally always use instructions like these:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

These are for Bigtop 0.6.0; the latest Bigtop release is 0.7.0, but unfortunately we don't have a page for that (we should, and if you could help with that, it'd be much appreciated!). We are tying up loose ends for Bigtop 0.8, so we hope to release it soon.

Mark

On Wed, Jul 16, 2014 at 8:20 AM, jay vyas <[email protected]> wrote:

One more note: by "look at the csv file" above I meant "edit it so that it reflects your environment".

Make sure to read the puppet README file as well, under bigtop-deploy/puppet.

On Wed, Jul 16, 2014 at 11:15 AM, jay vyas <[email protected]> wrote:

Hi David.

Glad to hear the vagrant stuff worked for you. Now, the next step will be to port it to bare metal, like you say.

The Vagrantfile does two things:

1) It creates a shared folder for all machines.
2) It spins up centos boxes.

So in the "real world" you will obviously need to set up ssh between the machines to start. After that, roughly, you will need to do the following:

- clone bigtop onto each of your machines
- install puppet 2.x on each of the machines
- look at the csv file created in the vagrant provisioner, and read the puppet README file (in bigtop-deploy)
- run puppet apply on the head node
- once that works, run puppet apply on each slave
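A rough sketch of those puppet steps, under stated assumptions: the site.csv keys, their format, and the puppet apply invocation below are reconstructed from memory of that era's bigtop-deploy/puppet README, so verify them against the README before running; all hostnames and paths are placeholders.

    # On each node (passwordless ssh between machines already set up):
    git clone https://github.com/apache/bigtop.git
    cd bigtop

    # Describe the cluster in bigtop-deploy/puppet/config/site.csv; the
    # vagrant provisioner generates an equivalent file. Example entries
    # (keys are assumptions; check the README for the real ones):
    #   hadoop_head_node,master.example.com
    #   hadoop_storage_dirs,/data/1,/data/2

    # Apply the manifests, on the head node first and then on each slave:
    puppet apply -d --modulepath=bigtop-deploy/puppet/modules \
        bigtop-deploy/puppet/manifests/site.pp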
Now on any node that you use as a client (I just use the master, usually) you can yum install your favorite ecosystem components:

    yum install -y pig mahout

And you have a working hadoop cluster.

One idea, as I know you're on the east coast: if your company is interested in hosting/sponsoring a bigtop meetup, we could possibly bring some folks from the boston / nyc area together to walk through building a bigtop cluster on bare metal. Let us know if you have any other questions. These directions are admittedly a little bit rough.

Also, once you get this working, you can help us to update the wiki pages.

--
jay vyas

On Wed, Jul 16, 2014 at 10:39 AM, David Fryer <[email protected]> wrote:

Hi Bigtop!

I'm looking to use bigtop to help set up a small hadoop cluster. I'm currently messing about with the hadoop tarball and all of the associated xml files, and I don't really have the time or expertise to get it up and working.

Jay suggested that bigtop may be a good solution, so I've decided to give it a shot. Unfortunately, documentation is fairly sparse and I'm not quite sure where to start. I've cloned the github repo and used the startup.sh script found in bigtop/bigtop-deploy/vm/vagrant-puppet to set up a virtual cluster, but I am unsure how to apply this to physical machines. I'm also not quite sure how to get hadoop and hdfs up and working.

Any help would be appreciated!

Thanks,
David Fryer
