You can find details of this problem here, with one solution highlighted: https://issues.apache.org/jira/browse/HDFS-107. In a Bigtop deployment you will find that file under /var/lib/hadoop-hdfs/.
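A minimal recovery sketch along those lines, assuming Bigtop's default DataNode data directory; check dfs.datanode.data.dir (or dfs.data.dir on older releases) in /etc/hadoop/conf/hdfs-site.xml for the real path, and note that this discards the blocks stored on the node:

    # Stop the DataNode before touching its on-disk state:
    service hadoop-hdfs-datanode stop

    # The stale namespace/cluster ID lives in a VERSION file under the data
    # directory; the exact path is an assumption based on Bigtop defaults:
    cat /var/lib/hadoop-hdfs/cache/hdfs/dfs/data/current/VERSION

    # Remove the old state so the DataNode re-registers with the new
    # NameNode on startup (this deletes the blocks held on this node):
    rm -rf /var/lib/hadoop-hdfs/cache/hdfs/dfs/data/current

    service hadoop-hdfs-datanode start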
On Wed, Jul 16, 2014 at 2:51 PM, Sean Mackrory <[email protected]> wrote:

I suspect your problem is that all your DataNodes are already initialized with the namespace ID of the pseudo-distributed instances they originally connected to. When a DataNode first connects to its NameNode, it gets this ID, and if you ever re-format the NameNode or just create a new NameNode, the DataNode won't play nice with the new one. This will cause you to lose all your data (but if you lose all your NameNodes permanently, you've pretty much lost it anyway); the solution is to delete the old namespace ID so that when you restart the DataNode, it will connect to the new NameNode as part of a new cluster / filesystem. IIRC, you can do this by simply deleting the file containing this ID and then running 'service hadoop-hdfs-datanode restart'. Let me look up which file that is...

On Wed, Jul 16, 2014 at 2:35 PM, David Fryer <[email protected]> wrote:

Hi Sean,
I now have each machine running in pseudo-distributed mode, but when I try to run in distributed mode, I get an exception saying that there are 0 datanodes running. Any suggestions? I've modified core-site.xml to reflect what the cluster is supposed to look like.

-David

On Wed, Jul 16, 2014 at 1:17 PM, Sean Mackrory <[email protected]> wrote:

It might be easiest to get it working on a single node, and then once you're familiar with the Bigtop packages and related files, try a cluster. On a single node, you can do "yum install hadoop-conf-pseudo", then format the namenode with "service hadoop-hdfs-namenode init", and then start all of Hadoop:

    for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode \
        hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
      service $service start
    done

That should give you an idea of how Bigtop deploys stuff and what packages you need. hadoop-conf-pseudo will install all the packages that provide the init scripts and libraries required for every role, plus a working single-node configuration. You would want to install those roles on different machines (e.g. NameNode and ResourceManager on one, DataNode and NodeManager on all the others), and then edit the configuration files in /etc/hadoop/conf on each node accordingly so the datanodes know which namenode to connect to, etc.

On Wed, Jul 16, 2014 at 10:56 AM, Mark Grover <[email protected]> wrote:

The 'hadoop' package just delivers the hadoop common bits, but no init scripts to start the service and no convenience artifacts that deploy configuration for, say, starting a hadoop pseudo-distributed cluster. For all practical purposes, you are going to need the hadoop-hdfs and hadoop-mapreduce packages, which deliver the bits for HDFS and MR. However, even that may not be enough: you likely need init scripts to be installed for starting and stopping the services related to HDFS and MR. So, depending on whether you are installing Hadoop on a fully-distributed cluster or a pseudo-distributed cluster, you may need to install one or more services (and hence packages) like resource manager, node manager, namenode and datanode on the node(s). Then, you will have to deploy the configuration yourself. We have a default configuration installed by the packages, but you definitely need to add some entries to make it work for a fully-distributed cluster, e.g. adding the name of the namenode host to the configuration of the datanodes. If you are using just a pseudo-distributed cluster, you can install the pseudo-distributed configuration package (which has all the necessary dependencies, so installing that and nothing else should be enough) and you will get an out-of-the-box experience.

FYI, if you do

    yum list 'hadoop*'

you will find a list of all hadoop-related packages that are available to be installed.
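To make that concrete, here is a minimal sketch of the fully-distributed case. The package names below match the service names used earlier in the thread; the hostname namenode.example.com and port 8020 are placeholders, and a real core-site.xml will usually carry more properties than the one shown.

    # On the master node (NameNode + ResourceManager):
    yum install -y hadoop-hdfs-namenode hadoop-yarn-resourcemanager

    # On every worker node (DataNode + NodeManager):
    yum install -y hadoop-hdfs-datanode hadoop-yarn-nodemanager

    # On every node, tell HDFS where the NameNode lives by setting
    # fs.defaultFS in /etc/hadoop/conf/core-site.xml, e.g.:
    #
    #   <property>
    #     <name>fs.defaultFS</name>
    #     <value>hdfs://namenode.example.com:8020</value>
    #   </property>
    #
    # Then restart the role services on each node so they pick up the change.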
On Wed, Jul 16, 2014 at 9:39 AM, David Fryer <[email protected]> wrote:

Is it necessary to install the whole hadoop stack?

On Wed, Jul 16, 2014 at 12:37 PM, David Fryer <[email protected]> wrote:

The only output from that is:
hadoop-2.0.5.1-1.el6.x86_64

-David

On Wed, Jul 16, 2014 at 12:34 PM, Mark Grover <[email protected]> wrote:

Possibly. Can you check what packages you have installed related to hadoop?

    rpm -qa | grep hadoop

On Wed, Jul 16, 2014 at 9:28 AM, David Fryer <[email protected]> wrote:

Hi Mark,
I'm trying to follow those instructions on a CentOS 6 machine, and after running "yum install hadoop\*", I can't find anything related to hadoop in /etc/init.d. Is there something I'm missing?

-David

On Wed, Jul 16, 2014 at 11:34 AM, Mark Grover <[email protected]> wrote:

Welcome, David.

For physical machines, I personally always use instructions like these:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

These are for Bigtop 0.6.0; the latest Bigtop release is 0.7.0, but unfortunately we don't have a page for that (we should, and if you could help with that, it'd be much appreciated!). We are tying up loose ends for Bigtop 0.8, so we hope to release it soon.

Mark

On Wed, Jul 16, 2014 at 8:20 AM, jay vyas <[email protected]> wrote:

One more note: by "look at the csv file" above I meant "edit it so that it reflects your environment".

Make sure to read the puppet README file as well, under bigtop-deploy/puppet.

On Wed, Jul 16, 2014 at 11:15 AM, jay vyas <[email protected]> wrote:

Hi David.

Glad to hear the vagrant stuff worked for you. Now, the next step will be to port it to bare metal, like you say.

The Vagrantfile does two things:

1) It creates a shared folder for all machines.
2) It spins up centos boxes.

So in the "real world" you will obviously need to set up ssh between the machines to start. After that, roughly, you will need to do the following:

- clone bigtop onto each of your machines
- install puppet 2.x on each of the machines
- look at the csv file created in the vagrant provisioner, and read the puppet README file (in bigtop-deploy)
- run puppet apply on the head node
- once that works, run puppet apply on each slave
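A rough sketch of those puppet steps, under stated assumptions: the site.csv keys, their format, and the puppet apply invocation below are reconstructed from memory of that era's bigtop-deploy/puppet README, so verify them against the README before running; all hostnames and paths are placeholders.

    # On each node (passwordless ssh between machines already set up):
    git clone https://github.com/apache/bigtop.git
    cd bigtop

    # Describe the cluster in bigtop-deploy/puppet/config/site.csv; the
    # vagrant provisioner generates an equivalent file. Example entries
    # (keys are assumptions; check the README for the real ones):
    #   hadoop_head_node,master.example.com
    #   hadoop_storage_dirs,/data/1,/data/2

    # Apply the manifests, on the head node first and then on each slave:
    puppet apply -d --modulepath=bigtop-deploy/puppet/modules \
        bigtop-deploy/puppet/manifests/site.pp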
Now on any node that you use as a client (I just use the master, usually) you can yum install your favorite ecosystem components:

    yum install -y pig mahout

And you have a working hadoop cluster.

One idea, as I know you're on the east coast: if your company is interested in hosting/sponsoring a bigtop meetup, we could possibly bring some folks from the boston / nyc area together to walk through building a bigtop cluster on bare metal. Let us know if you have any other questions. These directions are admittedly a little bit rough.

Also, once you get this working, you can help us to update the wiki pages.

--
jay vyas

On Wed, Jul 16, 2014 at 10:39 AM, David Fryer <[email protected]> wrote:

Hi Bigtop!

I'm looking to use bigtop to help set up a small hadoop cluster. I'm currently messing about with the hadoop tarball and all of the associated xml files, and I don't really have the time or expertise to get it up and working.

Jay suggested that bigtop may be a good solution, so I've decided to give it a shot. Unfortunately, documentation is fairly sparse and I'm not quite sure where to start. I've cloned the github repo and used the startup.sh script found in bigtop/bigtop-deploy/vm/vagrant-puppet to set up a virtual cluster, but I am unsure how to apply this to physical machines. I'm also not quite sure how to get hadoop and hdfs up and working.

Any help would be appreciated!

Thanks,
David Fryer
