Hi Sean,

I now have each machine running in pseudo-distributed mode, but when I try to run in distributed mode I get an exception saying that there are 0 datanodes running. Any suggestions? I've modified core-site.xml to reflect what the cluster is supposed to look like.

-David
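A side note for anyone hitting the same symptom: a common cause of the "0 datanodes" error is that the workers' configuration still points at localhost, so the datanodes never register with the real namenode. A minimal core-site.xml sketch for every node follows; the hostname and port are placeholders for your own cluster:

    <!-- /etc/hadoop/conf/core-site.xml, same on every node (sketch; hostname is a placeholder) -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>

After restarting the hadoop-hdfs-datanode service on each worker, running "sudo -u hdfs hdfs dfsadmin -report" on the namenode should list the datanodes that have registered.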
On Wed, Jul 16, 2014 at 1:17 PM, Sean Mackrory <[email protected]> wrote:

It might be easiest to get it working on a single node, and then once you're familiar with the Bigtop packages and related files, try it on a cluster. On a single node, you can do "yum install hadoop-conf-pseudo", then format the namenode with "service hadoop-hdfs-namenode init", and then start all of Hadoop: "for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do service $service start; done". That should give you an idea of how Bigtop deploys stuff and what packages you need. hadoop-conf-pseudo will install all the packages that provide the init scripts and libraries required for every role, plus a working single-node configuration. For a cluster, you would want to install those roles on different machines (e.g. NameNode and ResourceManager on one, DataNode and NodeManager on all the others), and then edit the configuration files in /etc/hadoop/conf on each node accordingly so the datanodes know which namenode to connect to, etc.

On Wed, Jul 16, 2014 at 10:56 AM, Mark Grover <[email protected]> wrote:

The 'hadoop' package just delivers the Hadoop common bits, but no init scripts to start the services and no convenience artifacts that deploy configuration for, say, starting a pseudo-distributed Hadoop cluster. For all practical purposes, you are going to need the hadoop-hdfs and hadoop-mapreduce packages, which deliver the bits for HDFS and MR. However, even that may not be enough; you likely need init scripts installed for starting and stopping the services related to HDFS and MR. So, depending on whether you are installing Hadoop on a fully-distributed cluster or a pseudo-distributed cluster, you may need to install one or more services (and hence packages) like resourcemanager, nodemanager, namenode, and datanode on the node(s). Then you will have to deploy the configuration yourself. We have default configuration installed by the packages, but you definitely need to add some entries to make it work for a fully-distributed cluster, e.g. adding the name of the namenode host to the configuration of the datanodes. If you are just using a pseudo-distributed setup, you can install the pseudo-distributed configuration package (which has all the necessary dependencies, so installing that and nothing else should be enough) and you will get an out-of-the-box experience.

FYI, if you do

    yum list 'hadoop*'

you will see a list of all Hadoop-related packages that are available to be installed.

On Wed, Jul 16, 2014 at 9:39 AM, David Fryer <[email protected]> wrote:

Is it necessary to install the whole Hadoop stack?

On Wed, Jul 16, 2014 at 12:37 PM, David Fryer <[email protected]> wrote:

The only output from that is:

    hadoop-2.0.5.1-1.el6.x86_64

-David

On Wed, Jul 16, 2014 at 12:34 PM, Mark Grover <[email protected]> wrote:

Possibly. Can you check what packages you have installed related to Hadoop?

    rpm -qa | grep hadoop

On Wed, Jul 16, 2014 at 9:28 AM, David Fryer <[email protected]> wrote:

Hi Mark,
I'm trying to follow those instructions on a CentOS 6 machine, and after running "yum install hadoop\*", I can't find anything related to hadoop in /etc/init.d. Is there something I'm missing?

-David
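To make the role/package split concrete, here is a rough sketch of a manual layout for a small fully-distributed cluster. It reuses the service names Sean lists above; Bigtop package names generally match the service names, but verify with "yum list 'hadoop*'" on your system:

    # master node (sketch): HDFS namenode + YARN resourcemanager
    yum install -y hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-yarn-resourcemanager
    service hadoop-hdfs-namenode init        # format HDFS, first time only
    for service in hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-yarn-resourcemanager; do
        service $service start
    done

    # each worker node (sketch): HDFS datanode + YARN nodemanager
    yum install -y hadoop-hdfs-datanode hadoop-yarn-nodemanager
    # point /etc/hadoop/conf at the master first (see the core-site.xml note above), then:
    for service in hadoop-hdfs-datanode hadoop-yarn-nodemanager; do
        service $service start
    done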
On Wed, Jul 16, 2014 at 11:34 AM, Mark Grover <[email protected]> wrote:

Welcome, David.

For physical machines, I personally always use instructions like these:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

These are for Bigtop 0.6.0; the latest Bigtop release is 0.7.0, but unfortunately we don't have a page for that (we should, and if you could help with that, it'd be much appreciated!). We are tying up loose ends for Bigtop 0.8, so we hope to release it soon.

Mark

On Wed, Jul 16, 2014 at 8:20 AM, jay vyas <[email protected]> wrote:

One more note: by "look at the csv file" above I meant "edit it so that it reflects your environment".

Make sure to read the puppet README file as well, under bigtop-deploy/puppet.

On Wed, Jul 16, 2014 at 11:15 AM, jay vyas <[email protected]> wrote:

Hi David,

Glad to hear the vagrant stuff worked for you. Now the next step will be to port it to bare metal, like you say.

The Vagrantfile does two things:

1) It creates a shared folder for all machines.
2) It spins up CentOS boxes.

So in the "real world" you will obviously need to set up ssh between the machines to start. After that, roughly, you will need to do the following (see the shell sketch below):

- clone bigtop onto each of your machines
- install puppet 2.x on each of the machines
- look at the csv file created in the vagrant provisioner, and read the puppet README file (in bigtop-deploy)
- run puppet apply on the head node
- once that works, run puppet apply on each slave

Now, on any node that you use as a client (I usually just use the master), you can yum install your favorite ecosystem components:

    yum install -y pig mahout

And you have a working hadoop cluster.

One idea, as I know you're on the east coast: if your company is interested in hosting/sponsoring a Bigtop meetup, we could possibly bring some folks from the Boston / NYC area together to walk through building a Bigtop cluster on bare metal. Let us know if you have any other questions. These directions are admittedly a little bit rough.

Also, once you get this working, you can help us to update the wiki pages.
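A rough shell sketch of those steps, run on each node. The repository URL and Puppet paths here are assumptions about the Bigtop layout of that era, so treat the README under bigtop-deploy/puppet as the authority:

    # install git and puppet, then fetch Bigtop (sketch; package names may differ on your distro)
    yum install -y git puppet
    git clone https://github.com/apache/bigtop.git
    cd bigtop

    # edit the site configuration the README describes (the vagrant provisioner's csv,
    # e.g. bigtop-deploy/puppet/config/site.csv) so it names your head node and components

    # apply the manifests on the head node first, then repeat on each worker
    puppet apply -d --modulepath=bigtop-deploy/puppet/modules bigtop-deploy/puppet/manifests/site.pp

    # finally, on whichever node you use as a client:
    yum install -y pig mahout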
On Wed, Jul 16, 2014 at 10:39 AM, David Fryer <[email protected]> wrote:

Hi Bigtop!

I'm looking to use Bigtop to help set up a small Hadoop cluster. I'm currently messing about with the Hadoop tarball and all of the associated xml files, and I don't really have the time or expertise to get it up and working.

Jay suggested that Bigtop may be a good solution, so I've decided to give it a shot. Unfortunately, documentation is fairly sparse and I'm not quite sure where to start. I've cloned the github repo and used the startup.sh script found in bigtop/bigtop-deploy/vm/vagrant-puppet to set up a virtual cluster, but I am unsure how to apply this to physical machines. I'm also not quite sure how to get Hadoop and HDFS up and working.

Any help would be appreciated!

Thanks,
David Fryer
