On Fri, Oct 26, 2012 at 11:47 AM, Kartashov, Andy <[email protected]> wrote:
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and
> am now trying a fully-distributed one.
>
> a. I created another instance foo2 on EC2.
It seems like you're trying to use the start-dfs.sh style startup scripts to
manually run a cluster on EC2. This is doable, but it's not very easy because
of the mismatch in expectations between EC2-style deployments and
start-dfs.sh. Setting up a manually started cluster requires a bit of
up-front work, and EC2 spin-up/spin-down cycles mean you end up redoing that
work frequently. You might consider Whirr (http://whirr.apache.org/) as a
more automated way of deploying Hadoop clusters on EC2. Of course, setting up
a manual cluster can be a really good way to understand how all the parts
work together, and doing it on EC2 should work just fine.

> Installed hadoop on it and copied conf/ folder from foo1 to foo2. I created
> /hadoop/dfs/data folder on the local linux system on foo2.
>
> b. on foo1 I created file conf/slaves and added:
> localhost
> <hostname-of-foo2>

I'd strongly recommend being consistent with the naming: don't mix
"localhost" and DNS names. EC2 has "ec2.internal" in /etc/resolv.conf by
default, so you can "ping ip-10-42-120-3" and it should work just fine. Then
make conf/master list your first host by name, and make conf/slaves list all
your hosts by name. Note that for small clusters, running a DN and a NN on a
single host is an acceptable compromise and works OK.

% cat conf/master
ip-10-42-120-3
% cat conf/slaves
ip-10-42-120-3
ip-10-42-115-32
%

You should also make sure that your user account can ssh to all the nodes:

% for h in $(cat conf/slaves); do ssh -oStrictHostKeyChecking=no $h hostname; done

- answer "yes" to any "allow untrusted certificate" messages
- if you get "permission denied" messages, you'll need to set up
  authorized_keys properly
- after this loop succeeds, you should be able to run it again and get a
  clean list of hostnames

> At this point I cannot find an answer on what to do next.
>
> I started NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar
> -files -blocks -locations", it showed # of datanodes as 1. I was expecting
> DN and TT on foo2 to be started by foo1. But it didn't happen, so I started
> them myself and tried the command again. Still one DN.

You don't need to start the daemons individually, and doing so is very
difficult to get right. I virtually never do so -- I use the start-*.sh
scripts (start-dfs.sh and start-mapred.sh) to start the daemons (NN, DN, JT,
TT, etc.). The "master" and "slaves" config files are parsed by the
start-*.sh scripts, not by the daemons themselves. And the daemons don't
start themselves -- for a manual cluster, the start-*.sh scripts are
responsible. (In a production deployment such as CDH, there is an
/etc/init.d script, managed by the distro packaging, that starts and manages
the daemons.)

-andy
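
P.S. For completeness, a minimal start-and-verify sequence run on the master,
sketched here assuming a stock Apache Hadoop 1.x tarball layout (the paths
and hostnames are only illustrative), would look roughly like:

% bin/start-dfs.sh              # reads conf/slaves and ssh's out to start the DNs
% bin/start-mapred.sh           # starts the JT locally and a TT on each slave
% bin/hadoop dfsadmin -report   # both datanodes should show up as live

If the report still shows only one live datanode, the datanode log on foo2
(under logs/) usually explains why it couldn't register with the NN.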
