Andy, many thanks.

I am stuck here now, so please point me in the right direction.

I successfully ran a job on foo1 in pseudo-distributed mode and am now trying 
to set up a fully distributed cluster.

a. I created another instance, foo2, on EC2, installed Hadoop on it, and copied 
the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on 
the local Linux filesystem on foo2.

b. On foo1 I created the file conf/slaves and added:
localhost
<hostname-of-foo2>

At this point I cannot find an answer on what to do next.

I started NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar 
-files -blocks -locations", it showed the number of datanodes as 1. I was 
expecting the DN and TT on foo2 to be started by foo1, but that didn't happen, 
so I started them myself and tried the command again. Still one DN.
I realise that foo2 has no data at this point, but I could not find the 
bin/start-balancer.sh script to help me balance data from the DN on foo1 over 
to the DN on foo2.

What do I do next?

Thanks
AK

-----Original Message-----
From: Andy Isaacson [mailto:[email protected]]
Sent: Friday, October 26, 2012 2:21 PM
To: [email protected]
Subject: Re: cluster set-up / a few quick questions

On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[email protected]> wrote:
> Gents,

We're not all male here. :)  I prefer "Hadoopers" or "hi all,".

> 1.
> - do you put Master's node <hostname> under fs.default.name in core-site.xml 
> on the slave machines or slaves' hostnames?

Master.  I have a 4-node cluster, named foo1 - foo4. My fs.default.name is 
hdfs://foo1.domain.com.
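
As a rough sketch, the matching core-site.xml entry would look something like 
the fragment below, and it would be the same on the master and on every slave. 
The hostname is the one from my example; leaving the port off is an assumption 
that the default NameNode port is fine for your setup.

```xml
<!-- core-site.xml (same on every node); hostname taken from the example above -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- Points at the master (NameNode), never at a slave's own hostname -->
    <value>hdfs://foo1.domain.com</value>
  </property>
</configuration>
```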

> - do you need to run "sudo -u hdfs hadoop namenode -format" and create /tmp 
> /var folders on the HDFS of the slave machines that will be running only DN 
> and TT or not? Do you still need to create hadoop/dfs/name folder on the 
> slaves?

(The following is the simple answer, for non-HA non-federated HDFS.
You'll want to get the simple example working before trying the complicated 
ones.)

No. A cluster has one namenode, running on the machine known as the master, and 
the admin must "hadoop namenode -format" on that machine only.

In my example, I ran "hadoop namenode -format" on foo1.

> 2.
> In hdfs-site.xml for dfs.name.dir & dfs.data.dir properties  we specify  
> /hadoop/dfs/name /hadoop/dfs/data  being  local linux NFS directories by 
> running command "mkdir -p /hadoop/dfs/data"
> but mapred.system.dir  property is to point to HDFS and not NFS  since we are 
> running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
> If so and since it is exactly the same format  /far/boo/baz how does hadoop 
> know which directory is local on NFS or HDFS?

This is very confusing, to be sure!  There are a few places where paths are 
implicitly known to be on HDFS rather than a Linux filesystem path. 
mapred.system.dir is one of those. This does mean that given a string that 
starts with "/tmp/" you can't necessarily know whether it's a Linux path or an 
HDFS path without looking at the larger context.

In the case of mapred.system.dir, the docs are the place to check; according to 
cluster_setup.html, mapred.system.dir is "Path on the HDFS where where the 
Map/Reduce framework stores system files".

http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
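
To make the distinction concrete, here is a sketch using the paths from your 
question. The values are only illustrative, and splitting the properties 
between hdfs-site.xml and mapred-site.xml is the usual Hadoop 1.x layout 
rather than something I have checked against your install.

```xml
<!-- hdfs-site.xml: these values are local Linux filesystem paths on each node -->
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop/dfs/data</value>
</property>

<!-- mapred-site.xml: this value is interpreted as a path on HDFS -->
<property>
  <name>mapred.system.dir</name>
  <value>/tmp/mapred/system</value>
</property>
```

The strings look alike; it is the property they are assigned to that decides 
which filesystem they refer to.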

Hope this helps,
-andy