A few questions:
1) Have you set up passwordless ssh between both hosts for the user who owns the Hadoop processes (or root)?
2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
3) If you started them one by one, there is no reason that running a command on one node would execute it on the other.
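For what it's worth, here is a minimal sketch of that setup (this assumes a Hadoop 1.x tarball-style install with the start scripts under $HADOOP_HOME/bin, and <hostname-of-foo2> standing in for the slave as in your mail; package-based installs start the daemons through service scripts instead):

  # On the master (foo1), as the user that owns the Hadoop processes:
  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # key pair with an empty passphrase
  ssh-copy-id <hostname-of-foo2>              # push the public key to the slave
  ssh <hostname-of-foo2> hostname             # should print foo2's hostname without prompting

  # With foo2 listed in conf/slaves, the start scripts ssh out to every slave,
  # so one command on the master brings up the remote daemons too:
  $HADOOP_HOME/bin/start-dfs.sh       # NN here, a DN on every host in conf/slaves
  $HADOOP_HOME/bin/start-mapred.sh    # JT here, a TT on every host in conf/slaves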
On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[email protected]> wrote:
> Andy, many thanks.
>
> I am stuck here now, so please point me in the right direction.
>
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and
> am now trying a fully-distributed one.
>
> a. I created another instance, foo2, on EC2, installed Hadoop on it and copied
> the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the
> local Linux filesystem on foo2.
>
> b. On foo1 I created the conf/slaves file and added:
> localhost
> <hostname-of-foo2>
>
> At this point I cannot find an answer as to what to do next.
>
> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck /user/bar
> -files -blocks -locations", it showed the # of datanodes as 1. I was expecting the DN
> and TT on foo2 to be started by foo1. Since that didn't happen, I started them
> myself and tried the command again. Still one DN.
> I realise that foo2 has no data at this point, but I could not find the
> bin/start-balancer.sh script to help me balance data over from foo1 to foo2.
>
> What do I do next?
>
> Thanks
> AK
>
> -----Original Message-----
> From: Andy Isaacson [mailto:[email protected]]
> Sent: Friday, October 26, 2012 2:21 PM
> To: [email protected]
> Subject: Re: cluster set-up / a few quick questions
>
> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[email protected]>
> wrote:
>> Gents,
>
> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>
>> 1.
>> - do you put the master node's <hostname> under fs.default.name in core-site.xml
>> on the slave machines, or the slaves' hostnames?
>
> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name is
> hdfs://foo1.domain.com.
>
>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp
>> and /var folders on the HDFS of the slave machines that will be running only DN
>> and TT, or not? Do you still need to create the hadoop/dfs/name folder on the
>> slaves?
>
> (The following is the simple answer, for non-HA non-federated HDFS.
> You'll want to get the simple example working before trying the complicated
> ones.)
>
> No. A cluster has one namenode, running on the machine known as the master,
> and the admin must run "hadoop namenode -format" on that machine only.
>
> In my example, I ran "hadoop namenode -format" on foo1.
>
>> 2.
>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we specify
>> /hadoop/dfs/name and /hadoop/dfs/data, which are local Linux directories created
>> by running "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property is to point to HDFS and not the local
>> filesystem, since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
>> If so, and since it is exactly the same format /far/boo/baz, how does Hadoop
>> know which directory is local and which is on HDFS?
>
> This is very confusing, to be sure! There are a few places where paths are
> implicitly known to be on HDFS rather than a Linux filesystem path.
> mapred.system.dir is one of those. This does mean that given a string that
> starts with "/tmp/" you can't necessarily know whether it's a Linux path or an
> HDFS path without looking at the larger context.
>
> In the case of mapred.system.dir, the docs are the place to check; according
> to cluster_setup.html, mapred.system.dir is "Path on the HDFS where where the
> Map/Reduce framework stores system files".
>
> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>
> Hope this helps,
> -andy

--
Nitin Pawar
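For reference, a quick way to verify that the second datanode has joined, and where the balancer lives — a minimal sketch assuming the same Hadoop 1.x tarball layout as above (scripts under $HADOOP_HOME/bin; package installs may put them elsewhere):

  # Run on the master (foo1) once the DN and TT on foo2 are up:
  hadoop dfsadmin -report                            # should now list two datanodes
  hadoop fsck /user/bar -files -blocks -locations    # shows where each block of /user/bar lives

  # The balancer ships alongside the other start scripts in a tarball install:
  $HADOOP_HOME/bin/start-balancer.sh -threshold 10   # rebalance until nodes are within 10% of mean usage

Note that the balancer only moves existing blocks; anything written after foo2's datanode registers will be spread across both nodes anyway.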
