Hi Andy, you should definitely give Whirr a try for Hadoop on AWS. It solves all of these issues and works smoothly.
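
A minimal recipe looks something like this (just a sketch -- the cluster
name and instance mix below are placeholders, and it assumes your AWS keys
are exported in the environment):

    # hadoop.properties: 1 master (NN+JT) and 1 worker (DN+TT) on EC2
    cat > hadoop.properties <<'EOF'
    whirr.cluster-name=testcluster
    whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    EOF

    whirr launch-cluster --config hadoop.properties    # bring the cluster up
    whirr destroy-cluster --config hadoop.properties   # tear it down when done

Whirr also sets up the EC2 security-group rules for the services it starts,
so you don't hit the firewall problem you describe below.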

Thanks,
Nitin

On Sat, Oct 27, 2012 at 1:25 AM, Kartashov, Andy <[email protected]> wrote:
> Hadoopers,
>
> The problem was in EC2 security. While I could passwordlessly ssh into
> another node and back, I could not telnet to it because of the EC2
> firewall. I needed to open the ports for the NN and JT. :)
>
> Now I can see 2 DNs when running "hadoop fsck" and can also run -ls
> against the NN from the slave. Sweet!!!
>
> Is it possible to balance data across the DNs without re-copying it with
> the "hadoop fs -put" command? I read about bin/start-balancer.sh somewhere
> but cannot find it in my current Hadoop installation.
> Also, is balancing data across the DNs going to improve the performance
> of an MR job?
>
> Cheers,
> Happy Hadooping.
>
> -----Original Message-----
> From: Nitin Pawar [mailto:[email protected]]
> Sent: Friday, October 26, 2012 3:18 PM
> To: [email protected]
> Subject: Re: cluster set-up / a few quick questions
>
> A few questions:
>
> 1) Have you set up passwordless ssh between both hosts for the user who
> owns the Hadoop processes (or root)?
> 2) If the answer to question 1 is yes, how did you start the NN, JT, DN
> and TT?
> 3) If you started them one by one, there is no reason running a command
> on one node would execute it on the other.
>
>
> On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[email protected]>
> wrote:
>> Andy, many thanks.
>>
>> I am stuck now, so please point me in the right direction.
>>
>> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode
>> and am now trying a fully-distributed one.
>>
>> a. I created another instance, foo2, on EC2, installed Hadoop on it and
>> copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data
>> folder on the local Linux filesystem on foo2.
>>
>> b. On foo1 I created the file conf/slaves and added:
>> localhost
>> <hostname-of-foo2>
>>
>> At this point I cannot find an answer on what to do next.
>>
>> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck
>> /user/bar -files -blocks -locations", it showed the number of datanodes
>> as 1. I was expecting the DN and TT on foo2 to be started by foo1, but
>> that didn't happen, so I started them myself and tried the command again.
>> Still one DN.
>> I realise that foo2 has no data at this point, but I could not find the
>> bin/start-balancer.sh script to help me balance data from foo1 over to
>> foo2.
>>
>> What do I do next?
>>
>> Thanks,
>> AK
>>
>> -----Original Message-----
>> From: Andy Isaacson [mailto:[email protected]]
>> Sent: Friday, October 26, 2012 2:21 PM
>> To: [email protected]
>> Subject: Re: cluster set-up / a few quick questions
>>
>> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[email protected]>
>> wrote:
>>> Gents,
>>
>> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>>
>>> 1.
>>> - do you put the master node's <hostname> under fs.default.name in
>>> core-site.xml on the slave machines, or the slaves' hostnames?
>>
>> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name
>> is hdfs://foo1.domain.com.
>>
>>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create
>>> the /tmp and /var folders on the HDFS of the slave machines that will
>>> be running only the DN and TT, or not? Do you still need to create the
>>> hadoop/dfs/name folder on the slaves?
>>
>> (The following is the simple answer, for non-HA, non-federated HDFS.
>> You'll want to get the simple example working before trying the
>> complicated ones.)
>>
>> No. A cluster has one namenode, running on the machine known as the
>> master, and the admin must run "hadoop namenode -format" on that
>> machine only.
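>>
>> A sketch of the usual sequence (assuming a tarball install where you
>> run the bin/ scripts by hand; packaged installs such as CDH use service
>> scripts instead, which is also why bin/start-balancer.sh may not be on
>> your PATH):
>>
>>     # On the master only. Formatting is a one-time step and wipes
>>     # whatever is in dfs.name.dir, so never rerun it on a live cluster.
>>     hadoop namenode -format
>>
>>     # With every worker listed in conf/slaves, these scripts use the
>>     # passwordless ssh you already set up to start the DN/TT daemons
>>     # on the slaves for you:
>>     bin/start-dfs.sh      # NN here, plus a DN on each host in conf/slaves
>>     bin/start-mapred.sh   # JT here, plus a TT on each slave
>>
>>     hadoop dfsadmin -report   # should now list every DN that checked in
>>
>>     # To spread existing blocks across newly added DNs:
>>     hadoop balancer           # foreground equivalent of start-balancer.sh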
>>
>> In my example, I ran "hadoop namenode -format" on foo1.
>>
>>> 2.
>>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we
>>> specify /hadoop/dfs/name and /hadoop/dfs/data, these being local Linux
>>> directories created by running the command "mkdir -p /hadoop/dfs/data",
>>> but the mapred.system.dir property is to point to HDFS and not the
>>> local filesystem, since we run "sudo -u hdfs hadoop fs -mkdir
>>> /tmp/mapred/system"??
>>> If so, and since it is exactly the same format, /foo/bar/baz, how does
>>> Hadoop know which directory is local and which is on HDFS?
>>
>> This is very confusing, to be sure! There are a few places where paths
>> are implicitly known to be on HDFS rather than a Linux filesystem path.
>> mapred.system.dir is one of those. This does mean that, given a string
>> that starts with "/tmp/", you can't necessarily know whether it's a
>> Linux path or an HDFS path without looking at the larger context.
>>
>> In the case of mapred.system.dir, the docs are the place to check;
>> according to cluster_setup.html, mapred.system.dir is the "Path on the
>> HDFS where the Map/Reduce framework stores system files".
>>
>> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>>
>> Hope this helps,
>> -andy
>
>
> --
> Nitin Pawar

--
Nitin Pawar
