You can get the script from the Hadoop codebase at http://svn.apache.org/viewcvs.cgi/hadoop/common/trunk/
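For readers without the source tree handy: bin/start-balancer.sh in the Hadoop 1.x line is only a thin wrapper around hadoop-daemon.sh. A sketch follows; paths assume a stock tarball layout, and exact contents vary by release:

```shell
#!/usr/bin/env bash
# Sketch of bin/start-balancer.sh (Hadoop 1.x-era): resolve the script's
# own directory, source the common config, and launch the balancer as a
# daemon. Extra arguments (e.g. -threshold 10) are passed straight through.
bin=$(dirname "$0")
bin=$(cd "$bin" && pwd)

. "$bin"/hadoop-config.sh

"$bin"/hadoop-daemon.sh --config "$HADOOP_CONF_DIR" start balancer "$@"
```

Which is why running `hadoop balancer -threshold 10` in the foreground, as done below, achieves the same result, minus the daemonization.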
On Fri, Nov 2, 2012 at 12:41 AM, Kartashov, Andy <[email protected]> wrote:

> People,
>
> While I did not find the start-balancer.sh script on my machine, I
> successfully utilized the following command:
>
>     $ hadoop balancer -threshold 10
>
> and achieved the exact same result.
>
> One issue remains: controlling the start/stop of the slaves' daemons
> through the master. Somehow I don't have the start-dfs.sh/stop-dfs.sh or
> start-all.sh scripts on my machine either. For now, I am starting the DFS
> and MapReduce daemons on each slave manually and individually.
>
> Can someone post the content of the start-all.sh script so I could
> utilize it for my environment?
>
> Thanks,
> AK47
>
> -----Original Message-----
> From: Kartashov, Andy
> Sent: Friday, October 26, 2012 3:56 PM
> To: [email protected]
> Subject: RE: cluster set-up / a few quick questions - SOLVED
>
> Hadoopers,
>
> The problem was in EC2 security. While I could passwordlessly ssh into
> another node and back, I could not telnet to it due to the EC2 firewall.
> I needed to open ports for the NN and JT. :)
>
> Now I can see 2 DNs when running "hadoop fsck" and can also -ls into the
> NN from the slave. Sweet!!!
>
> Is it possible to balance data over the DNs without copying it with the
> "hadoop fs -put" command? I read about bin/start-balancer.sh somewhere
> but cannot find it in my current Hadoop installation.
> Besides, is balancing data over the DNs going to improve the performance
> of an MR job?
>
> Cheers,
> Happy Hadooping.
>
> -----Original Message-----
> From: Nitin Pawar [mailto:[email protected]]
> Sent: Friday, October 26, 2012 3:18 PM
> To: [email protected]
> Subject: Re: cluster set-up / a few quick questions
>
> Questions:
>
> 1) Have you set up passwordless ssh between both hosts for the user who
> owns the Hadoop processes (or root)?
> 2) If the answer to question 1 is yes, how did you start the NN, JT, DN
> and TT?
> 3) If you started them one by one, there is no reason running a command
> on one node will execute it on the other.
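The two fixes discussed above, passwordless ssh and the EC2 firewall, can be sketched as follows. foo1/foo2 are the thread's hostnames; the user name and ports are assumptions (8020 and 8021 are common NN/JT RPC defaults, but check fs.default.name and mapred.job.tracker in your own conf):

```shell
# 1) Passwordless ssh: generate a key pair on the master (skip if
#    ~/.ssh/id_rsa already exists) and install the public key on each slave.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@foo2        # 'hadoop' is a placeholder user name

# 2) Firewall: from a slave, verify the NameNode and JobTracker ports on
#    the master are actually reachable, not just ssh.
nc -zv foo1 8020   # NameNode RPC (assumed port)
nc -zv foo1 8021   # JobTracker RPC (assumed port)
```

If the nc checks fail on EC2, open those ports in the instance's security group via the AWS console; passwordless ssh alone is not enough for the daemons to talk to each other.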
> On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[email protected]> wrote:
> > Andy, many thanks.
> >
> > I am stuck here now, so please point me in the right direction.
> >
> > I successfully ran a job on a cluster on foo1 in pseudo-distributed
> > mode and am now trying a fully-distributed one.
> >
> > a. I created another instance, foo2, on EC2, installed Hadoop on it
> > and copied the conf/ folder from foo1 to foo2. I created the
> > /hadoop/dfs/data folder on the local Linux filesystem on foo2.
> >
> > b. On foo1 I created the file conf/slaves and added:
> >
> >     localhost
> >     <hostname-of-foo2>
> >
> > At this point I cannot find an answer on what to do next.
> >
> > I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck
> > /user/bar -files -blocks -locations", it showed the number of
> > datanodes as 1. I was expecting the DN and TT on foo2 to be started by
> > foo1, but that didn't happen, so I started them myself and tried the
> > command again. Still one DN.
> > I realise that foo2 has no data at this point, but I could not find
> > the bin/start-balancer.sh script to help me balance data from foo1
> > over to foo2.
> >
> > What do I do next?
> >
> > Thanks,
> > AK
> >
> > -----Original Message-----
> > From: Andy Isaacson [mailto:[email protected]]
> > Sent: Friday, October 26, 2012 2:21 PM
> > To: [email protected]
> > Subject: Re: cluster set-up / a few quick questions
> >
> > On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[email protected]> wrote:
> >> Gents,
> >
> > We're not all male here. :) I prefer "Hadoopers" or "hi all,".
> >
> >> 1.
> >> - Do you put the master node's <hostname> under fs.default.name in
> >> core-site.xml on the slave machines, or the slaves' hostnames?
> >
> > Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name
> > is hdfs://foo1.domain.com.
> >
> >> - Do you need to run "sudo -u hdfs hadoop namenode -format" and
> >> create the /tmp and /var folders on the HDFS of the slave machines
> >> that will be running only a DN and TT, or not?
> >> Do you still need to create the hadoop/dfs/name folder on the slaves?
> >
> > (The following is the simple answer, for non-HA, non-federated HDFS.
> > You'll want to get the simple example working before trying the
> > complicated ones.)
> >
> > No. A cluster has one namenode, running on the machine known as the
> > master, and the admin must run "hadoop namenode -format" on that
> > machine only.
> >
> > In my example, I ran "hadoop namenode -format" on foo1.
> >
> >> 2.
> >> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we
> >> specify /hadoop/dfs/name and /hadoop/dfs/data, these being local
> >> Linux directories created by running "mkdir -p /hadoop/dfs/data",
> >> but the mapred.system.dir property is to point to HDFS and not the
> >> local filesystem, since we are running "sudo -u hdfs hadoop fs -mkdir
> >> /tmp/mapred/system"??
> >> If so, and since it is exactly the same format (/far/boo/baz), how
> >> does Hadoop know which directory is local and which is on HDFS?
> >
> > This is very confusing, to be sure! There are a few places where paths
> > are implicitly known to be on HDFS rather than being Linux filesystem
> > paths. mapred.system.dir is one of those. This does mean that, given a
> > string that starts with "/tmp/", you can't necessarily know whether
> > it's a Linux path or an HDFS path without looking at the larger
> > context.
> >
> > In the case of mapred.system.dir, the docs are the place to check;
> > according to cluster_setup.html, mapred.system.dir is the "Path on the
> > HDFS where the Map/Reduce framework stores system files".
> >
> > http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
> >
> > Hope this helps,
> > -andy
> >
> > NOTICE: This e-mail message and any attachments are confidential,
> > subject to copyright and may be privileged. Any unauthorized use,
> > copying or disclosure is prohibited. If you are not the intended
> > recipient, please delete and contact the sender immediately. Please
> > consider the environment before printing this e-mail.
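To make the local-vs-HDFS distinction above concrete, here is a minimal config sketch consistent with the examples in this thread; the hostnames, the port, and the paths are illustrative assumptions, not prescriptive values:

```xml
<!-- core-site.xml: identical on master and slaves; names the master's NN -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://foo1.domain.com:8020</value> <!-- 8020 is an assumed port -->
</property>

<!-- hdfs-site.xml: local Linux paths, created with "mkdir -p" -->
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop/dfs/data</value>
</property>

<!-- mapred-site.xml: an HDFS path, created with "hadoop fs -mkdir" -->
<property>
  <name>mapred.system.dir</name>
  <value>/tmp/mapred/system</value>
</property>
```

In other words, it is the property, not the path's spelling, that decides: dfs.name.dir and dfs.data.dir are always interpreted as local paths, while mapred.system.dir is resolved against the filesystem named by fs.default.name.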
> --
> Nitin Pawar

-- 
Nitin Pawar
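Since the thread never actually posts it, here is a sketch of bin/start-all.sh as shipped in the Hadoop 1.x line; exact contents vary slightly by release, and some packaged installs omit these wrapper scripts entirely:

```shell
#!/usr/bin/env bash
# Sketch of bin/start-all.sh: it simply chains the HDFS and MapReduce
# start scripts. Each of those ssh-es into every host listed in
# conf/slaves and runs hadoop-daemon.sh there -- which is why passwordless
# ssh from the master is a prerequisite.
bin=$(dirname "$0")
bin=$(cd "$bin" && pwd)

. "$bin"/hadoop-config.sh

# NN locally, SNN on the hosts in conf/masters, DNs on conf/slaves.
"$bin"/start-dfs.sh --config "$HADOOP_CONF_DIR"

# JT locally, TTs on conf/slaves.
"$bin"/start-mapred.sh --config "$HADOOP_CONF_DIR"
```

If the wrappers are missing, the manual per-slave equivalent is running `hadoop-daemon.sh start datanode` and `hadoop-daemon.sh start tasktracker` on each slave.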
