Hi Andy, you should definitely give Whirr a try for Hadoop on AWS. It solves all of these issues and works smoothly.
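
A minimal recipe looks something like this (just a sketch -- the cluster
name and instance mix below are placeholders, and it assumes your AWS keys
are exported in the environment):

    # hadoop.properties: 1 master (NN+JT) and 1 worker (DN+TT) on EC2
    cat > hadoop.properties <<'EOF'
    whirr.cluster-name=testcluster
    whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    EOF

    whirr launch-cluster --config hadoop.properties    # bring the cluster up
    whirr destroy-cluster --config hadoop.properties   # tear it down when done

Whirr also sets up the EC2 security-group rules for the services it starts,
so you don't hit the firewall problem you describe below.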

Thanks,
Nitin

On Sat, Oct 27, 2012 at 1:25 AM, Kartashov, Andy <[email protected]> wrote:
> Hadoopers,
>
> The problem was in EC2 security. While I could passwordlessly ssh into
> another node and back, I could not telnet to it because of the EC2
> firewall. I needed to open the ports for the NN and JT. :)
>
> Now I can see 2 DNs when running "hadoop fsck" and can also run -ls
> against the NN from the slave. Sweet!!!
>
> Is it possible to balance data across the DNs without re-copying it with
> the "hadoop fs -put" command? I read about bin/start-balancer.sh somewhere
> but cannot find it in my current Hadoop installation.
> Also, is balancing data across the DNs going to improve the performance
> of an MR job?
>
> Cheers,
> Happy Hadooping.
>
> -----Original Message-----
> From: Nitin Pawar [mailto:[email protected]]
> Sent: Friday, October 26, 2012 3:18 PM
> To: [email protected]
> Subject: Re: cluster set-up / a few quick questions
>
> A few questions:
>
> 1) Have you set up passwordless ssh between both hosts for the user who
> owns the Hadoop processes (or root)?
> 2) If the answer to question 1 is yes, how did you start the NN, JT, DN
> and TT?
> 3) If you started them one by one, there is no reason running a command
> on one node would execute it on the other.
>
>
> On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[email protected]>
> wrote:
>> Andy, many thanks.
>>
>> I am stuck now, so please point me in the right direction.
>>
>> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode
>> and am now trying a fully-distributed one.
>>
>> a. I created another instance, foo2, on EC2, installed Hadoop on it and
>> copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data
>> folder on the local Linux filesystem on foo2.
>>
>> b. On foo1 I created the file conf/slaves and added:
>> localhost
>> <hostname-of-foo2>
>>
>> At this point I cannot find an answer on what to do next.
>>
>> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck
>> /user/bar -files -blocks -locations", it showed the number of datanodes
>> as 1. I was expecting the DN and TT on foo2 to be started by foo1, but
>> that didn't happen, so I started them myself and tried the command again.
>> Still one DN.
>> I realise that foo2 has no data at this point, but I could not find the
>> bin/start-balancer.sh script to help me balance data from foo1 over to
>> foo2.
>>
>> What do I do next?
>>
>> Thanks,
>> AK
>>
>> -----Original Message-----
>> From: Andy Isaacson [mailto:[email protected]]
>> Sent: Friday, October 26, 2012 2:21 PM
>> To: [email protected]
>> Subject: Re: cluster set-up / a few quick questions
>>
>> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[email protected]>
>> wrote:
>>> Gents,
>>
>> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>>
>>> 1.
>>> - do you put the master node's <hostname> under fs.default.name in
>>> core-site.xml on the slave machines, or the slaves' hostnames?
>>
>> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name
>> is hdfs://foo1.domain.com.
>>
>>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create
>>> the /tmp and /var folders on the HDFS of the slave machines that will
>>> be running only the DN and TT, or not? Do you still need to create the
>>> hadoop/dfs/name folder on the slaves?
>>
>> (The following is the simple answer, for non-HA, non-federated HDFS.
>> You'll want to get the simple example working before trying the
>> complicated ones.)
>>
>> No. A cluster has one namenode, running on the machine known as the
>> master, and the admin must run "hadoop namenode -format" on that
>> machine only.
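>>
>> A sketch of the usual sequence (assuming a tarball install where you
>> run the bin/ scripts by hand; packaged installs such as CDH use service
>> scripts instead, which is also why bin/start-balancer.sh may not be on
>> your PATH):
>>
>>     # On the master only. Formatting is a one-time step and wipes
>>     # whatever is in dfs.name.dir, so never rerun it on a live cluster.
>>     hadoop namenode -format
>>
>>     # With every worker listed in conf/slaves, these scripts use the
>>     # passwordless ssh you already set up to start the DN/TT daemons
>>     # on the slaves for you:
>>     bin/start-dfs.sh      # NN here, plus a DN on each host in conf/slaves
>>     bin/start-mapred.sh   # JT here, plus a TT on each slave
>>
>>     hadoop dfsadmin -report   # should now list every DN that checked in
>>
>>     # To spread existing blocks across newly added DNs:
>>     hadoop balancer           # foreground equivalent of start-balancer.sh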
>>
>> In my example, I ran "hadoop namenode -format" on foo1.
>>
>>> 2.
>>> In hdfs-site.xml, for the dfs.name.dir & dfs.data.dir properties we
>>> specify /hadoop/dfs/name and /hadoop/dfs/data, these being local Linux
>>> directories created by running the command "mkdir -p /hadoop/dfs/data",
>>> but the mapred.system.dir property is to point to HDFS and not the
>>> local filesystem, since we run "sudo -u hdfs hadoop fs -mkdir
>>> /tmp/mapred/system"??
>>> If so, and since it is exactly the same format, /foo/bar/baz, how does
>>> Hadoop know which directory is local and which is on HDFS?
>>
>> This is very confusing, to be sure! There are a few places where paths
>> are implicitly known to be on HDFS rather than a Linux filesystem path.
>> mapred.system.dir is one of those. This does mean that, given a string
>> that starts with "/tmp/", you can't necessarily know whether it's a
>> Linux path or an HDFS path without looking at the larger context.
>>
>> In the case of mapred.system.dir, the docs are the place to check;
>> according to cluster_setup.html, mapred.system.dir is the "Path on the
>> HDFS where the Map/Reduce framework stores system files".
>>
>> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>>
>> Hope this helps,
>> -andy
>
>
> --
> Nitin Pawar

--
Nitin Pawar
