We had a similar issue on EC2, though our cluster did start just fine.
What's even better: even if you configure your XML files to use the
private IP address, Hadoop will resolve it to the public DNS name and use
that for all calls between nodes in the cluster (yay!?!).

What we ended up having to do was write a post-Whirr job that logged onto
each machine in the Hadoop cluster and added hosts file entries for the
public DNS name of every node in the cluster.  We came to this because,
without modifying the hosts file, we would eventually get a lot of timeout
exceptions, which would ultimately kill one of our Hadoop jobs.  Adding the
public DNS entries to the hosts files solved the timeout issue for us.
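
In case it's useful, here is roughly what that post-launch step did.  This
is a sketch rather than our exact script: the "ubuntu" login user, the key
path, and the cluster-hosts.txt file are placeholders, and we actually
generated the node list from the instances file Whirr writes under
~/.whirr/<cluster-name>/:

    # cluster-hosts.txt holds one "<private-ip> <public-dns>" line per
    # node. Copy the full mapping to every node and append it to
    # /etc/hosts, so lookups of the public DNS names resolve to the
    # private addresses and stay on the internal network.
    while read PRIVATE_IP PUBLIC_DNS; do
      scp -i ~/.ssh/id_rsa cluster-hosts.txt "ubuntu@$PUBLIC_DNS:/tmp/"
      # -n keeps ssh from consuming the rest of the loop's stdin
      ssh -n -i ~/.ssh/id_rsa "ubuntu@$PUBLIC_DNS" \
          'sudo tee -a /etc/hosts < /tmp/cluster-hosts.txt > /dev/null'
    done < cluster-hosts.txt

The key point is that every node ends up with the complete map of the
cluster, so traffic between nodes stays on the private network, which is
what fixed the timeouts for us.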

Thanks,
JohnC


On Wed, Jan 25, 2012 at 2:18 PM, Andrei Savu <[email protected]> wrote:

> Hi -
>
> And welcome to Apache Whirr! I have just tried a similar configuration
> file twice and the cluster starts as expected for me.
>
> Here is the part that was different:
> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1
> hadoop-datanode+hadoop-tasktracker
>
> *Important note:* namenode+jobtracker != jobtracker+namenode - the roles
> are started in the order in which they are listed. Let me know if this
> makes a difference for you.
>
> Are you running Whirr inside Amazon? Is there anything special about your
> network / DNS setup?
>
> Regards,
>
> -- Andrei Savu / andreisavu.ro
>
> 2012/1/25 Fermín Galán Márquez <[email protected]>:
> > Hi,
> >
> > I've created a Hadoop cluster in EC2 using Whirr 0.7.0 (1
> > jobtracker+namenode, 5 datanode+tasktracker). Once the "launch-cluster"
> > process ends, the datanode+tasktracker nodes seem OK, as they have the
> > corresponding Hadoop daemons up and running. However, the
> > namenode+jobtracker node is not running its Hadoop daemons and, if I
> > try to run the namenode manually, I get an error like this:
> >
> > root@ip-10-190-221-195:/usr/local/hadoop# bin/hadoop namenode
> > 12/01/25 20:55:06 INFO namenode.NameNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting NameNode
> > STARTUP_MSG:   host = ip-10-190-221-195.ec2.internal/10.190.221.195
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.20.2
> > STARTUP_MSG:   build =
> > https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> > 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> > ************************************************************/
> > 12/01/25 20:55:06 ERROR namenode.NameNode: java.net.BindException:
> > Problem binding to /107.20.71.205:8020 : Cannot assign requested address
> >     at org.apache.hadoop.ipc.Server.bind(Server.java:190)
> >     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
> >     at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
> >     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
> >     at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:191)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> > Caused by: java.net.BindException: Cannot assign requested address
> >     at sun.nio.ch.Net.bind(Native Method)
> >     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
> >     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
> >     at org.apache.hadoop.ipc.Server.bind(Server.java:188)
> >     ... 8 more
> >
> > 12/01/25 20:55:06 INFO namenode.NameNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NameNode at
> > ip-10-190-221-195.ec2.internal/10.190.221.195
> > ************************************************************/
> >
> > It seems that the daemon is trying to bind to the external (public)
> > address associated with the EC2 instance, instead of using the internal
> > (private, in the 10.*.*.* range) one, so I guess that the Hadoop .xml
> > config files have not been built properly by Whirr. I'm using the
> > following configuration file:
> >
> > whirr.cluster-name=fermin-hdp-cluster
> > whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,5
> > hadoop-datanode+hadoop-tasktracker
> > whirr.provider=aws-ec2
> > whirr.identity=${env:AWS_ACCESS_KEY_ID}
> > whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
> > whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
> > whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
> >
> > This is my first approach to Whirr, so I'm probably doing something
> > wrong :). I've googled for this issue, but I have found only one similar
> > case, in an old thread
> > (https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/831a1e39fd1885f3,
> > June 2011), with no clear solution at the end. So, I'm sending this to
> > the Whirr users list in the hope of getting help... If you need me to
> > run some tests or need more information about my case, don't hesitate
> > to ask.
> >
> > Any help is really welcome. Thanks in advance!
> >
> > Best regards,
> >
> > ------
> > Fermín
> >
> > ________________________________
> > This message is intended exclusively for its addressee. We only send and
> > receive email on the basis of the terms set out at
> > http://www.tid.es/ES/PAGINAS/disclaimer.aspx
>
>


-- 

Thanks,
John C
