Hey all,

OK, thanks for your advice on setting up a Hadoop test environment to get
started learning Hadoop! I'm very excited to be taking the plunge!

Rather than using Bigtop or Cloudera, though, I decided to go with a straight
Apache Hadoop install. I set up 3 t2.micro instances on EC2 for my training
purposes, and that seemed to go alright, as far as installing Hadoop and
starting the services goes.
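
For context, the one setting I made sure matched on all three nodes is
fs.defaultFS in core-site.xml, since (as I understand it) that's the address
the datanodes use to reach the namenode. Mine is along these lines (the port
9000 here is just illustrative):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1.mydomain.com:9000</value>
  </property>
</configuration>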

I went so far as to set up the SSH access the nodes will need, and the
services seem to start without issue:

bash-4.2$ whoami
hadoop

bash-4.2$ start-dfs.sh
Starting namenodes on [hadoop1.mydomain.com]
hadoop1.mydomain.com: starting namenode, logging to /home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
hadoop2.mydomain.com: starting datanode, logging to /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop3.mydomain.com: starting datanode, logging to /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out

bash-4.2$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
hadoop2.mydomain.com: starting nodemanager, logging to /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop3.mydomain.com: starting nodemanager, logging to /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out
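
To double-check that the daemons really came up, I also ran jps on each box.
If I have it right, the master should list NameNode, SecondaryNameNode, and
ResourceManager, and each datanode should list DataNode and NodeManager:

bash-4.2$ jps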

I also opened up these ports in the security groups for the two data nodes:

[root@hadoop2:~] # netstat -tulpn | grep -i listen | grep java

tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      21405/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      21405/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      21405/java
But when I go to the Hadoop web interface at:

http://hadoop1.mydomain.com:50070

and click on the Datanodes tab, I see that no nodes are connected!

I see that the hosts are listening on all interfaces.

I also put all hosts into the /etc/hosts file on the master node.
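
The entries are along these lines (hadoop2's private IP here is the real one;
the lines for hadoop1 and hadoop3 look the same with their own IPs):

172.31.63.42    hadoop2.mydomain.com    hadoop2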

Using the first data node as an example, I can telnet to each port on both
datanodes from the master node:

bash-4.2$ telnet hadoop2.mydomain.com 50010
Trying 172.31.63.42...
Connected to hadoop2.mydomain.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

bash-4.2$ telnet hadoop2.mydomain.com 50075
Trying 172.31.63.42...
Connected to hadoop2.mydomain.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

bash-4.2$ telnet hadoop2.mydomain.com 50020
Trying 172.31.63.42...
Connected to hadoop2.mydomain.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
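
I haven't found anything more conclusive from the command line either. If I
understand it right, running this on the master should list the live
datanodes under "Live datanodes", and the datanode logs under
/home/hadoop/logs/ on hadoop2 and hadoop3 should show whether registration
with the namenode failed:

bash-4.2$ hdfs dfsadmin -report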

So apparently I've hit my first snag in setting up a Hadoop cluster. Can
anyone give me some tips on how to get the data nodes to show up as
connected to the master?


Thanks

Tim




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
