2012/3/13 Frédéric Cons <[email protected]>

> Using the 'host' command from my computer gives the following output :
>
> host 46.137.24.14
> 14.24.137.46.in-addr.arpa domain name pointer ec2-46-137-24-14.eu-west-1.compute.amazonaws.com.
> host 46.51.133.246
> 246.133.51.46.in-addr.arpa domain name pointer ec2-46-51-133-246.eu-west-1.compute.amazonaws.com.
> host 79.125.35.234
> 234.35.125.79.in-addr.arpa domain name pointer ec2-79-125-35-234.eu-west-1.compute.amazonaws.com
>

Seems to be working as expected.
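
For anyone who wants to repeat the check without the 'host' command, here is a
minimal sketch in Python. The function names are made up for illustration, and
the lookup itself depends on whatever resolver the local machine uses, so treat
it as a rough equivalent rather than an exact replacement.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Return the in-addr.arpa name that a PTR query for `ip` asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the hostname the resolver reports for `ip`, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


if __name__ == "__main__":
    # Public addresses taken from the 'host' output quoted above.
    for ip in ("46.137.24.14", "46.51.133.246", "79.125.35.234"):
        print(ip, reverse_name(ip), reverse_lookup(ip))
```

If reverse_lookup() returns None for one of the addresses, the reverse query
failed from that machine.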


>
> Note that the regionservers do correctly try to find the master on its
> internal hostname (ip-10-227-58-207.eu-west-1.compute.internal, aka
> 10.227.58.207), and that a few telnet commands give me:
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60000
> Trying 10.227.58.207...
> telnet: Unable to connect to remote host: Connection refused
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60010
> Trying 10.227.58.207...
> Connected to ip-10-227-58-207.eu-west-1.compute.internal.
> Escape character is '^]'.
>
> Meanwhile on the master :
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60000
> tcp        0      0 127.0.1.1:60000         0.0.0.0:*               LISTEN      16502/java
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60010
> tcp        0      0 0.0.0.0:60010           0.0.0.0:*               LISTEN      16502/java
>
> (note the 127.0.1.1 vs 0.0.0.0 host address)
>
> And when I telnet these master ports from the master itself :
>
> telnet (localhost|127.0.0.1|127.0.1.1) 60010 are ok
>
> but only
>
> telnet 127.0.1.1 60000 is ok (meaning that even using 'localhost' does not
> work for port 60000)
>
> So the question is: why is there this discrepancy in local addresses for
> two different ports, but for the same process?
>

It may be a configuration / socket binding issue. I am not sure whether this
is an HBase bug or whether we are doing something wrong.
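
One thing that could be checked on the master, as a rough sketch only: the
netstat output shows port 60000 bound to 127.0.1.1 and port 60010 bound to
0.0.0.0. If the RPC server binds to whatever address the machine's own hostname
resolves to (an assumption, not something confirmed in this thread), then a
hostname that resolves to a loopback address would explain the refused
connections from the region servers. The helper names below are hypothetical.

```python
import ipaddress
import socket


def listen_scope(local_address):
    """Classify a netstat local address such as '127.0.1.1:60000'."""
    host, _, _port = local_address.rpartition(":")
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:      # 0.0.0.0 or ::
        return "all interfaces"
    if ip.is_loopback:         # anything in 127.0.0.0/8, or ::1
        return "loopback only"
    return "single interface"


def hostname_resolves_to_loopback(hostname=None):
    """True if `hostname` (default: this machine's) resolves to loopback."""
    hostname = hostname or socket.gethostname()
    return ipaddress.ip_address(socket.gethostbyname(hostname)).is_loopback


if __name__ == "__main__":
    # Local addresses copied from the netstat output quoted above.
    print("60000:", listen_scope("127.0.1.1:60000"))
    print("60010:", listen_scope("0.0.0.0:60010"))
    print("hostname is loopback:", hostname_resolves_to_loopback())
```

If the last line prints True when run on the master, the hosts file or DNS
setup on that machine would be the first place I would look.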

I will take your recipe and try to start a cluster from my machine to check
if I see the same behaviour.


> I guess it's more an hbase question than a whirr one, but if anyone here
> has a hint, I'd love to hear it :)
>
> Regards
> Fred
>
>
> 2012/3/13 Andrei Savu <[email protected]>
>
>> Frédéric, can you perform reverse DNS queries from your local machine for
>> the public IP addresses of the VMs started in Amazon?
>>
>>
>> 2012/3/13 Frédéric Cons <[email protected]>
>>
>>> Hi Andrei
>>> I tried to use another AWS account, another region, still no luck...
>>> And the master process is running (looping on 'waiting for region
>>> servers to check in' messages)
>>> As I managed to make it work 2 weeks ago, I also suspect it is a weird
>>> aws account issue.
>>> I'll update this thread if I finally find a solution
>>> Thank you
>>> Fred
>>>
>>>
>>> 2012/3/12 Andrei Savu <[email protected]>
>>>
>>>> This is what the recipe we are using for integration testing looks like:
>>>>
>>>> https://github.com/andreisavu/whirr/blob/trunk/services/hbase/src/test/resources/whirr-hbase-0.90-test.properties
>>>>
>>>> If you specify only the location-id and no image-id Whirr should be
>>>> able to find the right image for you.
>>>>
>>>>
>>>> 2012/3/12 Frédéric Cons <[email protected]>
>>>>
>>>>> Hi whirr users
>>>>>
>>>>> I'm trying to deploy a small hbase cluster on ec2, and I'm hitting the
>>>>> following network issue
>>>>>
>>>>> Here's my config file :
>>>>>
>>>>> whirr.cluster-name=my-hbase-cluster
>>>>> whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
>>>>> hbase-site.dfs.replication=1
>>>>> whirr.provider=aws-ec2
>>>>> whirr.identity=<my_id>
>>>>> whirr.credential=<my_cred>
>>>>> whirr.hardware-id=m1.large
>>>>> whirr.image-id=eu-west-1/ami-895069fd
>>>>> whirr.location-id=eu-west-1
>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
>>>>> whirr.public-key-file=${whirr.private-key-file}.pub
>>>>> whirr.hbase.tarball.url=http://apache.cict.fr/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz
>>>>> whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
>>>>>
>>>>> So the configuration is pretty standard...
>>>>>
>>>>> The problem is: the region servers can't talk to the master server,
>>>>> because port 60000 (the HBase master RPC port, if I understand
>>>>> correctly) does not seem to be open.
>>>>>
>>>>> * Through telnet:
>>>>> telnet ip-10-58-170-126.eu-west-1.compute.internal 60000
>>>>> Trying 10.58.170.126...
>>>>> telnet: Unable to connect to remote host: Connection refused
>>>>>
>>>>> * In the region server log :
>>>>>
>>>>> 2012-03-12 12:53:20,889 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at ip-10-58-170-126.eu-west-1.compute.internal:60000
>>>>> 2012-03-12 12:54:21,000 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
>>>>> java.net.ConnectException: Connection refused
>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>>>>         at $Proxy5.getProtocolVersion(Unknown Source)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1462)
>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1515)
>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1499)
>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:572)
>>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>> Note that other Hadoop / HBase related ports look fine: I can telnet
>>>>> from the region server to the master on port 60010, for example.
>>>>> The Hadoop logs on the region servers (which also act as datanodes /
>>>>> tasktrackers) look fine.
>>>>>
>>>>> The EC2 security group also looks fine: ports 1 - 65535 for tcp and
>>>>> udp seem to be open for the whole security group.
>>>>>
>>>>> I'm using Whirr 0.7.1, and tried various Ubuntu AMIs / hbase+hadoop
>>>>> combinations.
>>>>>
>>>>> Any idea on what's going on here ?
>>>>>
>>>>> Best regards
>>>>> Fred
