2012/3/13 Frédéric Cons <[email protected]>

> Using the 'host' command from my computer gives the following output :
>
> host 46.137.24.14
> 14.24.137.46.in-addr.arpa domain name pointer
> ec2-46-137-24-14.eu-west-1.compute.amazonaws.com.
> host 46.51.133.246
> 246.133.51.46.in-addr.arpa domain name pointer
> ec2-46-51-133-246.eu-west-1.compute.amazonaws.com.
> host 79.125.35.234
> 234.35.125.79.in-addr.arpa domain name pointer
> ec2-79-125-35-234.eu-west-1.compute.amazonaws.com
>
Seems to be working as expected.

>
> Note that the regionservers do correctly try to find the master on its
> internal hostname (ip-10-227-58-207.eu-west-1.compute.internal, aka
> 10.227.58.207), and that a few telnet commands give me:
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60000
> Trying 10.227.58.207...
> telnet: Unable to connect to remote host: Connection refused
>
> telnet ip-10-227-58-207.eu-west-1.compute.internal 60010
> Trying 10.227.58.207...
> Connected to ip-10-227-58-207.eu-west-1.compute.internal.
> Escape character is '^]'.
>
> Meanwhile on the master :
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60000
> tcp 0 0 127.0.1.1:60000 0.0.0.0:* LISTEN 16502/java
> fred@ip-10-227-58-207:~$ sudo netstat -npa | grep 60010
> tcp 0 0 0.0.0.0:60010 0.0.0.0:* LISTEN 16502/java
>
> (note the 127.0.1.1 vs 0.0.0.0 host address)
>
> And when I telnet these master ports from the master itself :
>
> telnet (localhost|127.0.0.1|127.0.1.1) 60010 are ok
>
> but only
>
> telnet 127.0.1.1 60000 is ok (meaning that even using 'localhost' does not
> work for port 60000)
>
> So the question is: why is there this discrepancy in local addresses for
> two different ports, but for the same process ?
>

It may be a configuration / socket binding issue. Not sure if this is an HBase
bug or we are doing something wrong. I will take your recipe and try to start
a cluster from my machine to check if I see the same behaviour.

> I guess it's more an hbase question than a whirr one, but if anyone here
> has a hint, I'd love to hear it :)
>
> Regards
> Fred
>
>
> 2012/3/13 Andrei Savu <[email protected]>
>
>> Frédéric, can you perform reverse DNS queries for the public IP addresses
>> of the VMs started in Amazon from the local machine?
>>
>>
>> 2012/3/13 Frédéric Cons <[email protected]>
>>
>>> Hi Andrei
>>> I tried to use another AWS account, another region, still no luck...
>>> And the master process is running (looping on 'waiting for region
>>> servers to check in' messages)
>>> As I managed to make it work 2 weeks ago, I also suspect it is a weird
>>> aws account issue.
>>> I'll update this thread if I finally find a solution
>>> Thank you
>>> Fred
>>>
>>>
>>> 2012/3/12 Andrei Savu <[email protected]>
>>>
>>>> This is what the recipe we are using for integration testing looks like:
>>>>
>>>> https://github.com/andreisavu/whirr/blob/trunk/services/hbase/src/test/resources/whirr-hbase-0.90-test.properties
>>>>
>>>> If you specify only the location-id and no image-id, Whirr should be
>>>> able to find the right image for you.
>>>>
>>>>
>>>> 2012/3/12 Frédéric Cons <[email protected]>
>>>>
>>>>> Hi whirr users
>>>>>
>>>>> I'm trying to deploy a small hbase cluster on ec2, and I'm hitting the
>>>>> following network issue
>>>>>
>>>>> Here's my config file :
>>>>>
>>>>> whirr.cluster-name=my-hbase-cluster
>>>>> whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
>>>>> hbase-site.dfs.replication=1
>>>>> whirr.provider=aws-ec2
>>>>> whirr.identity=<my_id>
>>>>> whirr.credential=<my_cred>
>>>>> whirr.hardware-id=m1.large
>>>>> whirr.image-id=eu-west-1/ami-895069fd
>>>>> whirr.location-id=eu-west-1
>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
>>>>> whirr.public-key-file=${whirr.private-key-file}.pub
>>>>> whirr.hbase.tarball.url=http://apache.cict.fr/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz
>>>>> whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
>>>>>
>>>>> So the configuration is pretty standard...
>>>>>
>>>>> The problem is : the region servers can't talk to the master server,
>>>>> because port 60000 does not seem to be opened (the hbase master rpc port
>>>>> if I get it correctly)
>>>>>
>>>>> * Through telnet :
>>>>> telnet ip-10-58-170-126.eu-west-1.compute.internal 60000
>>>>> Trying 10.58.170.126...
>>>>> telnet: Unable to connect to remote host: Connection refused
>>>>>
>>>>> * In the region server log :
>>>>>
>>>>> 2012-03-12 12:53:20,889 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at ip-10-58-170-126.eu-west-1.compute.internal:60000
>>>>> 2012-03-12 12:54:21,000 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
>>>>> java.net.ConnectException: Connection refused
>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>>>>     at $Proxy5.getProtocolVersion(Unknown Source)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>>>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1462)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1515)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1499)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:572)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>> Note that other hadoop / hbase related ports look fine : I can telnet
>>>>> from the region server to the master on port 60010 for example.
>>>>> The hadoop logs on the region servers (which also act as datanodes /
>>>>> tasktrackers) look fine
>>>>>
>>>>> The EC2 security group also looks fine : ports 1 - 65535 for tcp and
>>>>> udp seem to be opened for the whole security group.
>>>>>
>>>>> I'm using whirr 0.7.1, and tried various ubuntu AMIs / hbase+hadoop
>>>>> combinations
>>>>>
>>>>> Any idea on what's going on here ?
>>>>>
>>>>> Best regards
>>>>> Fred
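
The netstat output quoted in this thread shows the same Java process listening on 127.0.1.1:60000 and on 0.0.0.0:60010. Here is a minimal Python sketch of why that difference matters; it assumes nothing about HBase internals. A listener bound to one specific address only accepts connections addressed to it, while a wildcard (0.0.0.0) listener accepts connections on every local interface. The helper names `listen_on` and `can_connect` are made up for this illustration, and 127.0.0.1 stands in for 127.0.1.1 so the sketch runs on any machine. On Debian/Ubuntu images the hostname is often mapped to 127.0.1.1 in /etc/hosts, which might explain the binding, but nothing in this thread confirms that.

```python
import socket


def listen_on(host: str) -> socket.socket:
    """Open a TCP listener on `host`, letting the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 = any free port
    server.listen(1)
    return server


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes "Connection refused" and timeouts
        return False


if __name__ == "__main__":
    # Stand-in for the web UI port, which netstat showed on 0.0.0.0:60010.
    wildcard = listen_on("0.0.0.0")
    # Stand-in for the RPC port, which netstat showed on a loopback address.
    loopback_only = listen_on("127.0.0.1")

    for label, server in (("wildcard", wildcard), ("loopback-only", loopback_only)):
        bound_host, port = server.getsockname()
        print(label, "bound to", bound_host,
              "reachable via 127.0.0.1:", can_connect("127.0.0.1", port))

    wildcard.close()
    loopback_only.close()
```

Both listeners answer on the loopback address they cover, but only the wildcard one would also answer on the machine's private address (10.227.58.207 in the thread). That would be consistent with the telnet results reported for ports 60010 and 60000.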
