Hi all,

I've been experimenting with the DNS configuration on one of my region servers, trying to get all communication to go over a separate private interface. Unfortunately, I did this while the master was still using that server. Now I cannot telnet from this slave box to the master on port 60000.
If I stop the master and run nc -l -k 60000 on the master box instead, I can successfully telnet to that port from the slave box, which means DNS, routing, and firewalls are fine. Restarting the master registers all the other region servers except the one in question.

Why would the master not want to talk to one region server? Is it possible that, because the region server was flapping during my DNS testing, it was added to some internal blacklist? There were no error messages in hbase-master.log during the connection attempt.

Thanks,

p.s. I went the DNS route because the interfaces are not uniform across the boxes (eth0, bond0, etc.).

Version: hbase-0.90.1 (cdhu0)

Error message on client:

2011-06-06 14:45:53,984 [regionserver60020] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at <master_hostname_scrubbed>:60000
2011-06-06 14:44:47,942 [regionserver60020] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was: java.net.ConnectException: Connection refused
...

Also, telnet <master_hostname_scrubbed> 60000 returns immediately with "connection refused" on this box. The same command works fine on all the other region server boxes.
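
In case anyone wants to repeat the telnet check from several boxes, here is a rough Python sketch of it. It is not part of HBase; the function name check_port and the hostname in the usage example are placeholders I made up. It resolves the name first and reports the address it got, since with my DNS changes the name may not point at the interface I expect, and it then separates "refused" from a timeout or a DNS failure.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Try one TCP connection and classify the result.

    Returns (status, resolved_address). check_port is a hypothetical
    helper for this post, not an HBase or Hadoop API.
    """
    try:
        # Resolve first so a DNS failure is reported separately
        # from a TCP-level failure.
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        return "dns-failure", None
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return "open", addr
    except ConnectionRefusedError:
        # Host reachable, but nothing is accepting on that address and port.
        return "refused", addr
    except socket.timeout:
        # No answer at all, typically a firewall drop or a routing problem.
        return "timeout", addr
    except OSError:
        return "unreachable", addr


if __name__ == "__main__":
    # Placeholder hostname; substitute the real master hostname.
    print(check_port("master.example.internal", 60000))
```

Running this on the problem box and on a working region server should show whether both resolve the master name to the same address.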
