Hi sirs,

I've been playing around with the DNS configuration on one of my region servers 
(trying to get all communication to go over another private interface).  
Unfortunately I was doing this while the master was still using the server.  
Now it seems like I cannot telnet from this slave client to the master on port 
60000.  

With the master turned off and nc -l -k 60000 running on the master box, I can 
successfully telnet to that port from the slave box, which suggests that 
DNS/routing/firewalls are fine.  Restarting the master registers all the other 
region servers except the one in question.

Why would the master not want to talk to a region server?  Is it possible that, 
since the region server was flapping (during my testing of DNS), it was 
added to some internal blacklist?  There were no error messages in 
hbase-master.log during the connection attempt.

Thanks,
p.s. I went the DNS route because the interfaces are not uniform across the 
boxes (eth0, bond0, etc).  hbase-0.90.1 (cdhu0)

Error message on client:
2011-06-06 14:45:53,984 [regionserver60020] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at <master_hostname_scrubbed>:60000
2011-06-06 14:44:47,942 [regionserver60020] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
...

Also, telnet <master_hostname_scrubbed> 60000 fails immediately with 
"Connection refused" on this box.  The same command works fine on all the other 
region server boxes.
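
In case it helps anyone reproduce the check, here is a rough Python sketch of 
what my telnet test amounts to.  The function name check_port is my own 
invention (nothing from HBase); it only reports which address the hostname 
resolves to on the box where it runs and whether a TCP connect to that address 
is accepted or refused.  Treat it as a sketch, not a proper diagnostic tool.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Resolve host, then attempt a TCP connect to every resolved address.

    Returns a list of (ip_address, status) tuples, where status is
    "open", "refused", or "error: <reason>".
    """
    results = []
    # getaddrinfo shows what this particular box resolves the name to,
    # which can differ between machines after DNS changes.
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            status = "open"
        except ConnectionRefusedError:
            # Same condition telnet reports as "Connection refused".
            status = "refused"
        except OSError as exc:
            # Timeouts, unreachable networks, and similar failures.
            status = "error: %s" % exc
        finally:
            sock.close()
        results.append((sockaddr[0], status))
    return results


# Example (hostname is a placeholder, substitute the real master name):
# print(check_port("<master_hostname_scrubbed>", 60000))
```

Running it on the problem box and on a working region server makes it easy to 
compare the resolved address as well as the connect result.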
