Hi James,

Since the problem seems intermittent, have you checked whether any maintenance-type procedures are being run on the NameNode machine or its related network around the times it occurs?
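
To check the port-exhaustion theory at the moment the error shows up, a rough sketch along these lines may help (Python, Linux only). It reads the ephemeral range from /proc/sys/net/ipv4/ip_local_port_range and counts the distinct local ports from /proc/net/tcp that fall inside it, grouped by TCP state. The function names are made up for illustration, it only looks at IPv4 sockets, and it counts sockets from every process, so attributing them to the NN process would still need something like lsof or netstat -p. For the range you quoted (32768 to 61000) there are 28233 ports in total, so about 24K in use would be roughly 85% of the range.

```python
import os
from collections import Counter

RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"
TCP_FILE = "/proc/net/tcp"  # IPv4 only; /proc/net/tcp6 has the same layout


def parse_port_range(text):
    """Parse the two whitespace-separated integers in ip_local_port_range."""
    low, high = (int(x) for x in text.split())
    return low, high


def count_ephemeral_ports(proc_net_tcp_lines, low, high):
    """Return (set of local ports within [low, high], Counter of hex TCP states).

    Each data line of /proc/net/tcp looks like:
      0: 0100007F:1F90 00000000:0000 0A ...
    where field 1 is local_address as HEXIP:HEXPORT and field 3 is the state.
    """
    used = set()
    states = Counter()
    for line in proc_net_tcp_lines:
        fields = line.split()
        # Skip the header and any malformed line.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if low <= local_port <= high:
            used.add(local_port)
            states[fields[3]] += 1
    return used, states


def main():
    with open(RANGE_FILE) as f:
        low, high = parse_port_range(f.read())
    with open(TCP_FILE) as f:
        used, states = count_ephemeral_ports(f.readlines(), low, high)
    total = high - low + 1
    print("ephemeral range: %d-%d (%d ports)" % (low, high, total))
    print("distinct local ports in range: %d (%.1f%%)"
          % (len(used), 100.0 * len(used) / total))
    # State codes are hex, e.g. 01 = ESTABLISHED, 06 = TIME_WAIT, 08 = CLOSE_WAIT.
    for state, count in states.most_common():
        print("  state %s: %d sockets" % (state, count))


if __name__ == "__main__" and os.path.exists(RANGE_FILE) and os.path.exists(TCP_FILE):
    main()
```

If most of the sockets turn out to be in CLOSE_WAIT or TIME_WAIT rather than ESTABLISHED, that would point toward connections not being closed or recycled rather than genuine load, but that is only a guess until you have the numbers.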

On Sun, Oct 28, 2012 at 9:51 PM, Jianhui Zhang <[email protected]> wrote:

> Hi folks,
>
> We've got this weird problem regularly on our NameNode (apache
> hadoop-0.20.205.0) - every couple of weeks:
>
> The JobTracker had this error:
>
> 2012-10-08 11:44:03,928 WARN org.apache.hadoop.hdfs.DFSClient: Problem
> renewing lease for DFSClient_1416124356
> java.io.IOException: Call to nn-virtual.x.y.z/1.2.3.4:8020 failed on
> local exception: java.net.BindException: Cannot assign requested address
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy5.renewLease(Unknown Source)
>
> and
>
> 2012-10-08 11:44:03,927 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: nn-virtual.x.y.z/1.2.3.4:8020. Already tried 9 time(s).
>
> in which "nn-virtual.x.y.z/1.2.3.4:8020" is our HDFS address.
>
> I listed out all the local addresses on the NN and got about 24K (more or
> less) open ports. The ip_local_port_range has:
>
> 32768 61000
>
> We are not reaching the limit, but very close. What's strange is: almost
> all of the local ports are used by the NN process. There might be some
> holes in the list, but overall, it seems the NN was using up all the
> ephemeral ports available in the range.
>
> Right now, I strongly suspect that "Cannot assign requested address" is
> due to lack of ports - although I'm not 100% sure since the ephemeral ports
> change all the time.
>
> Has anybody seen this before? Any pointers would be appreciated.
>
> Also, we are using a virtual IP for the NN. All the ports are opened on
> the virtual IP address. Could it be related to the problem?
>
> Thanks for your help,
> James
