I found the code path that does not work and patched it. I will report back on whether that fixes the problem.

On Feb 14, 2014 8:19 AM, "Jack Levin" <magn...@gmail.com> wrote:
> 0.20.2-cdh3u2 --
>
> "add to deadNodes and continue" would solve this issue. For some reason
> it's not getting into this code path.
>
> If it's a matter of adding a quick line of code to make this work, then we
> would rather recompile with that and upgrade later when we have better
> backup.
>
> -Jack
>
>
> On Thu, Feb 13, 2014 at 10:55 PM, Stack <st...@duboce.net> wrote:
>
>> On Thu, Feb 13, 2014 at 9:18 PM, Jack Levin <magn...@gmail.com> wrote:
>>
>> > One other question, we get this:
>> >
>> > 2014-02-13 02:46:12,768 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> > connect to /10.101.5.5:50010 for file
>> > /hbase/img32/b97657bfcbf922045d96315a4ada0782/att/4890606694307129591 for
>> > block -9099107892773428976:java.net.SocketTimeoutException: 60000 millis
>> > timeout while waiting for channel to be ready for connect. ch :
>> > java.nio.channels.SocketChannel[connection-pending remote=/10.101.5.5:50010]
>> >
>> >
>> > Why can't RS do this instead:
>> >
>> > hbase-root-regionserver-mtab5.prod.imageshack.com.log.2014-02-10:2014-02-10
>> > 22:05:11,763 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to
>> > /10.103.8.109:50010, add to deadNodes and continue
>> >
>> > "add to deadNodes and continue" specifically?
>> >
>>
>>
>> The regionserver runs on the HDFS API. The implementations can vary. The
>> management of nodes -- their coming and going -- is done inside the HDFS
>> client code. The regionserver is insulated from all that goes on therein.
>>
>> What version of HDFS are you on Jack?
>>
>> St.Ack
>>
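
For anyone reading this thread later, here is a rough sketch of the behavior being asked for: when a connect to one replica fails, remember that node as dead and move on to the next replica instead of waiting out the timeout again. This is not the DFSClient code and not the patch mentioned above. The class `DeadNodeSketch`, the method `pickNode`, and the `canConnect` probe are all hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class DeadNodeSketch {
    // Hypothetical stand-in for a client-side record of unreachable datanodes.
    private final Set<String> deadNodes = new HashSet<>();

    /**
     * Try each replica location in order, skipping nodes already marked dead.
     * canConnect is a hypothetical probe standing in for a real socket connect.
     * Returns the first reachable node, or null if every replica failed.
     */
    public String pickNode(List<String> replicas, Predicate<String> canConnect) {
        for (String node : replicas) {
            if (deadNodes.contains(node)) {
                continue; // failed earlier; do not sit through another timeout
            }
            if (canConnect.test(node)) {
                return node;
            }
            // The "add to deadNodes and continue" step from the log message.
            deadNodes.add(node);
        }
        return null;
    }

    public Set<String> deadNodes() {
        return deadNodes;
    }

    public static void main(String[] args) {
        DeadNodeSketch client = new DeadNodeSketch();
        // Addresses taken from the log lines quoted in this thread.
        List<String> replicas = Arrays.asList("10.101.5.5:50010", "10.103.8.109:50010");
        // Pretend the first datanode is unreachable.
        String chosen = client.pickNode(replicas, n -> !n.startsWith("10.101.5.5"));
        System.out.println("chosen=" + chosen + " dead=" + client.deadNodes());
    }
}
```

The point of keeping the set around between calls is that a later read skips the bad node immediately. Whether and when entries should be cleared again is a separate question that this sketch does not address.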