I am wondering whether DFSClient caches the DataNode for a long period of time?
Varun

On Thu, Apr 18, 2013 at 6:01 PM, Varun Sharma wrote:
> Hi,
>
> We are facing problems with really slow HBase region server recoveries,
> ~20 minutes. Version is HBase 0.94.3 compiled with hadoop.profile=2.0.
>
> Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched and stale
> node timeouts configured correctly. Time for dead node detection is still
> 10 minutes.
>
> We see that our region server is trying to read an HLog and is stuck there
> for a long time. Logs here:
>
> 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> connect to /10.156.194.251:50010 for file
> /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
> for block
> BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
> 15000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818 remote=/10.156.194.251:50010]
>
> I would think that HDFS-3703 would make the server fail fast and go to the
> third DataNode. Currently, the recovery seems way too slow for production
> usage...
>
> Varun
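For reference, the roughly 10-minute dead node detection matches the NameNode's heartbeat expiry formula, 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval. A minimal sketch of that arithmetic, assuming stock defaults (300000 ms recheck interval, 3 s heartbeat interval); the class and method names are hypothetical, and the effective values on a CDH 4.2 cluster depend on its hdfs-site.xml:

```java
public class DeadNodeTimeout {
    // Assumed stock defaults (both can be overridden in hdfs-site.xml):
    //   dfs.namenode.heartbeat.recheck-interval = 300000 ms (5 minutes)
    //   dfs.heartbeat.interval                  = 3 s
    static final long DEFAULT_RECHECK_INTERVAL_MS = 300_000L;
    static final long DEFAULT_HEARTBEAT_INTERVAL_S = 3L;

    /**
     * Hypothetical helper: time since the last heartbeat after which the
     * NameNode marks a DataNode as dead, in milliseconds.
     */
    static long heartbeatExpireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        long expireMs = heartbeatExpireIntervalMs(
                DEFAULT_RECHECK_INTERVAL_MS, DEFAULT_HEARTBEAT_INTERVAL_S);
        // 2 * 300000 + 10 * 1000 * 3 = 630000 ms, i.e. 10.5 minutes
        System.out.println("dead after " + expireMs + " ms (" + expireMs / 60000.0 + " min)");
    }
}
```

With those defaults it prints `dead after 630000 ms (10.5 min)`. This is only the NameNode-side dead-node timeout. The stale-node handling from HDFS-3703/HDFS-3912 uses a separate, shorter interval, dfs.namenode.stale.datanode.interval.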
