Copying CDH Users mailing list.

On Thu, Apr 18, 2013 at 6:37 PM, Varun Sharma wrote:
> I am wondering whether DFSClient caches the datanode for a long period of
> time?
>
> Varun
>
>
> On Thu, Apr 18, 2013 at 6:01 PM, Varun Sharma wrote:
>
> > Hi,
> >
> > We are facing problems with really slow HBase region server recoveries
> > (~20 minutes). Version is HBase 0.94.3 compiled with hadoop.profile=2.0.
> >
> > Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched and stale
> > node timeouts configured correctly. Time for dead node detection is still
> > 10 minutes.
> >
> > We see that our region server is trying to read an HLog and is stuck
> > there for a long time. Logs here:
> >
> > 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.251:50010 for file /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599 for block BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818 remote=/10.156.194.251:50010]
> >
> > I would expect HDFS-3703 to make the server fail fast and move on to the
> > third datanode. Currently, the recovery seems far too slow for production
> > use...
> >
> > Varun
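The ~10 minute dead node detection time mentioned above is consistent with the NameNode's default heartbeat expiry. A minimal sketch of that arithmetic, assuming stock defaults (not stated in the thread) of `dfs.namenode.heartbeat.recheck-interval` = 300000 ms, `dfs.heartbeat.interval` = 3 s and `dfs.namenode.stale.datanode.interval` = 30000 ms, and the expiry rule 2 * recheck interval + 10 * heartbeat interval. The class and method names are hypothetical and only illustrate the calculation:

```java
// Hypothetical helper: estimates when the NameNode declares a DataNode dead,
// assuming stock HDFS defaults (the cluster's actual values are not given).
public final class DeadNodeTimeout {

    // Assumed default of dfs.namenode.heartbeat.recheck-interval, in milliseconds.
    static final long DEFAULT_RECHECK_INTERVAL_MS = 300_000L;
    // Assumed default of dfs.heartbeat.interval, in seconds.
    static final long DEFAULT_HEARTBEAT_INTERVAL_S = 3L;
    // Assumed default of dfs.namenode.stale.datanode.interval, in milliseconds.
    static final long DEFAULT_STALE_INTERVAL_MS = 30_000L;

    private DeadNodeTimeout() {}

    /** Dead-node expiry in ms: 2 * recheck interval + 10 * heartbeat interval. */
    static long heartbeatExpireIntervalMs(long recheckIntervalMs, long heartbeatIntervalS) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalS;
    }

    public static void main(String[] args) {
        long deadMs = heartbeatExpireIntervalMs(
                DEFAULT_RECHECK_INTERVAL_MS, DEFAULT_HEARTBEAT_INTERVAL_S);
        // Prints 630000 ms (10.5 min) for dead vs 30000 ms (0.5 min) for stale.
        System.out.printf("dead after  %d ms (%.1f min)%n", deadMs, deadMs / 60_000.0);
        System.out.printf("stale after %d ms (%.1f min)%n",
                DEFAULT_STALE_INTERVAL_MS, DEFAULT_STALE_INTERVAL_MS / 60_000.0);
    }
}
```

Under these assumed defaults a datanode is marked stale after 30 s but dead only after 10.5 minutes, so anything that waits for dead-node status rather than staleness can stall for roughly the 10 minutes reported. Lowering the recheck interval shortens that window; for example, a 45 s recheck interval gives 2 * 45 s + 30 s = 120 s.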
