Hi Stack,

Thanks, I have all the regions picked up now.

This particular cluster is on CDH3b4 (long story) but it is slated to be upgraded next week. Here is a clip from the master log on the timeout:

2012-04-14 09:44:24,110 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4506327502711501968_8582436 failed because recovery from primary datanode 192.168.1.12:50010 failed 1 times. Pipeline was 192.168.1.24:50010, 192.168.1.12:50010, 192.168.1.22:50010. Will retry...
2012-04-14 09:45:25,118 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #1 from primary datanode 192.168.1.12:50010
java.net.SocketTimeoutException: Call to /192.168.1.12:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.31:35014 remote=/192.168.1.12:50020]

The corresponding data node, 192.168.1.12, had this:

2012-04-14 09:44:24,171 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to getBlockMetaDataInfo for block (=blk_-4506327502711501968_8582436) from datanode (=192.168.1.22:50010)
java.net.SocketTimeoutException: Call to /192.168.1.22:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.12:36626 remote=/192.168.1.22:50020]

192.168.1.22 was the node that died. I can put more logs in paste bin if needed.

thanks,
-chris

On Apr 14, 2012, at 3:23 PM, Stack wrote:

> On Sat, Apr 14, 2012 at 9:49 AM, Chris Tarnas <[email protected]> wrote:
>> I looked into org.apache.hadoop.hbase.regionserver.wal.HLog --split and
>> didn't see any notes about not running on a live cluster so I ran it and it
>> ran fine. Was it safe to run with hbase up? Were the newly created files
>> correctly added to the existing regions?
>>
>
> Should be fine w/ hbase up -- thats how the split is usually done.
>
> The newly created files were not added to the regions is my guess
> since we only check for their presence on region open.
>
> Can you see what new files were made and where? Reassign those
> regions and that should pick up the edits made by your split.
>
> What version of hbase Chris?
>
> St.Ack
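
For anyone triaging similar timeouts, here is a minimal sketch (not part of the original exchange) that pulls the block ID, the primary datanode and the write pipeline out of a DFSClient "Error Recovery" warning of the shape quoted in the master log above. The function name parse_recovery_warning and the regular expression are assumptions written against that single sample line; other Hadoop versions may word the message differently.

```python
import re

# Assumed pattern, derived only from the one warning quoted in this thread.
RECOVERY_RE = re.compile(
    r"Error Recovery for block (?P<block>blk_-?\d+_\d+) failed because "
    r"recovery from primary datanode (?P<primary>[\d.]+:\d+) failed "
    r"(?P<attempts>\d+) times\. "
    r"Pipeline was (?P<pipeline>\d[\d.:, ]*\d)\."
)


def parse_recovery_warning(line):
    """Return block, primary, attempts and pipeline, or None if no match."""
    match = RECOVERY_RE.search(line)
    if match is None:
        return None
    return {
        "block": match.group("block"),
        "primary": match.group("primary"),
        "attempts": int(match.group("attempts")),
        # Pipeline entries are comma-separated host:port pairs.
        "pipeline": [node.strip() for node in match.group("pipeline").split(",")],
    }


if __name__ == "__main__":
    sample = (
        "2012-04-14 09:44:24,110 WARN org.apache.hadoop.hdfs.DFSClient: "
        "Error Recovery for block blk_-4506327502711501968_8582436 failed "
        "because recovery from primary datanode 192.168.1.12:50010 failed "
        "1 times. Pipeline was 192.168.1.24:50010, 192.168.1.12:50010, "
        "192.168.1.22:50010. Will retry..."
    )
    print(parse_recovery_warning(sample))
```

Run over a whole master log, this makes it easy to see whether the dead node (192.168.1.22 here) shows up in every stuck pipeline.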
