Hi Stack,

Thanks, I have all the regions picked up now.

This particular cluster is on CDH3b4 (long story), but it is slated to be 
upgraded next week.

Here is a clip from the master log on the timeout:

2012-04-14 09:44:24,110 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
for block blk_-4506327502711501968_8582436 failed  because recovery from 
primary datanode 192.168.1.12:50010 failed 1 times.  Pipeline was 
192.168.1.24:50010, 192.168.1.12:50010, 192.168.1.22:50010. Will retry...
2012-04-14 09:45:25,118 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery 
attempt #1 from primary datanode 192.168.1.12:50010
java.net.SocketTimeoutException: Call to /192.168.1.12:50020 failed on socket 
timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/192.168.1.31:35014 
remote=/192.168.1.12:50020]


The corresponding data node, 192.168.1.12, had this:

2012-04-14 09:44:24,171 WARN 
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to 
getBlockMetaDataInfo for block (=blk_-4506327502711501968_8582436) from 
datanode (=192.168.1.22:50010)
java.net.SocketTimeoutException: Call to /192.168.1.22:50020 failed on socket 
timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/192.168.1.12:36626 
remote=/192.168.1.22:50020]

192.168.1.22 was the node that died. I can put more logs on pastebin if needed.
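
For anyone lining up the two clips, here is a minimal sketch (Python, not part of any Hadoop or HBase tooling) that pulls the block ID and the datanode addresses out of each warning so they can be matched. The function name `summarize` and both regular expressions are my own assumptions, based only on the log format shown in this mail; the sample lines are the messages above with the line wrapping collapsed.

```python
import re

# Assumed patterns, derived only from the log lines quoted in this mail.
BLOCK_RE = re.compile(r"blk_-?\d+_\d+")
ENDPOINT_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b")


def summarize(line):
    """Return (block_id, sorted list of host IPs) found in one log line."""
    match = BLOCK_RE.search(line)
    block = match.group(0) if match else None
    # Collect unique IPs, ignoring the port of each ip:port pair.
    hosts = sorted({ip for ip, _port in ENDPOINT_RE.findall(line)})
    return block, hosts


# Master log warning (wrapped lines joined with single spaces).
master_line = (
    "Error Recovery for block blk_-4506327502711501968_8582436 failed "
    "because recovery from primary datanode 192.168.1.12:50010 failed 1 times. "
    "Pipeline was 192.168.1.24:50010, 192.168.1.12:50010, 192.168.1.22:50010. "
    "Will retry..."
)

# Datanode log warning from 192.168.1.12.
datanode_line = (
    "Failed to getBlockMetaDataInfo for block "
    "(=blk_-4506327502711501968_8582436) from datanode (=192.168.1.22:50010)"
)

master_block, master_hosts = summarize(master_line)
dn_block, dn_hosts = summarize(datanode_line)

# Same block in both clips, and the host the datanode could not reach
# is one of the pipeline members listed by the master.
same_block = master_block == dn_block
unreachable_in_pipeline = [h for h in dn_hosts if h in master_hosts]

print(same_block, unreachable_in_pipeline)
```

Run against the two messages above, this shows both warnings refer to the same block and that the host the datanode timed out on is a member of the pipeline the master reported.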

thanks,
-chris


On Apr 14, 2012, at 3:23 PM, Stack wrote:

> On Sat, Apr 14, 2012 at 9:49 AM, Chris Tarnas <[email protected]> wrote:
>> I looked into org.apache.hadoop.hbase.regionserver.wal.HLog --split  and 
>> didn't see any notes about not running on a live cluster so I ran it and it 
>> ran fine.  Was it safe to run with hbase up? Were the newly created files 
>> correctly added to the existing regions?
>> 
> 
> Should be fine w/ hbase up -- that's how the split is usually done.
> 
> My guess is that the newly created files were not added to the regions,
> since we only check for their presence on region open.
> 
> Can you see what new files were made and where?  Reassign those
> regions and that should pick up the edits made by your split.
> 
> What version of hbase, Chris?
> 
> St.Ack
