We have a 6-node 0.90.3-cdh3u1 cluster with 8092 regions. I realize we have too many regions and too few nodes; we're addressing that. We currently have an issue where we seem to have lost region data. When data is requested for a couple of our regions, we get errors like the following on the client:
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time, servers with issues: node13host:60020
…
java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://namenodehost:54310/hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568, compression=none, inMemory=false, firstKey=95ac7c7894f86d4455885294582370e30a68fdf1/data:acquireDate/1321151006961/Put, lastKey=95b47d337ff72da0670d0f3803443dd3634681ec/data:text/1323129675986/Put, avgKeyLen=65, avgValueLen=24, entries=6753283, length=667536405, cur=null]
…
Caused by: java.io.FileNotFoundException: File does not exist: /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568

On node13host, we see similar exceptions in the region server log:

2011-12-22 02:25:27,509 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /node13host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270:java.io.IOException: Got error in response to OP_READ_BLOCK self=/node13host:37847, remote=/node13host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270_15820239
2011-12-22 02:25:27,511 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /node08host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270:java.io.IOException: Got error in response to OP_READ_BLOCK self=/node13host:44290, remote=/node08host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270_15820239
2011-12-22 02:25:27,512 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /node10host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270:java.io.IOException: Got error in response to OP_READ_BLOCK self=/node13host:52113, remote=/node10host:50010 for file /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568 for block -7065741853936038270_15820239
2011-12-22 02:25:27,513 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-7065741853936038270_15820239 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2011-12-22 02:25:30,515 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://namenodehost:54310/hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568, compression=none, inMemory=false, firstKey=95ac7c7894f86d4455885294582370e30a68fdf1/data:acquireDate/1321151006961/Put, lastKey=95b47d337ff72da0670d0f3803443dd3634681ec/data:text/1323129675986/Put, avgKeyLen=65, avgValueLen=24, entries=6753283, length=667536405, cur=null]
…
Caused by: java.io.FileNotFoundException: File does not exist: /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568

The file referenced is indeed not in HDFS. Grepping further back in the logs reveals that the problem has been occurring for over a week (likely longer, but the logs have rolled off).
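For what it's worth, this is roughly how one could double-check that the file really is gone as far as HDFS is concerned; a minimal sketch against the plain Hadoop FileSystem API (the class name is just illustrative, the namenode URI and path are the ones from the stack trace above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative check: does the store file the scanner is asking for still exist in HDFS?
    public class CheckStoreFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Namenode URI taken from the stack trace above.
            conf.set("fs.default.name", "hdfs://namenodehost:54310");
            FileSystem fs = FileSystem.get(conf);

            // The store file both the scanner and the compaction trip over.
            Path storeFile = new Path(
                "/hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568");

            System.out.println(storeFile + " exists: " + fs.exists(storeFile));
            fs.close();
        }
    }

Running something like this would just confirm what the FileNotFoundException already says: the file isn't there, even though the region's metadata still references it.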
There are a bunch of files in /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/ (270 of them), and we're unsure why they aren't compacting. Looking further in the logs, I found similar exceptions when a major compaction was attempted, ultimately failing because of:

Caused by: java.io.FileNotFoundException: File does not exist: /hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data/6810866521278698568

Any help on how to recover? hbck did identify some inconsistencies, and we went ahead with a -fix, but the issue remains.
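For reference, a similar sketch (same plain FileSystem API, path from the logs above, hypothetical class name) of how one could inventory what is actually left in that family directory, to see the 270 store files and their sizes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative listing: what store files does the problem region's family directory actually hold?
    public class ListStoreFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenodehost:54310");
            FileSystem fs = FileSystem.get(conf);

            // Column family directory of the problem region, from the logs above.
            Path familyDir = new Path("/hbase/article/4cbc7c9264820a7b30ddd5755d77ab07/data");

            FileStatus[] files = fs.listStatus(familyDir);
            long totalBytes = 0;
            if (files != null) {
                for (FileStatus f : files) {
                    System.out.println(f.getPath().getName() + "\t" + f.getLen() + " bytes");
                    totalBytes += f.getLen();
                }
                System.out.println(files.length + " store files, " + totalBytes + " bytes total");
            }
            fs.close();
        }
    }

That at least gives a picture of what the region actually has on disk versus the file the scanner and the compaction keep asking for.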
