Hi Jean,

We have upgraded to branch-0.20-append, working with HBase 0.20.6, but it looks like we are still hitting the same problem. Today I found that we started to get tons of these errors when a Hadoop balancer run started. I am wondering: will HDFS balancing of the data files impact HBase's meta info?
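As a sanity check we also grep the region server logs for the "slept for Xms" pause warnings you mentioned; a minimal sketch (the log file here is a stand-in written by the script itself for illustration, since real log paths vary per install, typically under $HBASE_HOME/logs):

```shell
# Sketch: count GC-pause warnings in a region server log.
# LOG is a placeholder file created here; in practice point it at the
# actual region server log.
LOG=demo-regionserver.log
printf 'WARN org.apache.hadoop.hbase.util.Sleeper: We slept 18000ms instead of 3000ms\n' > "$LOG"
PAUSES=$(grep -c "slept" "$LOG")
echo "pause warnings found: $PAUSES"
rm -f "$LOG"
```

In our case this kind of grep turns up nothing at all, which is what puzzles us.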
We didn't find even a single "slept for Xms" line in the region server logs. We have really been struggling with these issues for days, and would appreciate any help. Thanks.

On Tue, May 10, 2011 at 6:23 AM, Jean-Daniel Cryans <[email protected]> wrote:

> Very often the "cannot open filename" happens when the region in
> question was reopened somewhere else and that region was compacted. As
> to why it was reassigned, most of the time it's because of garbage
> collections taking too long. The master log should have all the
> required evidence, and the region server should print some "slept for
> Xms" (where X is some number of ms) messages before everything goes
> bad.
>
> Here are some general tips on debugging problems in HBase
> http://hbase.apache.org/book/trouble.html
>
> J-D
>
> On Sat, May 7, 2011 at 2:10 AM, Stanley Xu <[email protected]> wrote:
> > Dear all,
> >
> > We were using HBase 0.20.6 in our environment, and it was pretty stable
> > over the last couple of months, but we have hit some reliability issues
> > since last week. Our situation is very much like the one in the
> > following link:
> > http://search-hadoop.com/m/UJW6Efw4UW/Got+error+in+response+to+OP_READ_BLOCK+for+file&subj=HBase+fail+over+reliability+issues
> >
> > When we use an HBase client to connect to the table, it gets stuck.
> > And we can find logs like the following on the server side:
> >
> > WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.24.166.74:50010 for file /hbase/users/73382377/data/312780071564432169 for block -4841840178880951849: java.io.IOException: Got error in response to OP_READ_BLOCK for file /hbase/users/73382377/data/312780071564432169 for block -4841840178880951849
> >
> > INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60020, call get([B@25f907b4, row=963aba6c5f351f5655abdc9db82a4cbd, maxVersions=1, timeRange=[0,9223372036854775807), families={(family=data, columns=ALL}) from 10.24.117.100:2365: error: java.io.IOException: Cannot open filename /hbase/users/73382377/data/312780071564432169
> > java.io.IOException: Cannot open filename /hbase/users/73382377/data/312780071564432169
> >
> > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.166.74:50010, storageID=DS-14401423-10.24.166.74-50010-1270741415211, infoPort=50075, ipcPort=50020): Got exception while serving blk_-4841840178880951849_50277 to /10.25.119.113:
> > java.io.IOException: Block blk_-4841840178880951849_50277 is not valid.
> >
> > If we do a flush and then a major compaction on ".META.", the problem
> > goes away, but it reappears some time later.
> >
> > At first we guessed it might be an xceiver problem, so we set the
> > xceiver limit to 4096 as described in this link:
> > http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html
> >
> > But we still get the same problem. Restarting the whole HBase cluster
> > fixes it for a while, but we cannot keep restarting the cluster.
> >
> > I am waiting online and will really appreciate any reply.
> >
> > Best wishes,
> > Stanley Xu
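P.S. In case it helps anyone else hitting this thread: the interim workaround described above is run from the HBase shell (syntax as we use it on 0.20.6, where ".META." is the catalog table name):

```
flush '.META.'
major_compact '.META.'
```

And the xceiver limit we raised is set in hdfs-site.xml on each datanode (note the property name carries its historical misspelling in this Hadoop version); datanodes need a restart for it to take effect:

```xml
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```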
