And another question: can I still use hbase 0.20.6 if I use the append branch of hadoop?
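In case it matters, this is roughly how I planned to double-check that the client side actually sees appends enabled. It is just a sketch; I am assuming dfs.support.append is the right flag to look at for the append branch, and that it needs to be visible to the HBase client configuration (e.g. in hbase-site.xml or an hdfs-site.xml on the HBase classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CheckAppendSupport {
      public static void main(String[] args) {
        // Loads hbase-default.xml / hbase-site.xml from the classpath.
        HBaseConfiguration conf = new HBaseConfiguration();
        // Assumption: the append branch exposes its setting as dfs.support.append.
        boolean appendOn = conf.getBoolean("dfs.support.append", false);
        System.out.println("dfs.support.append as seen by the HBase client: " + appendOn);
      }
    }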
On 2011-5-11 at 12:51 AM, "Jean-Daniel Cryans" <[email protected]> wrote:
> Data cannot be corrupted at all, since the files in HDFS are immutable
> and CRC'ed (unless you are able to lose all 3 copies of every block).
>
> Corruption would happen at the metadata level, where the .META. table
> which contains the regions for the tables would lose rows. This is a
> likely scenario if the region server holding that region dies because
> of GC, since the hadoop version you are using along with hbase 0.20.6
> doesn't support appends, meaning that the write-ahead log would be
> missing data that, obviously, cannot be replayed.
>
> The best advice I can give you is to upgrade.
>
> J-D
>
> On Tue, May 10, 2011 at 5:44 AM, Stanley Xu <[email protected]> wrote:
>> Thanks J-D. What confuses me a little more is that when we have a
>> corrupt hbase table or some inconsistent data, we get lots of messages
>> like that. But when the hbase table is fine, we also get a few lines
>> of messages like that.
>>
>> How can I tell whether it comes from data corruption or just a
>> harmless occurrence of the scenario you mentioned?
>>
>> On Tue, May 10, 2011 at 6:23 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>>> Very often the "cannot open filename" happens when the region in
>>> question was reopened somewhere else and that region was compacted.
>>> As to why it was reassigned, most of the time it's because of garbage
>>> collections taking too long. The master log should have all the
>>> required evidence, and the region server should print some "slept for
>>> Xms" (where X is some number of ms) messages before everything goes
>>> bad.
>>>
>>> Here are some general tips on debugging problems in HBase:
>>> http://hbase.apache.org/book/trouble.html
>>>
>>> J-D
>>>
>>> On Sat, May 7, 2011 at 2:10 AM, Stanley Xu <[email protected]> wrote:
>>> > Dear all,
>>> >
>>> > We were using HBase 0.20.6 in our environment, and it was pretty
>>> > stable for the last couple of months, but we have met some
>>> > reliability issues since last week. Our situation is very similar
>>> > to the one in the following link:
>>> > http://search-hadoop.com/m/UJW6Efw4UW/Got+error+in+response+to+OP_READ_BLOCK+for+file&subj=HBase+fail+over+reliability+issues
>>> >
>>> > When we use an hbase client to connect to the hbase table, it looks
>>> > stuck there.
>>> > And we can find logs like the following on the server side:
>>> >
>>> > WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to
>>> > /10.24.166.74:50010 for file
>>> > /hbase/users/73382377/data/312780071564432169
>>> > for block -4841840178880951849:java.io.IOException: Got error in
>>> > response to OP_READ_BLOCK for file
>>> > /hbase/users/73382377/data/312780071564432169
>>> > for block -4841840178880951849
>>> >
>>> > INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on
>>> > 60020, call get([B@25f907b4, row=963aba6c5f351f5655abdc9db82a4cbd,
>>> > maxVersions=1, timeRange=[0,9223372036854775807),
>>> > families={(family=data, columns=ALL}) from 10.24.117.100:2365:
>>> > error: java.io.IOException: Cannot open filename
>>> > /hbase/users/73382377/data/312780071564432169
>>> > java.io.IOException: Cannot open filename
>>> > /hbase/users/73382377/data/312780071564432169
>>> >
>>> > WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
>>> > DatanodeRegistration(10.24.166.74:50010,
>>> > storageID=DS-14401423-10.24.166.74-50010-1270741415211,
>>> > infoPort=50075, ipcPort=50020):
>>> > Got exception while serving blk_-4841840178880951849_50277 to
>>> > /10.25.119.113:
>>> > java.io.IOException: Block blk_-4841840178880951849_50277 is not valid.
>>> >
>>> > And if we do a flush and then a major compaction on the ".META.",
>>> > the problem just goes away, but it appears again some time later.
>>> >
>>> > At first we guessed it might be a problem with xceivers, so we set
>>> > the xceiver limit to 4096 as in the link here:
>>> > http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html
>>> >
>>> > But we still get the same problem. It looks like a restart of the
>>> > whole HBase cluster fixes the problem for a while, but we obviously
>>> > cannot keep restarting the server.
>>> >
>>> > I am waiting online, and will really appreciate any message.
>>> >
>>> > Best wishes,
>>> > Stanley Xu
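For the record, the flush + major compaction workaround on ".META." mentioned above is basically the following (a minimal sketch against the 0.20 client API, equivalent to running flush '.META.' and then major_compact '.META.' in the hbase shell):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CompactMeta {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.flush(".META.");        // flush the catalog table first
        admin.majorCompact(".META."); // then request a major compaction on it
      }
    }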
