I meant to say, I can't upgrade now; it's a petabyte storage system. A little hard to keep a copy of something like that. (A rough sketch of the stale block-location cache behavior I'm describing is below the quoted thread.)
On Thu, Feb 13, 2014 at 3:20 PM, Jack Levin <[email protected]> wrote:

> Can upgrade now but I would take suggestions on how to deal with this.
>
> On Feb 13, 2014 2:02 PM, "Stack" <[email protected]> wrote:
>
>> Can you upgrade, Jack? This stuff is better in later versions (the
>> dfsclient keeps a running list of bad datanodes...)
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <[email protected]> wrote:
>>
>> > As far as I can tell I am hitting this issue:
>> >
>> > http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%[email protected]%[email protected]@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>> >
>> > DFSClient.java, lines 1581-1583
>> > (http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581,
>> > http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java#LocatedBlocks.findBlock%28long%29):
>> >
>> >     // search cached blocks first
>> >     int targetBlockIdx = locatedBlocks.findBlock(offset);
>> >     if (targetBlockIdx < 0) { // block is not cached
>> >
>> > Our RS DFSClient is asking for a block on a dead datanode because the
>> > block is somehow cached in the DFSClient. It seems that after a DN dies,
>> > DFSClients in HBase 0.90.5 do not drop the cached reference to where
>> > those blocks are. Seems like a problem. It would be good if there were
>> > a way for that cache to expire, because our dead DN has been down since
>> > Sunday.
>> >
>> > -Jack
>> >
>> > On Thu, Feb 13, 2014 at 11:23 AM, Stack <[email protected]> wrote:
>> >
>> > > The RS opens files and then keeps them open as long as the RS is
>> > > alive. We're failing the read of this replica and then succeeding in
>> > > getting the block elsewhere? Do you get that exception every time?
>> > > What Hadoop version, Jack? Do you have short-circuit reads on?
>> > > St.Ack
>> > >
>> > > On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <[email protected]> wrote:
>> > >
>> > > > I meant it's in the 'dead' list on the HDFS namenode page. Hadoop
>> > > > fsck / shows no issues.
>> > > >
>> > > > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <[email protected]> wrote:
>> > > >
>> > > > > Good morning --
>> > > > > I had a question. We have had a datanode go down, and it's been
>> > > > > down for a few days, yet HBase is still trying to talk to that
>> > > > > dead datanode:
>> > > > >
>> > > > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> > > > > connect to /10.101.5.5:50010 for file
>> > > > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
>> > > > > for block 805865
>> > > > >
>> > > > > So the question is, how come the RS is trying to talk to a dead
>> > > > > datanode? It isn't even on the HDFS list.
>> > > > >
>> > > > > Isn't the RS just an HDFS client? It should not be talking to an
>> > > > > offlined HDFS datanode that went down. This caused a lot of
>> > > > > issues in our cluster.
>> > > > >
>> > > > > Thanks,
>> > > > > -Jack
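For anyone following along, here is a minimal, hypothetical sketch (plain Java, not the actual Hadoop 0.20 / HBase 0.90.5 source; all class and method names are made up for illustration) of the behavior quoted above: block locations are cached when the file is opened, findBlock-style lookups answer from that cache, and a replica on a long-dead datanode keeps being returned until something invalidates the cached entry. The deadNodes set models the "running list of bad datanodes" Stack mentions later clients keep.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /**
     * Illustration only: why a client-side block-location cache can keep
     * handing out a datanode that died days ago. Locations are fetched once
     * when the stream is opened and reused until a failed read forces a
     * refresh from the namenode.
     */
    class CachedBlockLookupSketch {

        /** One cached block location: the block plus the datanodes believed to hold it. */
        static class LocatedBlock {
            final long blockId;
            final long startOffset;
            final long length;
            final List<String> datanodes;   // e.g. "10.101.5.5:50010"

            LocatedBlock(long blockId, long startOffset, long length, List<String> datanodes) {
                this.blockId = blockId;
                this.startOffset = startOffset;
                this.length = length;
                this.datanodes = datanodes;
            }
        }

        // Populated when the stream is opened; never refreshed on its own.
        private final List<LocatedBlock> cachedBlocks = new ArrayList<>();
        // Later DFSClient versions remember, per stream, which nodes already failed.
        private final Set<String> deadNodes = new HashSet<>();

        /** "Search cached blocks first": return the cached entry covering this offset. */
        LocatedBlock findCachedBlock(long offset) {
            for (LocatedBlock b : cachedBlocks) {
                if (offset >= b.startOffset && offset < b.startOffset + b.length) {
                    return b;   // served from cache; its replica list may be stale
                }
            }
            return null;        // not cached -> this is where a namenode lookup would happen
        }

        /** Pick a replica to read from, skipping datanodes that already failed. */
        String chooseDatanode(LocatedBlock block) {
            for (String dn : block.datanodes) {
                if (!deadNodes.contains(dn)) {
                    return dn;
                }
            }
            // All cached replicas have failed: the cached entry would have to be
            // invalidated and fresh locations fetched from the namenode.
            return null;
        }

        /** Called when a connect/read against a datanode fails. */
        void markDead(String datanode) {
            deadNodes.add(datanode);
        }
    }

The point of the sketch is just that expiry of cachedBlocks has to be triggered by something; as far as I can tell, in the version we run it only happens on certain read failures, which is why a node that has been dead since Sunday can still show up in the cached replica list.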
