This might be related: http://hadoop.6.n7.nabble.com/Question-on-opening-file-info-from-namenode-in-DFSClient-td6679.html
> In hbase, we open the file once and keep it open. File is shared
> amongst all clients.

Does it mean it's permanently cached if the datanode is dead?

-Jack

On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <magn...@gmail.com> wrote:
> As far as I can tell I am hitting this issue:
>
> http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%24releases@com.cloudera.hadoop%24hadoop-core@0.20.2-320@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>
> DFSClient.java, lines 1581-1583
> (http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581):
>
>     // search cached blocks first
>     int targetBlockIdx = locatedBlocks.findBlock(offset);
>     if (targetBlockIdx < 0) { // block is not cached
>
> Our RS DFSClient is asking for a block on a dead datanode because the block
> location is somehow cached in the DFSClient. It seems that after a DN dies,
> DFSClients in HBase 0.90.5 do not drop the cached reference to where those
> blocks are. Seems like a problem. It would be good if there were a way for
> that cache to expire, because our dead DN has been down since Sunday.
>
> -Jack
>
> On Thu, Feb 13, 2014 at 11:23 AM, Stack <st...@duboce.net> wrote:
>> The RS opens files and then keeps them open as long as the RS is alive.
>> We're failing the read of this replica and then succeeding in getting the
>> block elsewhere? Do you get that exception every time? What Hadoop version,
>> Jack? Do you have short-circuit reads on?
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <magn...@gmail.com> wrote:
>> > I meant it's in the 'dead' list on the HDFS namenode page. "hadoop fsck /"
>> > shows no issues.
>> >
>> > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <magn...@gmail.com> wrote:
>> > > Good morning --
>> > > I have a question: we have had a datanode go down, and it has been down
>> > > for a few days, yet HBase is still trying to talk to that dead datanode:
>> > >
>> > >     2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> > >     connect to /10.101.5.5:50010 for file
>> > >     /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544 for
>> > >     block 805865
>> > >
>> > > So the question is: how come the RS is trying to talk to the dead
>> > > datanode? It's even in the HDFS list.
>> > >
>> > > Isn't the RS just an HDFS client? It should not talk to an offlined HDFS
>> > > datanode that went down. This caused a lot of issues in our cluster.
>> > >
>> > > Thanks,
>> > > -Jack
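To make the failure mode in the thread concrete, here is a minimal, self-contained Java sketch of a client-side block-location cache, assuming the "search cached blocks first" pattern quoted from DFSClient.java above. This is not the actual HDFS implementation; all names here (BlockLocationCacheSketch, CachedBlock, NamenodeLookup, getBlockAt, invalidate) are hypothetical. It shows why a cache that is only consulted and never invalidated keeps handing back a replica on a dead datanode:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a client-side block-location cache, NOT the real
// DFSClient code. Locations are fetched from the namenode only on a cache
// miss, so a replica on a dead datanode keeps being returned until the
// entry is explicitly dropped.
public class BlockLocationCacheSketch {

    // One cached entry: the byte range of the block within the file and the
    // datanodes believed to hold a replica, as last reported by the namenode.
    static class CachedBlock {
        final long startOffset;
        final long length;
        final List<String> datanodes;

        CachedBlock(long startOffset, long length, List<String> datanodes) {
            this.startOffset = startOffset;
            this.length = length;
            this.datanodes = datanodes;
        }
    }

    // Stand-in for a "give me fresh locations for this offset" namenode call.
    interface NamenodeLookup {
        CachedBlock locate(long offset);
    }

    private final List<CachedBlock> cachedBlocks = new ArrayList<>();
    private final NamenodeLookup namenode;

    public BlockLocationCacheSketch(NamenodeLookup namenode) {
        this.namenode = namenode;
    }

    // Mirrors the "search cached blocks first" pattern quoted above: consult
    // the cache, and only ask the namenode on a miss. Nothing in this path
    // notices that a cached replica's datanode has since died.
    public CachedBlock getBlockAt(long offset) {
        for (CachedBlock b : cachedBlocks) {            // search cached blocks first
            if (offset >= b.startOffset && offset < b.startOffset + b.length) {
                return b;                                // may point at a dead datanode
            }
        }
        CachedBlock fresh = namenode.locate(offset);     // block is not cached
        cachedBlocks.add(fresh);
        return fresh;
    }

    // The missing piece Jack is asking about: on a failed connect to a replica,
    // drop the cached entry so the next read refetches locations from the namenode.
    public void invalidate(long offset) {
        cachedBlocks.removeIf(
            b -> offset >= b.startOffset && offset < b.startOffset + b.length);
    }
}

In a real client, something like invalidate() would presumably be called from the read-retry path after the "Failed to connect" warning above, so that the next getBlockAt() for that offset goes back to the namenode for current replica locations instead of reusing the stale entry.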