As far as I can tell I am hitting this issue:

http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%[email protected]%[email protected]@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u


From DFSClient.java (DFSClient.DFSInputStream) in hadoop-core 0.20.2-320, lines 1581-1583:
<http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581>

// search cached blocks first
int targetBlockIdx = locatedBlocks.findBlock(offset);
if (targetBlockIdx < 0) { // block is not cached
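
To illustrate the flow (this is my own paraphrase with made-up class and
method names, not the actual DFSClient code): the namenode only gets asked
for locations when the offset is not already covered by the cached block
list, so whatever datanode was recorded when the file was opened keeps
being handed back:

import java.util.ArrayList;
import java.util.List;

// Toy model of the lookup above; the class, field, and method names are
// mine, not the HDFS API.  Once a block's location is cached, askNamenode()
// is never called again for that offset, so the reader keeps getting
// pointed at whichever datanode the namenode returned at open time.
class CachedBlockLookup {
    static class Block {
        long start;
        long length;
        String datanode;
    }

    private final List<Block> cached = new ArrayList<Block>();

    // Stands in for locatedBlocks.findBlock(offset).
    int findBlock(long offset) {
        for (int i = 0; i < cached.size(); i++) {
            Block b = cached.get(i);
            if (offset >= b.start && offset < b.start + b.length) {
                return i;
            }
        }
        return -1; // not cached
    }

    Block getBlockAt(long offset) {
        // search cached blocks first
        int targetBlockIdx = findBlock(offset);
        if (targetBlockIdx < 0) { // block is not cached
            cached.add(askNamenode(offset));
            targetBlockIdx = cached.size() - 1;
        }
        // A cached entry is returned unchanged, even if its datanode has since died.
        return cached.get(targetBlockIdx);
    }

    // Placeholder for the namenode RPC; returns a fixed replica for the demo.
    private Block askNamenode(long offset) {
        Block b = new Block();
        b.start = offset - (offset % (64L * 1024 * 1024));
        b.length = 64L * 1024 * 1024;
        b.datanode = "10.101.5.5:50010"; // whichever replica the NN picked back then
        return b;
    }
}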


Our RS DFSClient is asking for a block on a dead datanode because the
block's location is still cached in the DFSClient.  It seems that after
a DN dies, the DFSClients in HBase 0.90.5 do not drop the cached
reference to where those blocks are.  That seems like a problem.  It
would be good if there were a way for that cache to expire, because our
dead DN has been down since Sunday.
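
To make the "expire" part concrete, here is a rough sketch of the kind of
behavior I mean (entirely my own invention, not an existing HDFS knob or
patch): cached locations carry a timestamp, and a failed connect to the
cached datanode evicts the entry so the next read has to go back to the
namenode.

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the cache behaviour being asked for; all names are hypothetical.
// Entries age out, and a connect failure evicts the entry immediately, so
// the next lookup has to re-ask the namenode for fresh locations.
class ExpiringBlockLocationCache {
    static class Entry {
        final String datanode;
        final long fetchedAtMs;
        Entry(String datanode, long fetchedAtMs) {
            this.datanode = datanode;
            this.fetchedAtMs = fetchedAtMs;
        }
    }

    private final ConcurrentHashMap<Long, Entry> byBlockId =
            new ConcurrentHashMap<Long, Entry>();
    private final long maxAgeMs;

    ExpiringBlockLocationCache(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    // Returns the cached datanode, or null if missing or expired; a null
    // result means the caller should refetch locations from the namenode.
    String lookup(long blockId) {
        Entry e = byBlockId.get(blockId);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() - e.fetchedAtMs > maxAgeMs) {
            byBlockId.remove(blockId); // expired: force a namenode round trip
            return null;
        }
        return e.datanode;
    }

    void put(long blockId, String datanode) {
        byBlockId.put(blockId, new Entry(datanode, System.currentTimeMillis()));
    }

    // Called when connecting to the cached datanode throws, e.g. the
    // "Failed to connect to /10.101.5.5:50010 ... block 805865" case below.
    void invalidateOnFailure(long blockId, IOException cause) {
        byBlockId.remove(blockId); // next read asks the namenode again
    }
}

Even a crude time-based expiry like that would have kept us from trying
to read from a datanode that has been dead since Sunday.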


-Jack




On Thu, Feb 13, 2014 at 11:23 AM, Stack <[email protected]> wrote:

> RS opens files and then keeps them open as long as the RS is alive.  We're
> failing the read of this replica and then succeeding in getting the block
> elsewhere?  Do you get that exception every time?  What Hadoop version,
> Jack?  Do you have short-circuit reads on?
> St.Ack
>
>
> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <[email protected]> wrote:
>
> > I meant it's in the 'dead' list on the HDFS namenode page. Hadoop fsck /
> > shows no issues.
> >
> >
> > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <[email protected]> wrote:
> >
> > > Good morning --
> > > I had a question: we have had a datanode go down, and it's been down
> > > for a few days, however HBase is still trying to talk to that dead
> > > datanode:
> > >
> > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> > > to connect to /10.101.5.5:50010 for file
> > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
> > > for block 805865
> > >
> > > So the question is, how come the RS is trying to talk to a dead
> > > datanode? It is even in the HDFS list.
> > >
> > > Isn't the RS just an HDFS client?  And shouldn't it avoid talking to an
> > > offlined HDFS datanode that went down?  This caused a lot of issues in
> > > our cluster.
> > >
> > > Thanks,
> > > -Jack
> > >
> >
>
