I can upgrade now, but I would take suggestions on how to deal with this.
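In the meantime, here is a minimal sketch of one way to check what the namenode itself currently reports for the file's block locations, so we can confirm the stale entry lives only in the region server's long-lived open stream. It assumes only the stock Hadoop FileSystem API; the class name BlockLocationCheck is made up, and the path is just the HFile from the warning quoted below:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationCheck {
      public static void main(String[] args) throws Exception {
        // HFile path taken from the RS warning quoted below; substitute as needed.
        Path file = new Path(
            "/hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        try {
          FileStatus status = fs.getFileStatus(file);

          // Ask the namenode which datanodes it currently reports for each block.
          BlockLocation[] locations =
              fs.getFileBlockLocations(status, 0, status.getLen());
          for (BlockLocation loc : locations) {
            System.out.println("offset=" + loc.getOffset()
                + " hosts=" + Arrays.toString(loc.getHosts()));
          }
        } finally {
          fs.close();
        }
      }
    }

A fresh FileSystem instance like this creates a new client, so it goes back to the namenode for locations; the already-open stream inside the region server keeps reusing the block list it cached when the file was opened.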
On Feb 13, 2014 2:02 PM, "Stack" <[email protected]> wrote:

> Can you upgrade, Jack? This stuff is better in later versions (the
> DFSClient keeps a running list of bad datanodes...)
> St.Ack
>
>
> On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <[email protected]> wrote:
>
> > As far as I can tell I am hitting this issue:
> >
> > http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%[email protected]%[email protected]@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
> >
> > From DFSClient.java (
> > http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581
> > ):
> >
> > 1581    // search cached blocks first
> > 1582    int targetBlockIdx = locatedBlocks.findBlock(offset);
> > 1583    if (targetBlockIdx < 0) { // block is not cached
> >
> > Our RS DFSClient is asking for a block on a dead datanode because the
> > block location is somehow cached in the DFSClient. It seems that after
> > a DN dies, DFSClients in HBase 0.90.5 do not drop the cached reference
> > to where those blocks are. Seems like a problem. It would be good if
> > there was a way for that cache to expire, because our dead DN has been
> > down since Sunday.
> >
> > -Jack
> >
> >
> > On Thu, Feb 13, 2014 at 11:23 AM, Stack <[email protected]> wrote:
> >
> > > The RS opens files and then keeps them open as long as the RS is
> > > alive. We're failing the read of this replica and then succeeding in
> > > getting the block elsewhere? Do you get that exception every time?
> > > What Hadoop version, Jack? Do you have short-circuit reads on?
> > > St.Ack
> > >
> > >
> > > On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <[email protected]> wrote:
> > >
> > > > I meant it's in the 'dead' list on the HDFS namenode page.
> > > > hadoop fsck / shows no issues.
> > > >
> > > >
> > > > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <[email protected]> wrote:
> > > >
> > > > > Good morning --
> > > > > I had a question: we have had a datanode go down, and it's been
> > > > > down for a few days, however HBase is still trying to talk to
> > > > > that dead datanode:
> > > > >
> > > > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient:
> > > > > Failed to connect to /10.101.5.5:50010 for file
> > > > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
> > > > > for block 805865
> > > > >
> > > > > So the question is, how come the RS is trying to talk to the dead
> > > > > datanode? It's even on the dead list in HDFS.
> > > > >
> > > > > Isn't the RS just an HDFS client? Shouldn't it avoid talking to an
> > > > > offlined HDFS datanode that went down? This caused a lot of issues
> > > > > in our cluster.
> > > > >
> > > > > Thanks,
> > > > > -Jack
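For reference, the DFSClient.java lines quoted above boil down to a cache-first lookup that only goes back to the namenode on a cache miss; nothing on a cache hit re-checks whether the recorded datanodes are still alive, which matches the behaviour Jack is describing. A rough sketch of that pattern (illustrative names only, not the real HDFS classes):

    import java.util.List;

    // Simplified sketch of the cache-first lookup quoted above from
    // DFSClient.DFSInputStream; names and types are illustrative.
    class CachedBlockLookup {
      static class Block {
        long startOffset;
        long length;
        List<String> datanodes; // locations recorded when the block list was fetched
      }

      private List<Block> cachedBlocks; // populated when the file is opened

      Block getBlockAt(long offset) {
        // search cached blocks first (cf. DFSClient.java:1581-1583)
        int targetBlockIdx = findBlock(offset);
        if (targetBlockIdx < 0) {
          // block is not cached: go back to the namenode for fresh locations
          cachedBlocks = fetchBlockListFromNamenode(offset);
          targetBlockIdx = findBlock(offset);
        }
        // A cache hit is returned as-is, even if one of the recorded
        // datanodes has since died.
        return cachedBlocks.get(targetBlockIdx);
      }

      private int findBlock(long offset) {
        for (int i = 0; i < cachedBlocks.size(); i++) {
          Block b = cachedBlocks.get(i);
          if (offset >= b.startOffset && offset < b.startOffset + b.length) {
            return i;
          }
        }
        return -1; // not cached
      }

      private List<Block> fetchBlockListFromNamenode(long offset) {
        throw new UnsupportedOperationException("namenode RPC omitted in this sketch");
      }
    }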
