I can upgrade now, but I would take suggestions on how to deal with this.
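In the meantime, here is a minimal sketch of one way to check what the namenode itself currently reports for the file's block locations, so we can confirm the stale entry lives only in the region server's long-lived open stream. It assumes only the stock Hadoop FileSystem API; the class name BlockLocationCheck is made up, and the path is just the HFile from the warning quoted below:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationCheck {
      public static void main(String[] args) throws Exception {
        // HFile path taken from the RS warning quoted below; substitute as needed.
        Path file = new Path(
            "/hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        try {
          FileStatus status = fs.getFileStatus(file);

          // Ask the namenode which datanodes it currently reports for each block.
          BlockLocation[] locations =
              fs.getFileBlockLocations(status, 0, status.getLen());
          for (BlockLocation loc : locations) {
            System.out.println("offset=" + loc.getOffset()
                + " hosts=" + Arrays.toString(loc.getHosts()));
          }
        } finally {
          fs.close();
        }
      }
    }

A fresh FileSystem instance like this creates a new client, so it goes back to the namenode for locations; the already-open stream inside the region server keeps reusing the block list it cached when the file was opened.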
On Feb 13, 2014 2:02 PM, "Stack" <[email protected]> wrote:

> Can you upgrade, Jack? This stuff is better in later versions (the
> DFSClient keeps a running list of bad datanodes...)
> St.Ack
>
>
> On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <[email protected]> wrote:
>
> > As far as I can tell I am hitting this issue:
> >
> > http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%[email protected]%[email protected]@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
> >
> > From DFSClient.java (
> > http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581
> > ):
> >
> > 1581    // search cached blocks first
> > 1582    int targetBlockIdx = locatedBlocks.findBlock(offset);
> > 1583    if (targetBlockIdx < 0) { // block is not cached
> >
> > Our RS DFSClient is asking for a block on a dead datanode because the
> > block location is somehow cached in the DFSClient. It seems that after
> > a DN dies, DFSClients in HBase 0.90.5 do not drop the cached reference
> > to where those blocks are. Seems like a problem. It would be good if
> > there was a way for that cache to expire, because our dead DN has been
> > down since Sunday.
> >
> > -Jack
> >
> >
> > On Thu, Feb 13, 2014 at 11:23 AM, Stack <[email protected]> wrote:
> >
> > > The RS opens files and then keeps them open as long as the RS is
> > > alive. We're failing the read of this replica and then succeeding in
> > > getting the block elsewhere? Do you get that exception every time?
> > > What Hadoop version, Jack? Do you have short-circuit reads on?
> > > St.Ack
> > >
> > >
> > > On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <[email protected]> wrote:
> > >
> > > > I meant it's in the 'dead' list on the HDFS namenode page.
> > > > hadoop fsck / shows no issues.
> > > >
> > > >
> > > > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <[email protected]> wrote:
> > > >
> > > > > Good morning --
> > > > > I had a question: we have had a datanode go down, and it's been
> > > > > down for a few days, however HBase is still trying to talk to
> > > > > that dead datanode:
> > > > >
> > > > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient:
> > > > > Failed to connect to /10.101.5.5:50010 for file
> > > > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
> > > > > for block 805865
> > > > >
> > > > > So the question is, how come the RS is trying to talk to the dead
> > > > > datanode? It's even on the dead list in HDFS.
> > > > >
> > > > > Isn't the RS just an HDFS client? Shouldn't it avoid talking to an
> > > > > offlined HDFS datanode that went down? This caused a lot of issues
> > > > > in our cluster.
> > > > >
> > > > > Thanks,
> > > > > -Jack
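For reference, the DFSClient.java lines quoted above boil down to a cache-first lookup that only goes back to the namenode on a cache miss; nothing on a cache hit re-checks whether the recorded datanodes are still alive, which matches the behaviour Jack is describing. A rough sketch of that pattern (illustrative names only, not the real HDFS classes):

    import java.util.List;

    // Simplified sketch of the cache-first lookup quoted above from
    // DFSClient.DFSInputStream; names and types are illustrative.
    class CachedBlockLookup {
      static class Block {
        long startOffset;
        long length;
        List<String> datanodes; // locations recorded when the block list was fetched
      }

      private List<Block> cachedBlocks; // populated when the file is opened

      Block getBlockAt(long offset) {
        // search cached blocks first (cf. DFSClient.java:1581-1583)
        int targetBlockIdx = findBlock(offset);
        if (targetBlockIdx < 0) {
          // block is not cached: go back to the namenode for fresh locations
          cachedBlocks = fetchBlockListFromNamenode(offset);
          targetBlockIdx = findBlock(offset);
        }
        // A cache hit is returned as-is, even if one of the recorded
        // datanodes has since died.
        return cachedBlocks.get(targetBlockIdx);
      }

      private int findBlock(long offset) {
        for (int i = 0; i < cachedBlocks.size(); i++) {
          Block b = cachedBlocks.get(i);
          if (offset >= b.startOffset && offset < b.startOffset + b.length) {
            return i;
          }
        }
        return -1; // not cached
      }

      private List<Block> fetchBlockListFromNamenode(long offset) {
        throw new UnsupportedOperationException("namenode RPC omitted in this sketch");
      }
    }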
