This might be related: http://hadoop.6.n7.nabble.com/Question-on-opening-file-info-from-namenode-in-DFSClient-td6679.html
> In hbase, we open the file once and keep it open. File is shared
> amongst all clients.

Does it mean it's permanently cached if the datanode is dead?

-Jack

On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <magn...@gmail.com> wrote:
> As far as I can tell I am hitting this issue:
>
> http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%24releases@com.cloudera.hadoop%24hadoop-core@0.20.2-320@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>
> DFSClient.java, lines 1581-1583
> (http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581):
>
>     // search cached blocks first
>     int targetBlockIdx = locatedBlocks.findBlock(offset);
>     if (targetBlockIdx < 0) { // block is not cached
>
> Our RS DFSClient is asking for a block on a dead datanode because the block
> location is somehow cached in the DFSClient. It seems that after a DN dies,
> DFSClients in HBase 0.90.5 do not drop the cached reference to where those
> blocks are. Seems like a problem. It would be good if there were a way for
> that cache to expire, because our dead DN has been down since Sunday.
>
> -Jack
>
> On Thu, Feb 13, 2014 at 11:23 AM, Stack <st...@duboce.net> wrote:
>> The RS opens files and then keeps them open as long as the RS is alive.
>> We're failing the read of this replica and then succeeding in getting the
>> block elsewhere? Do you get that exception every time? What Hadoop version,
>> Jack? Do you have short-circuit reads on?
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <magn...@gmail.com> wrote:
>> > I meant it's in the 'dead' list on the HDFS namenode page. "hadoop fsck /"
>> > shows no issues.
>> >
>> > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <magn...@gmail.com> wrote:
>> > > Good morning --
>> > > I have a question: we have had a datanode go down, and it has been down
>> > > for a few days, yet HBase is still trying to talk to that dead datanode:
>> > >
>> > >     2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> > >     connect to /10.101.5.5:50010 for file
>> > >     /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544 for
>> > >     block 805865
>> > >
>> > > So the question is: how come the RS is trying to talk to the dead
>> > > datanode? It's even in the HDFS list.
>> > >
>> > > Isn't the RS just an HDFS client? It should not talk to an offlined HDFS
>> > > datanode that went down. This caused a lot of issues in our cluster.
>> > >
>> > > Thanks,
>> > > -Jack
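To make the failure mode in the thread concrete, here is a minimal, self-contained Java sketch of a client-side block-location cache, assuming the "search cached blocks first" pattern quoted from DFSClient.java above. This is not the actual HDFS implementation; all names here (BlockLocationCacheSketch, CachedBlock, NamenodeLookup, getBlockAt, invalidate) are hypothetical. It shows why a cache that is only consulted and never invalidated keeps handing back a replica on a dead datanode:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a client-side block-location cache, NOT the real
// DFSClient code. Locations are fetched from the namenode only on a cache
// miss, so a replica on a dead datanode keeps being returned until the
// entry is explicitly dropped.
public class BlockLocationCacheSketch {

    // One cached entry: the byte range of the block within the file and the
    // datanodes believed to hold a replica, as last reported by the namenode.
    static class CachedBlock {
        final long startOffset;
        final long length;
        final List<String> datanodes;

        CachedBlock(long startOffset, long length, List<String> datanodes) {
            this.startOffset = startOffset;
            this.length = length;
            this.datanodes = datanodes;
        }
    }

    // Stand-in for a "give me fresh locations for this offset" namenode call.
    interface NamenodeLookup {
        CachedBlock locate(long offset);
    }

    private final List<CachedBlock> cachedBlocks = new ArrayList<>();
    private final NamenodeLookup namenode;

    public BlockLocationCacheSketch(NamenodeLookup namenode) {
        this.namenode = namenode;
    }

    // Mirrors the "search cached blocks first" pattern quoted above: consult
    // the cache, and only ask the namenode on a miss. Nothing in this path
    // notices that a cached replica's datanode has since died.
    public CachedBlock getBlockAt(long offset) {
        for (CachedBlock b : cachedBlocks) {            // search cached blocks first
            if (offset >= b.startOffset && offset < b.startOffset + b.length) {
                return b;                                // may point at a dead datanode
            }
        }
        CachedBlock fresh = namenode.locate(offset);     // block is not cached
        cachedBlocks.add(fresh);
        return fresh;
    }

    // The missing piece Jack is asking about: on a failed connect to a replica,
    // drop the cached entry so the next read refetches locations from the namenode.
    public void invalidate(long offset) {
        cachedBlocks.removeIf(
            b -> offset >= b.startOffset && offset < b.startOffset + b.length);
    }
}

In a real client, something like invalidate() would presumably be called from the read-retry path after the "Failed to connect" warning above, so that the next getBlockAt() for that offset goes back to the namenode for current replica locations instead of reusing the stale entry.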