I meant to say, I can't upgrade now; it's a petabyte storage system. A little hard to keep a copy of something like that. (A rough sketch of the stale block-location cache behavior I'm describing is below the quoted thread.)
On Thu, Feb 13, 2014 at 3:20 PM, Jack Levin <[email protected]> wrote:

> Can upgrade now but I would take suggestions on how to deal with this.
>
> On Feb 13, 2014 2:02 PM, "Stack" <[email protected]> wrote:
>
>> Can you upgrade, Jack? This stuff is better in later versions (the
>> dfsclient keeps a running list of bad datanodes...)
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin <[email protected]> wrote:
>>
>> > As far as I can tell I am hitting this issue:
>> >
>> > http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%[email protected]%[email protected]@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>> >
>> > DFSClient.java, lines 1581-1583
>> > (http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581,
>> > http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java#LocatedBlocks.findBlock%28long%29):
>> >
>> >     // search cached blocks first
>> >     int targetBlockIdx = locatedBlocks.findBlock(offset);
>> >     if (targetBlockIdx < 0) { // block is not cached
>> >
>> > Our RS DFSClient is asking for a block on a dead datanode because the
>> > block is somehow cached in the DFSClient. It seems that after a DN dies,
>> > DFSClients in HBase 0.90.5 do not drop the cached reference to where
>> > those blocks are. Seems like a problem. It would be good if there were
>> > a way for that cache to expire, because our dead DN has been down since
>> > Sunday.
>> >
>> > -Jack
>> >
>> > On Thu, Feb 13, 2014 at 11:23 AM, Stack <[email protected]> wrote:
>> >
>> > > The RS opens files and then keeps them open as long as the RS is
>> > > alive. We're failing the read of this replica and then succeeding in
>> > > getting the block elsewhere? Do you get that exception every time?
>> > > What Hadoop version, Jack? Do you have short-circuit reads on?
>> > > St.Ack
>> > >
>> > > On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <[email protected]> wrote:
>> > >
>> > > > I meant it's in the 'dead' list on the HDFS namenode page. Hadoop
>> > > > fsck / shows no issues.
>> > > >
>> > > > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <[email protected]> wrote:
>> > > >
>> > > > > Good morning --
>> > > > > I had a question. We have had a datanode go down, and it's been
>> > > > > down for a few days, yet HBase is still trying to talk to that
>> > > > > dead datanode:
>> > > > >
>> > > > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>> > > > > connect to /10.101.5.5:50010 for file
>> > > > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
>> > > > > for block 805865
>> > > > >
>> > > > > So the question is, how come the RS is trying to talk to a dead
>> > > > > datanode? It isn't even on the HDFS list.
>> > > > >
>> > > > > Isn't the RS just an HDFS client? It should not be talking to an
>> > > > > offlined HDFS datanode that went down. This caused a lot of
>> > > > > issues in our cluster.
>> > > > >
>> > > > > Thanks,
>> > > > > -Jack
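For anyone following along, here is a minimal, hypothetical sketch (plain Java, not the actual Hadoop 0.20 / HBase 0.90.5 source; all class and method names are made up for illustration) of the behavior quoted above: block locations are cached when the file is opened, findBlock-style lookups answer from that cache, and a replica on a long-dead datanode keeps being returned until something invalidates the cached entry. The deadNodes set models the "running list of bad datanodes" Stack mentions later clients keep.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /**
     * Illustration only: why a client-side block-location cache can keep
     * handing out a datanode that died days ago. Locations are fetched once
     * when the stream is opened and reused until a failed read forces a
     * refresh from the namenode.
     */
    class CachedBlockLookupSketch {

        /** One cached block location: the block plus the datanodes believed to hold it. */
        static class LocatedBlock {
            final long blockId;
            final long startOffset;
            final long length;
            final List<String> datanodes;   // e.g. "10.101.5.5:50010"

            LocatedBlock(long blockId, long startOffset, long length, List<String> datanodes) {
                this.blockId = blockId;
                this.startOffset = startOffset;
                this.length = length;
                this.datanodes = datanodes;
            }
        }

        // Populated when the stream is opened; never refreshed on its own.
        private final List<LocatedBlock> cachedBlocks = new ArrayList<>();
        // Later DFSClient versions remember, per stream, which nodes already failed.
        private final Set<String> deadNodes = new HashSet<>();

        /** "Search cached blocks first": return the cached entry covering this offset. */
        LocatedBlock findCachedBlock(long offset) {
            for (LocatedBlock b : cachedBlocks) {
                if (offset >= b.startOffset && offset < b.startOffset + b.length) {
                    return b;   // served from cache; its replica list may be stale
                }
            }
            return null;        // not cached -> this is where a namenode lookup would happen
        }

        /** Pick a replica to read from, skipping datanodes that already failed. */
        String chooseDatanode(LocatedBlock block) {
            for (String dn : block.datanodes) {
                if (!deadNodes.contains(dn)) {
                    return dn;
                }
            }
            // All cached replicas have failed: the cached entry would have to be
            // invalidated and fresh locations fetched from the namenode.
            return null;
        }

        /** Called when a connect/read against a datanode fails. */
        void markDead(String datanode) {
            deadNodes.add(datanode);
        }
    }

The point of the sketch is just that expiry of cachedBlocks has to be triggered by something; as far as I can tell, in the version we run it only happens on certain read failures, which is why a node that has been dead since Sunday can still show up in the cached replica list.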
