Thanks Yi, I will look into HDFS-4516.
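For anyone skimming the thread: the HDFS-4516 behavior Yi describes below can be modeled with a tiny sketch. This is purely illustrative (not actual DFSClient code, and `should_retry_last_block` is a made-up name); it only captures the retry condition: in the affected versions the client retries fetching the last block's locations only when the NameNode reports a non-zero last block size, i.e. the block was synced.

```python
# Illustrative model (NOT actual DFSClient code) of the HDFS-4516 retry
# condition discussed below: the client retries only when the NameNode
# reports a non-zero size for the last block, i.e. the block was synced
# before the crash.

def should_retry_last_block(nn_reported_last_block_size: int) -> bool:
    """Retry only if the NN saw a synced (non-zero length) last block."""
    return nn_reported_last_block_size > 0

# The failure mode in this thread: the block exists in the shared edit
# log but was never created on any DataNode, so the NN reports size 0
# and the client gives up instead of retrying.
print(should_retry_last_block(0))          # block never written: no retry
print(should_retry_last_block(134217728))  # synced block: client retries
```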
2014-09-10 15:03 GMT+08:00 Liu, Yi A <[email protected]>:

> Hi Zesheng,
>
> I learned from an offline email of yours that your Hadoop version was
> 2.0.0-alpha, and that "The block is allocated successfully in NN, but
> isn't created in DN".
>
> Yes, we may have this issue in 2.0.0-alpha. I suspect your issue is
> similar to HDFS-4516. If you try Hadoop 2.4 or later, you should not be
> able to reproduce it.
>
> From your description, the second block is created successfully, and the
> NN would flush the edit log info to the shared journal. The shared
> storage might persist the info, but the RPC reply back to the NN might
> time out. So the block exists in the shared edit log, but the DN never
> creates it. On restart, the client could fail, because in that Hadoop
> version the client would retry only if the last block size reported by
> the NN was non-zero, i.e. the block had been synced (see more in
> HDFS-4516).
>
> Regards,
> Yi Liu
>
> *From:* Zesheng Wu [mailto:[email protected]]
> *Sent:* Tuesday, September 09, 2014 6:16 PM
> *To:* [email protected]
> *Subject:* HDFS: Couldn't obtain the locations of the last block
>
> Hi,
>
> These days we encountered a critical bug in HDFS which can prevent HBase
> from starting normally.
>
> The scenario is as follows:
>
> 1. rs1 writes data to HDFS file f1, and the first block is written
> successfully
>
> 2. rs1 applies to create the second block successfully; at this moment,
> nn1 (the active NN) crashes due to a journal-write timeout
>
> 3. nn2 (the standby NN) doesn't become active, because zkfc2 is in an
> abnormal state
>
> 4. nn1 is restarted and becomes active
>
> 5. During nn1's restart, rs1 crashes due to writing to the safe-mode NN
> (nn1)
>
> 6.
> As a result, the file f1 is in an abnormal state and the HBase cluster
> can't serve any more.
>
> We can list the file with the command line shell; it looks like this:
>
> -rw------- 3 hbase_srv supergroup 134217728 2014-09-05 11:32 /hbase/lgsrv-push/xxx
>
> But when we try to download the file from HDFS, the DFS client complains:
>
> 14/09/09 18:12:11 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 3 times
>
> 14/09/09 18:12:15 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 2 times
>
> 14/09/09 18:12:19 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 1 times
>
> get: Could not obtain the last block locations.
>
> Can anyone help with this?
>
> --
> Best Wishes!
>
> Yours, Zesheng

--
Best Wishes!

Yours, Zesheng
