Thanks Yi, I will look into HDFS-4516.

2014-09-10 15:03 GMT+08:00 Liu, Yi A <[email protected]>:

>  Hi Zesheng,
>
>
>
> I learned from your offline email that your Hadoop version was
> 2.0.0-alpha, and you also said “The block is allocated successfully in NN,
> but isn’t created in DN”.
>
> Yes, this issue can occur in 2.0.0-alpha. I suspect your issue is
> similar to HDFS-4516. Could you try Hadoop 2.4 or later? You should not
> be able to reproduce it on those versions.
>
>
>
> From your description, the second block was allocated successfully: the NN
> flushed the edit log entry to the shared journal, and the shared storage
> might have persisted it, but the RPC acknowledgment back to the NN might
> have timed out. So the block exists in the shared edit log, but no DN ever
> created it. After the restart, the client can fail because, in that Hadoop
> version, the client retries only when the last block size reported by the
> NN is non-zero, i.e. when the block was synced (see HDFS-4516 for details).
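That retry condition can be sketched as follows. This is a hypothetical simplification for illustration only, not the actual DFSClient code; the class and method names are made up. The point is that a block which exists only in the edit log (reported length 0, never written to any DataNode) fails immediately instead of being retried:

```java
// Hypothetical sketch of the retry decision described above.
// In 2.0.0-alpha the client retries fetching last-block locations
// only when the NameNode reports a non-zero last block length.
public class LastBlockRetrySketch {

    // Returns true if the client should keep retrying for locations.
    static boolean shouldRetry(long nnReportedLastBlockLength, int retriesLeft) {
        return nnReportedLastBlockLength > 0 && retriesLeft > 0;
    }

    public static void main(String[] args) {
        // Block allocated in the NN edit log but never created on a DN:
        // reported length is 0, so no retry -> the client fails with
        // "Could not obtain the last block locations."
        System.out.println(shouldRetry(0L, 3));          // false

        // A synced block with data would be retried.
        System.out.println(shouldRetry(134217728L, 3));  // true
    }
}
```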
>
>
>
> Regards,
>
> Yi Liu
>
>
>
> *From:* Zesheng Wu [mailto:[email protected]]
> *Sent:* Tuesday, September 09, 2014 6:16 PM
> *To:* [email protected]
> *Subject:* HDFS: Couldn't obtain the locations of the last block
>
>
>
> Hi,
>
>
>
> These days we encountered a critical bug in HDFS which can prevent HBase
> from starting normally.
>
> The scenario is as follows:
>
> 1. rs1 writes data to HDFS file f1, and the first block is written
> successfully
>
> 2. rs1 successfully requests allocation of the second block; at this
> moment, nn1 (the active NN) crashes due to a journal write timeout
>
> 3. nn2 (the standby NN) does not become active because zkfc2 is in an
> abnormal state
>
> 4. nn1 is restarted and becomes active
>
> 5. While nn1 is restarting, rs1 crashes because it writes to nn1, which is
> still in safe mode
>
> 6. As a result, the file f1 is left in an abnormal state and the HBase
> cluster can no longer serve requests
>
>
>
> We can list the file with the command-line shell; it looks like the
> following:
>
> -rw-------   3 hbase_srv supergroup  134217728 2014-09-05 11:32 
> /hbase/lgsrv-push/xxx
>
> But when we try to download the file from HDFS, the DFS client complains:
>
> 14/09/09 18:12:11 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 3 times
>
> 14/09/09 18:12:15 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 2 times
>
> 14/09/09 18:12:19 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 1 times
>
> get: Could not obtain the last block locations.
>
> Can anyone help with this?
>
>  --
> Best Wishes!
>
> Yours, Zesheng
>



-- 
Best Wishes!

Yours, Zesheng
