Hi Prabhu, I saw some similar localization timeout issue. I found the localization timeout issue is due to HDFS not YARN. In my case, HDFS-7005 <https://issues.apache.org/jira/browse/HDFS-7005> fixed the issue. HDFS-7005 <https://issues.apache.org/jira/browse/HDFS-7005> is only in 2.6 or later release. The root cause is all public localizer threads stuck on reading file data from HDFS. Maybe you can try HDFS-7005 to see whether it can fix your issue.
Regards zhihai On Tue, Jan 12, 2016 at 2:41 AM, Prabhu Joseph <[email protected]> wrote: > Hi Experts, > > On hadoop-2.5.1, When Localization is failed for a container of a job in > a NodeManager at > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer, > then the subsequent containers of that job submitted into that NodeManager > hangs at Localizing state until the task times out. > > On hadoop-2.7.0, the above behavior is fixed, by creating another Localizer > for the job in the NodeManager when the previous container fails at > Localization. > > Can someone share me the YARN JIRA which fixed the above issue in > hadoop-2.7.0. > > > Thanks, > Prabhu Joseph >
