Hi Prabhu,

Thanks for the clarification. This looks like a configuration issue.
Why is "yarn.nodemanager.local-dirs" set to /tmp/nm-local-dir?
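
One quick way to check for this kind of setup is sketched below in Python. It is not part of Hadoop: the function name risky_local_dirs, the list of cleaned roots, and the /data/1/yarn/local path are all illustrative assumptions. It flags any entry of a comma-separated "yarn.nodemanager.local-dirs" value that lies under a directory a cleaner such as tmpwatch might purge.

```python
from pathlib import PurePosixPath

# Assumption: directories that a periodic cleaner such as tmpwatch
# is configured to purge on this host. Adjust to the local setup.
CLEANED_ROOTS = ("/tmp", "/var/tmp")


def risky_local_dirs(local_dirs_value, cleaned_roots=CLEANED_ROOTS):
    """Return the entries of a comma-separated local-dirs value that
    sit at or below one of the cleaned roots."""
    risky = []
    for entry in local_dirs_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items such as a trailing comma
        path = PurePosixPath(entry)
        for root in cleaned_roots:
            root_path = PurePosixPath(root)
            if path == root_path or root_path in path.parents:
                risky.append(entry)
                break
    return risky


if __name__ == "__main__":
    # /data/1/yarn/local is a hypothetical path outside /tmp.
    print(risky_local_dirs("/tmp/nm-local-dir,/data/1/yarn/local"))
```

Any directory it reports would be a candidate for moving to a location that the cleaner does not touch.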

thanks
zhihai

On Tue, Jan 12, 2016 at 8:55 PM, Prabhu Joseph <[email protected]> wrote:

> Thanks Zhihai for your comment.
>
> The actual issue is that a container failed during localization because
> /tmp/nm-local-dir was removed by tmpwatch, and hence the subsequent
> containers of that job running on that node hang in the LOCALIZING state.
> In hadoop-2.7.0, a fix removes the unnecessary files created by the
> failed container, so the subsequent containers work fine. I want to find
> the YARN JIRA which fixed this. There are many related YARN JIRAs for
> localization, but I was not able to find the exact one.
>
> Thanks,
> Prabhu Joseph
>
> On Tue, Jan 12, 2016 at 10:01 PM, Zhihai Xu <[email protected]> wrote:
>
> > Hi Prabhu,
> >
> > I saw a similar localization timeout issue. I found that the
> > localization timeout was due to HDFS, not YARN.
> > In my case, HDFS-7005 <https://issues.apache.org/jira/browse/HDFS-7005>
> > fixed the issue. HDFS-7005 is only in the 2.6 or later releases.
> > The root cause was that all public localizer threads were stuck reading
> > file data from HDFS.
> > Maybe you can try HDFS-7005 to see whether it fixes your issue.
> >
> > Regards
> > zhihai
> >
> > On Tue, Jan 12, 2016 at 2:41 AM, Prabhu Joseph <[email protected]>
> > wrote:
> >
> > > Hi Experts,
> > >
> > > On hadoop-2.5.1, when localization fails for a container of a job in
> > > a NodeManager at
> > >
> > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer,
> > >
> > > the subsequent containers of that job submitted to that NodeManager
> > > hang in the LOCALIZING state until the task times out.
> > >
> > > On hadoop-2.7.0, the above behavior is fixed by creating another
> > > Localizer for the job in the NodeManager when the previous container
> > > fails at localization.
> > >
> > > Can someone share the YARN JIRA which fixed the above issue in
> > > hadoop-2.7.0?
> > >
> > > Thanks,
> > > Prabhu Joseph
