Okay Harsh: your hint was enough to get me back on track! I found the Linux container logs and they are wonderful :)... I guess at the end of each container run, logs get propagated into the distributed file system's /var/log directories.
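
In case it helps anyone else, here is a rough sketch of how I poked around (the /var/log/... paths are just what my cluster uses, and the <user>/<application_id>/<container_id> bits are placeholders; the real locations are governed by yarn.nodemanager.log-dirs locally and yarn.nodemanager.remote-app-log-dir on the DFS, so adjust for your setup):

# While a container is still running, its stdout/stderr/syslog live on the
# NodeManager's local disk (under yarn.nodemanager.log-dirs):
ls -lart /var/log/hadoop-yarn/container/<application_id>/<container_id>/

# After the application finishes, log aggregation (if
# yarn.log-aggregation-enable=true) moves those files into the DFS
# under yarn.nodemanager.remote-app-log-dir:
hadoop fs -ls /var/log/hadoop-yarn/apps/<user>/logs/<application_id>/
hadoop fs -cat /var/log/hadoop-yarn/apps/<user>/logs/<application_id>/*

# Or have the YARN CLI fetch and decode the aggregated logs for you:
yarn logs -applicationId <application_id>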
In any case, once I dug in there, I found that the cryptic failure was because my done_intermediate permissions were bad. Anyway, thanks for the hint, Harsh!

After monitoring the local /var/log/hadoop-yarn/container/ directory, I was able to see that the stdout/stderr files were being deleted, and after some googling I found a post about how YARN aggregates logs into the DFS. Anyway, problem solved.

For those curious: if you are debugging YARN Linux containers that are dying (as shown in the [local] /var/log/hadoop-yarn/ nodemanager logs), you can dig deeper after the task dies with:

hadoop fs -cat /var/log/hadoop-yarn/apps/<oozie_user>/logs/application_1392385522708_0008/*

On Fri, Feb 14, 2014 at 9:17 AM, German Florez-Larrahondo <[email protected]> wrote:

> I believe that errors on containers are not propagated to the standard
> "Java" logs.
>
> You have to look into the std* and syslog files of the container.
>
> Here is an example:
>
> .../userlogs/application_1391549207212_0006/container_1391549207212_0006_01_000027
>
> [htf@gfldesktop container_1391549207212_0006_01_000027]$ ls -lart
> total 60
> -rw-rw-r--  1 htf htf     0 Feb  4 17:27 stdout
> -rw-rw-r--  1 htf htf     0 Feb  4 17:27 stderr
> drwx--x--- 28 htf htf  4096 Feb  4 17:27 ..
> drwx--x---  2 htf htf  4096 Feb  4 17:27 .
> -rw-rw-r--  1 htf htf 50471 Feb  4 17:31 syslog
>
> Regards
> ./g
>
> -----Original Message-----
> From: Jay Vyas [mailto:[email protected]]
> Sent: Friday, February 14, 2014 7:02 AM
> To: [email protected]
> Cc: <[email protected]>
> Subject: Re: How to ascertain why LinuxContainer dies?
>
> Not sure where the containers dump standard out/error to? I figured it
> would be propagated in the node manager logs if anywhere, right?
>
> Sent from my iPhone
>
> > > On Feb 14, 2014, at 4:46 AM, Harsh J <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > Does your container command generate any stderr/stdout outputs that
> > > you can check under the container's work directory after it fails?
> > >
> > >> On Fri, Feb 14, 2014 at 9:46 AM, Jay Vyas <[email protected]> wrote:
> > >> I have a Linux container that dies. The nodemanager logs only say:
> > >>
> > >> WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
> > >> Exception from container-launch :
> > >> org.apache.hadoop.util.Shell$ExitCodeException:
> > >>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
> > >>     at org.apache.hadoop.util.Shell.run(Shell.java:129)
> > >>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
> > >>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
> > >>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
> > >>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
> > >>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > >>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > >>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >>     at java.lang.Thread.run(Thread.java:662)
> > >>
> > >> Where can I find the root cause of the non-zero exit code?
> > >>
> > >> --
> > >> Jay Vyas
> > >> http://jayunit100.blogspot.com
> > >
> > > --
> > > Harsh J

--
Jay Vyas
http://jayunit100.blogspot.com
