[
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175572#comment-15175572
]
Varun Vasudev commented on YARN-4744:
-------------------------------------
Actually, two WARN statements are logged for the same failure. One is in
executePrivilegedOperation() in PrivilegedOperationExecutor and the second one
is in signalContainer() in DefaultLinuxContainerRuntime.
I'm unsure how to handle this. My feeling is that
PrivilegedOperationExecutor should log failures irrespective of the error code,
but that DefaultLinuxContainerRuntime shouldn't log the warning for invalid
pids (similar to what LinuxContainerExecutor used to do before the refactoring).
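To make the proposal concrete, here is a rough sketch of the split I have in
mind. It is not the actual Hadoop code: the class, the exception type and the
helper names are hypothetical stand-ins, it uses java.util.logging only to stay
self-contained, and it assumes that exit code 9 (the one in the log below) is
what container-executor returns for an invalid container pid.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only: hypothetical stand-ins for the two logging layers. */
class SignalLoggingSketch {
  private static final Logger LOG = Logger.getLogger("SignalLoggingSketch");

  // Assumption: exit code 9 means "invalid container pid".
  static final int INVALID_CONTAINER_PID = 9;

  /** Hypothetical stand-in for PrivilegedOperationException. */
  static class OperationFailedException extends RuntimeException {
    private final int exitCode;

    OperationFailedException(int exitCode) {
      super("exitCode=" + exitCode);
      this.exitCode = exitCode;
    }

    int getExitCode() {
      return exitCode;
    }
  }

  /** Executor layer: logs every non-zero exit code, then throws. */
  static void executePrivilegedOperation(int simulatedExitCode) {
    if (simulatedExitCode != 0) {
      LOG.warning("Shell execution returned exit code: " + simulatedExitCode);
      throw new OperationFailedException(simulatedExitCode);
    }
  }

  /** Runtime layer: an invalid pid is expected during cleanup, so no warning. */
  static boolean shouldWarnOnSignalFailure(int exitCode) {
    return exitCode != INVALID_CONTAINER_PID;
  }

  /**
   * Returns true if the signal was delivered and false if the pid was already
   * gone. Any other failure is logged with its stack trace and rethrown.
   */
  static boolean signalContainer(int simulatedExitCode) {
    try {
      executePrivilegedOperation(simulatedExitCode);
      return true;
    } catch (OperationFailedException e) {
      if (!shouldWarnOnSignalFailure(e.getExitCode())) {
        return false; // container already exited; stay quiet at this layer
      }
      LOG.log(Level.WARNING, "Signal container failed. Exception: ", e);
      throw e;
    }
  }

  public static void main(String[] args) {
    System.out.println("delivered: " + signalContainer(0));
    System.out.println("delivered: " + signalContainer(INVALID_CONTAINER_PID));
  }
}
```

With this split, a container that has already exited still produces the
executor's WARN line, but not the second warning with the full stack trace.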
[~jlowe], [~vinodkv], [~rohithsharma] - what do you think?
> Too many signal to container failure in case of LCE
> ---------------------------------------------------
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: Bibin A Chundatt
> Assignee: Sidharta Seethana
>
> Install HA cluster in secure mode
> Enable LCE with cgroups
> Start server with dsperf user
> Submit mapreduce application terasort/teragen with user yarn/dsperf
> Too many signal-to-container failures are logged
> When submitting as the user, the following exception is thrown
> {noformat}
> 2014-03-02 09:20:38,689 INFO
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
> Authorization successful for testing (auth:TOKEN) for protocol=interface
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_e02_1393731146548_0001_01_000013
> 2014-03-02 09:20:43,071 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Container container_e02_1393731146548_0001_01_000009 succeeded
> 2014-03-02 09:20:43,072 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_e02_1393731146548_0001_01_000009 transitioned from
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_e02_1393731146548_0001_01_000009
> 2014-03-02 09:20:43,075 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
> Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
> Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
> yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
> Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
> ExitCodeException exitCode=9:
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn
> OPERATION=Container Finished - Succeeded TARGET=ContainerImpl
> RESULT=SUCCESS APPID=application_1393731146548_0001
> CONTAINERID=container_e02_1393731146548_0001_01_000009
> 2014-03-02 09:20:43,115 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_e02_1393731146548_0001_01_000009 transitioned from
> EXITED_WITH_SUCCESS to DONE
> 2014-03-02 09:20:43,115 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Removing container_e02_1393731146548_0001_01_000009 from application
> application_1393731146548_0001
> {noformat}
> Checked the same scenario in version 2.7.2; the issue does not occur there