[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446042#comment-16446042 ]
Eric Yang commented on YARN-7939: --------------------------------- [~csingh] Thank you for the patch. I am getting this error message in serviceam.log: {code} 2018-04-20 16:50:14,290 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002] Transitioned from READY to UPGRADING on UPGRADE event 2018-04-20 16:50:14,309 [pool-6-thread-3] INFO provider.ProviderUtils - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002]: Creating dir on hdfs: hdfs://eyang-1.openstacklocal:9000/user/hbase/.yarn/services/abc/components/v2/ping/ping-0 2018-04-20 16:50:14,327 [pool-6-thread-3] INFO containerlaunch.ContainerLaunchService - reInitializing container container_1524242413029_0001_01_000002 2018-04-20 16:50:14,331 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2] INFO impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER for Container container_1524242413029_0001_01_000002 2018-04-20 16:50:14,379 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002] Transitioned from UPGRADING to READY on BECOME_READY event 2018-04-20 16:50:16,726 [AMRM Callback Handler Thread] WARN service.ServiceScheduler - Nodes updated info: eyang-4.openstacklocal:52497, state = UNHEALTHY, healthDiagnostics = Linux Container Executor reached unrecoverable exception 2018-04-20 16:50:16,730 [Component dispatcher] INFO component.Component - [COMPONENT ping] Requesting for 1 container(s) 2018-04-20 16:50:16,730 [Component dispatcher] INFO component.Component - [COMPONENT ping] Submitting container request : Capability[<memory:256, vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null] 2018-04-20 16:50:16,734 [pool-5-thread-1] INFO registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002]: Deleting 
registry path /users/hbase/services/yarn-service/abc/components/ctr-1524242413029-0001-01-000002 2018-04-20 16:50:16,736 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002]: container_1524242413029_0001_01_000002 completed. Reinsert back to pending list and requested a new container. exitStatus=-100, diagnostics=Container released on a *lost* node. 2018-04-20 16:50:16,736 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002] Transitioned from READY to INIT on STOP event 2018-04-20 16:50:16,739 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at UPGRADING org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at UPGRADING at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:839) at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573) at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) at java.lang.Thread.run(Thread.java:748) 2018-04-20 16:50:18,746 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 1 containers allocated. 
2018-04-20 16:50:18,746 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - [COMPONENT ping]: remove 1 outstanding container requests for allocateId 0 2018-04-20 16:50:18,747 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CONTAINER_ALLOCATED at UPGRADING org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_ALLOCATED at UPGRADING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:839) at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573) at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) at java.lang.Thread.run(Thread.java:748) {code}
It looks like the component cannot transition from UPGRADING to STABLE because the container relaunch fails. I think this is blocked by the relaunch logic introduced in YARN-7973.
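For readers unfamiliar with the two "Invalid event ... at UPGRADING" errors in the serviceam.log excerpt, here is a minimal sketch of how a table-driven state machine rejects an event that has no registered transition for the current state. It is not the YARN implementation: the class `ComponentStateSketch`, its reduced set of states and events, and the arcs in the table are all assumptions made for illustration, and it throws a plain `IllegalStateException` rather than Hadoop's `InvalidStateTransitionException`.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified model; not the real Component state machine. */
public class ComponentStateSketch {
    enum State { STABLE, FLEXING, UPGRADING }
    enum Event { UPGRADE, CHECK_STABLE, CONTAINER_ALLOCATED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> arcs = new EnumMap<>(State.class);
    private State current = State.STABLE;

    ComponentStateSketch() {
        // Illustrative arcs only (assumed). UPGRADING deliberately has no arc
        // for CHECK_STABLE or CONTAINER_ALLOCATED, mirroring the logged errors.
        addArc(State.STABLE, Event.UPGRADE, State.UPGRADING);
        addArc(State.STABLE, Event.CHECK_STABLE, State.STABLE);
        addArc(State.FLEXING, Event.CONTAINER_ALLOCATED, State.FLEXING);
        addArc(State.FLEXING, Event.CHECK_STABLE, State.STABLE);
    }

    private void addArc(State from, Event on, State to) {
        arcs.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    State getState() {
        return current;
    }

    /** Applies the event, or throws and leaves the state unchanged if no arc exists. */
    State handle(Event event) {
        Map<Event, State> outgoing = arcs.get(current);
        State next = (outgoing == null) ? null : outgoing.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ComponentStateSketch component = new ComponentStateSketch();
        component.handle(Event.UPGRADE); // STABLE -> UPGRADING
        for (Event e : new Event[] {Event.CHECK_STABLE, Event.CONTAINER_ALLOCATED}) {
            try {
                component.handle(e);
            } catch (IllegalStateException ex) {
                System.out.println(ex.getMessage() + " (state stays " + component.getState() + ")");
            }
        }
    }
}
```

In this sketch both rejected events leave the component in UPGRADING. That matches the symptom in the log: once the container is lost and a replacement is requested, events that would normally drive the component back to a stable state arrive while it is still UPGRADING and are dropped.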
Node manager log looks like this: {code} 2018-04-20 16:50:14,368 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase OPERATION=Container ReInitialization - Started TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1524242413029_0001 CONTAINERID=container_1524242413029_0001_01_000002 2018-04-20 16:50:14,369 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from RUNNING to REINITIALIZING_AWAITING_KILL 2018-04-20 16:50:14,369 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1524242413029_0001_01_000002 2018-04-20 16:50:14,391 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Stderr: Stdout: main : command provided 1 main : run as user is hbase main : requested yarn user is hbase Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... 
Full command array for failed execution: [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, application_1524242413029_0001, container_1524242413029_0001_01_000002, /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.tokens, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid, /tmp/hadoop-yarn/nm-local-dir, /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none] 2018-04-20 16:50:14,393 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Launch container failed. 
Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143: at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: ExitCodeException exitCode=143: at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) at org.apache.hadoop.util.Shell.run(Shell.java:902) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) ... 
11 more 2018-04-20 16:50:14,396 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1524242413029_0001_01_000002 is : 143 2018-04-20 16:50:16,522 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Relaunching Container [container_1524242413029_0001_01_000002] for re-initialization !! 2018-04-20 16:50:16,523 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from REINITIALIZING_AWAITING_KILL to SCHEDULED 2018-04-20 16:50:16,523 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1524242413029_0001_01_000002] 2018-04-20 16:50:16,632 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase OPERATION=Container ReInitialization - Finished TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1524242413029_0001 CONTAINERID=container_1524242413029_0001_01_000002 2018-04-20 16:50:16,632 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from SCHEDULED to RUNNING 2018-04-20 16:50:16,632 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1524242413029_0001_01_000002 2018-04-20 16:50:16,641 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 33. Privileged Execution Operation Stderr: Could not create copy file 3 /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh Could not create local files and directories Stdout: main : command provided 1 main : run as user is hbase main : requested yarn user is hbase Getting exit code file... 
Creating script paths... Writing pid file... Writing to tmp file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp Writing to cgroup task files... Creating local dirs... Can't open /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh for output - File exists Getting exit code file... Creating script paths... Full command array for failed execution: [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, application_1524242413029_0001, container_1524242413029_0001_01_000002, /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.tokens, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid, /tmp/hadoop-yarn/nm-local-dir, /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none] 2018-04-20 16:50:16,641 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Launch container failed. 
Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=33: Could not create copy file 3 /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh Could not create local files and directories at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: ExitCodeException exitCode=33: Could not create copy file 3 /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh Could not create local files and 
directories at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) at org.apache.hadoop.util.Shell.run(Shell.java:902) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) ... 11 more 2018-04-20 16:50:16,642 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1524242413029_0001_01_000002 is : 33 2018-04-20 16:50:16,642 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1524242413029_0001_01_000002 and exit code: 33 org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1524242413029_0001_01_000002 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 33 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception message: Launch container failed 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error output: Could not create copy file 3 /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Could not create local files and directories 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main : command provided 1 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : run as user is hbase 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is hbase 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Getting exit code file... 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating script paths... 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing pid file... 
2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to tmp file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to cgroup task files... 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating local dirs... 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh for output - File exists 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Getting exit code file... 2018-04-20 16:50:16,643 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating script paths... 2018-04-20 16:50:16,645 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Failed to launch container due to configuration error. 
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container Executor reached unrecoverable exception at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:631) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:571) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) ... 8 more 2018-04-20 16:50:16,647 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Relaunching Container container_1524242413029_0001_01_000002. Remaining retry attempts(after relaunch) : -1. 
Interval between retries is 30000ms 2018-04-20 16:50:16,649 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from RUNNING to RELAUNCHING 2018-04-20 16:50:17,661 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from RELAUNCHING to KILLING 2018-04-20 16:50:17,662 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1524242413029_0001_01_000002 2018-04-20 16:50:46,662 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2018-04-20 16:50:46,665 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002 2018-04-20 16:50:46,666 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1524242413029_0001 CONTAINERID=container_1524242413029_0001_01_000002 2018-04-20 16:50:46,671 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1524242413029_0001_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE 2018-04-20 16:50:46,671 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1524242413029_0001_01_000002 from application application_1524242413029_0001 2018-04-20 16:50:46,671 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1524242413029_0001_01_000002 2018-04-20 16:50:46,672 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1524242413029_0001 2018-04-20 16:50:46,677 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1524242413029_0001_01_000002] {code}
[~shaneku...@gmail.com] It looks like reinitialization does not work correctly with relaunch.
[~csingh] Are you using the latest trunk code? It seems like your testing environment does not have YARN-7973 applied. Can you verify?

> Yarn Service Upgrade: add support to upgrade a component instance
> ------------------------------------------------------------------
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, YARN-7939.003.patch, YARN-7939.004.patch, YARN-7939.005.patch, YARN-7939.006.patch, YARN-7939.007.patch, YARN-7939.008.patch
>
> Yarn core supports in-place upgrade of containers. A yarn service can leverage that to provide in-place upgrade of component instances. Please see YARN-7512 for details.
> Will add support to upgrade a single component instance first and then iteratively add other APIs and features.