[ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446042#comment-16446042
 ] 

Eric Yang commented on YARN-7939:
---------------------------------

[~csingh] Thank you for the patch.  I am getting this error message in 
serviceam.log:

{code}
2018-04-20 16:50:14,290 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE ping-0 : 
container_1524242413029_0001_01_000002] Transitioned from READY to UPGRADING on 
UPGRADE event
2018-04-20 16:50:14,309 [pool-6-thread-3] INFO  provider.ProviderUtils - 
[COMPINSTANCE ping-0 : container_1524242413029_0001_01_000002]: Creating dir on 
hdfs: 
hdfs://eyang-1.openstacklocal:9000/user/hbase/.yarn/services/abc/components/v2/ping/ping-0
2018-04-20 16:50:14,327 [pool-6-thread-3] INFO  
containerlaunch.ContainerLaunchService - reInitializing container 
container_1524242413029_0001_01_000002
2018-04-20 16:50:14,331 
[org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2] INFO  
impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER for 
Container container_1524242413029_0001_01_000002
2018-04-20 16:50:14,379 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE ping-0 : 
container_1524242413029_0001_01_000002] Transitioned from UPGRADING to READY on 
BECOME_READY event
2018-04-20 16:50:16,726 [AMRM Callback Handler Thread] WARN  
service.ServiceScheduler - Nodes updated info: 
eyang-4.openstacklocal:52497, state = UNHEALTHY, healthDiagnostics = Linux 
Container Executor reached unrecoverable exception

2018-04-20 16:50:16,730 [Component  dispatcher] INFO  component.Component - 
[COMPONENT ping] Requesting for 1 container(s)
2018-04-20 16:50:16,730 [Component  dispatcher] INFO  component.Component - 
[COMPONENT ping] Submitting container request : Capability[<memory:256, 
vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution 
Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null]
2018-04-20 16:50:16,734 [pool-5-thread-1] INFO  
registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-0 : 
container_1524242413029_0001_01_000002]: Deleting registry path 
/users/hbase/services/yarn-service/abc/components/ctr-1524242413029-0001-01-000002
2018-04-20 16:50:16,736 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE ping-0 : 
container_1524242413029_0001_01_000002]: container_1524242413029_0001_01_000002 
completed. Reinsert back to pending list and requested a new container.
 exitStatus=-100, diagnostics=Container released on a *lost* node.
2018-04-20 16:50:16,736 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE ping-0 : 
container_1524242413029_0001_01_000002] Transitioned from READY to INIT on STOP 
event
2018-04-20 16:50:16,739 [Component  dispatcher] ERROR component.Component - 
[COMPONENT ping]: Invalid event CHECK_STABLE at UPGRADING
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
CHECK_STABLE at UPGRADING
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at 
org.apache.hadoop.yarn.service.component.Component.handle(Component.java:839)
        at 
org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573)
        at 
org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)
2018-04-20 16:50:18,746 [AMRM Callback Handler Thread] INFO  
service.ServiceScheduler - 1 containers allocated. 
2018-04-20 16:50:18,746 [AMRM Callback Handler Thread] INFO  
service.ServiceScheduler - [COMPONENT ping]: remove 1 outstanding container 
requests for allocateId 0
2018-04-20 16:50:18,747 [Component  dispatcher] ERROR component.Component - 
[COMPONENT ping]: Invalid event CONTAINER_ALLOCATED at UPGRADING
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
CONTAINER_ALLOCATED at UPGRADING
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at 
org.apache.hadoop.yarn.service.component.Component.handle(Component.java:839)
        at 
org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573)
        at 
org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)
{code}
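
The two "Invalid event" errors in the log above are what a table-driven state machine reports when an event arrives in a state that has no transition registered for it. A minimal sketch of that behaviour follows. The states, events, and transition table here are hypothetical and simplified; they are not the actual table in org.apache.hadoop.yarn.service.component.Component.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class ComponentStateSketch {
    // Hypothetical, simplified states and events (not the real YARN enums).
    enum State { STABLE, FLEXING, UPGRADING }
    enum Event { UPGRADE, CHECK_STABLE, CONTAINER_ALLOCATED }

    // Assumed transition table: UPGRADING registers no handler for
    // CHECK_STABLE or CONTAINER_ALLOCATED, so both are rejected there.
    private static final Map<State, EnumSet<Event>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.STABLE, EnumSet.of(Event.UPGRADE, Event.CHECK_STABLE));
        ALLOWED.put(State.FLEXING, EnumSet.of(Event.CHECK_STABLE, Event.CONTAINER_ALLOCATED));
        ALLOWED.put(State.UPGRADING, EnumSet.noneOf(Event.class));
    }

    // True if the table has a transition for this event in this state.
    static boolean accepts(State state, Event event) {
        return ALLOWED.get(state).contains(event);
    }

    // Throws when no transition is registered, as a stand-in for
    // InvalidStateTransitionException.
    static void handle(State state, Event event) {
        if (!accepts(state, event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
    }

    public static void main(String[] args) {
        handle(State.STABLE, Event.UPGRADE); // accepted
        for (Event event : new Event[] {Event.CHECK_STABLE, Event.CONTAINER_ALLOCATED}) {
            try {
                handle(State.UPGRADING, event);
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Under these assumptions, a container that is lost while its component is still UPGRADING triggers a new container request, and the follow-up events are rejected because the component never left UPGRADING.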

It looks like the component cannot transition from UPGRADING to STABLE because 
the container relaunch fails.  I think this is blocked by the relaunch logic 
introduced in YARN-7973.  The node manager log looks like this:

{code}
2018-04-20 16:50:14,368 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase        
OPERATION=Container ReInitialization - Started  TARGET=ContainerImpl    
RESULT=SUCCESS  APPID=application_1524242413029_0001    
CONTAINERID=container_1524242413029_0001_01_000002
2018-04-20 16:50:14,369 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from RUNNING to 
REINITIALIZING_AWAITING_KILL
2018-04-20 16:50:14,369 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1524242413029_0001_01_000002
2018-04-20 16:50:14,391 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 143. Privileged Execution Operation Stderr:

Stdout: main : command provided 1
main : run as user is hbase
main : requested yarn user is hbase
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Full command array for failed execution:
[/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, 
application_1524242413029_0001, container_1524242413029_0001_01_000002, 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.tokens,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid,
 /tmp/hadoop-yarn/nm-local-dir, /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, 
cgroups=none]
2018-04-20 16:50:14,393 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=143:
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=143:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
        at org.apache.hadoop.util.Shell.run(Shell.java:902)
        at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
        ... 11 more
2018-04-20 16:50:14,396 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1524242413029_0001_01_000002 is : 143
2018-04-20 16:50:16,522 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Relaunching Container [container_1524242413029_0001_01_000002] for 
re-initialization !!
2018-04-20 16:50:16,523 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from 
REINITIALIZING_AWAITING_KILL to SCHEDULED
2018-04-20 16:50:16,523 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
 Starting container [container_1524242413029_0001_01_000002]
2018-04-20 16:50:16,632 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase        
OPERATION=Container ReInitialization - Finished TARGET=ContainerImpl    
RESULT=SUCCESS  APPID=application_1524242413029_0001    
CONTAINERID=container_1524242413029_0001_01_000002
2018-04-20 16:50:16,632 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from SCHEDULED 
to RUNNING
2018-04-20 16:50:16,632 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Starting resource-monitoring for container_1524242413029_0001_01_000002
2018-04-20 16:50:16,641 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 33. Privileged Execution Operation Stderr:
Could not create copy file 3 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
Could not create local files and directories

Stdout: main : command provided 1
main : run as user is hbase
main : requested yarn user is hbase
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Can't open 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
 for output - File exists
Getting exit code file...
Creating script paths...

Full command array for failed execution:
[/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, 
application_1524242413029_0001, container_1524242413029_0001_01_000002, 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.tokens,
 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid,
 /tmp/hadoop-yarn/nm-local-dir, /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, 
cgroups=none]
2018-04-20 16:50:16,641 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=33: Could not create copy file 3 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
Could not create local files and directories

        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=33: Could not create copy file 3 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
Could not create local files and directories

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
        at org.apache.hadoop.util.Shell.run(Shell.java:902)
        at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
        ... 11 more
2018-04-20 16:50:16,642 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1524242413029_0001_01_000002 is : 33
2018-04-20 16:50:16,642 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_1524242413029_0001_01_000002 
and exit code: 33
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Launch container failed
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1524242413029_0001_01_000002
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 33
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception message: 
Launch container failed
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
output: Could not create copy file 3 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Could not create 
local files and directories
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main 
: command provided 1
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : run as user 
is hbase
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested 
yarn user is hbase
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Getting exit code 
file...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating script 
paths...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing pid file...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to tmp 
file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_000002/container_1524242413029_0001_01_000002.pid.tmp
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Writing to cgroup 
task files...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating local 
dirs...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't open 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002/launch_container.sh
 for output - File exists
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Getting exit code 
file...
2018-04-20 16:50:16,643 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Creating script 
paths...
2018-04-20 16:50:16,645 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Failed to launch container due to configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container 
Executor reached unrecoverable exception
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:631)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:571)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Launch container failed
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
        ... 8 more
2018-04-20 16:50:16,647 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Relaunching Container container_1524242413029_0001_01_000002. Remaining retry 
attempts(after relaunch) : -1. Interval between retries is 30000ms
2018-04-20 16:50:16,649 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from RUNNING to 
RELAUNCHING
2018-04-20 16:50:17,661 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from RELAUNCHING 
to KILLING
2018-04-20 16:50:17,662 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1524242413029_0001_01_000002
2018-04-20 16:50:46,662 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL
2018-04-20 16:50:46,665 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_000002
2018-04-20 16:50:46,666 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase        
OPERATION=Container Finished - Killed   TARGET=ContainerImpl    RESULT=SUCCESS  
APPID=application_1524242413029_0001    
CONTAINERID=container_1524242413029_0001_01_000002
2018-04-20 16:50:46,671 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1524242413029_0001_01_000002 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2018-04-20 16:50:46,671 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Removing container_1524242413029_0001_01_000002 from application 
application_1524242413029_0001
2018-04-20 16:50:46,671 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Stopping resource-monitoring for container_1524242413029_0001_01_000002
2018-04-20 16:50:46,672 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
event CONTAINER_STOP for appId application_1524242413029_0001
2018-04-20 16:50:46,677 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
completed containers from NM context: [container_1524242413029_0001_01_000002]
{code}

[~shaneku...@gmail.com] It looks like reinitialize does not work correctly 
with relaunch.  [~csingh] Are you using the latest trunk code?  It seems like 
your testing environment does not have YARN-7973 applied.  Can you verify?
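
The "Can't open ... launch_container.sh for output - File exists" line in the node manager log is the kind of failure an exclusive create produces when a relaunch reuses a working directory that still holds the previous launch script. A minimal sketch of that failure mode follows. It assumes the script is written with create-exclusive semantics, which the error text suggests but which is not confirmed here. The class, the helper name, and the temporary directory are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExclusiveCreateSketch {
    // Writes the script only if it does not exist yet (CREATE_NEW).
    // Returns false when a file from a previous launch is still present.
    static boolean writeExclusively(Path script, String body) throws IOException {
        try {
            Files.write(script, body.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the container working directory.
        Path workDir = Files.createTempDirectory("container-workdir");
        Path script = workDir.resolve("launch_container.sh");

        // First launch: the script does not exist, so the write succeeds.
        System.out.println("first launch wrote script: " + writeExclusively(script, "#!/bin/bash\n"));
        // Relaunch into the same directory: the old script is in the way.
        System.out.println("relaunch wrote script: " + writeExclusively(script, "#!/bin/bash\n"));
        // Removing the stale script first lets the relaunch succeed.
        Files.delete(script);
        System.out.println("relaunch after cleanup wrote script: " + writeExclusively(script, "#!/bin/bash\n"));

        Files.delete(script);
        Files.delete(workDir);
    }
}
```

If that assumption holds, the relaunch path has to either clean up the old script or skip copying it when the working directory is reused.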

> Yarn Service Upgrade: add support to upgrade a component instance 
> ------------------------------------------------------------------
>
>                 Key: YARN-7939
>                 URL: https://issues.apache.org/jira/browse/YARN-7939
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch, YARN-7939.004.patch, YARN-7939.005.patch, 
> YARN-7939.006.patch, YARN-7939.007.patch, YARN-7939.008.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  


