[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450448#comment-16450448 ]

Eric Yang commented on YARN-7939:
---------------------------------

[~csingh] Thank you for the patch. Of the 39 checkstyle issues, 12 can be addressed to improve readability.

When I try to upgrade an instance, the service AM log shows:

{code}
2018-04-24 19:03:36,353 [IPC Server handler 0 on 55808] INFO  service.ClientAMService - Upgrade container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,354 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from READY to UPGRADING on UPGRADE event
2018-04-24 19:03:36,360 [pool-6-thread-3] INFO  provider.ProviderUtils - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: Creating dir on hdfs: hdfs://eyang-1.openstacklocal:9000/user/hbase/.yarn/services/abc/components/v2/ping/ping-0
2018-04-24 19:03:36,378 [pool-6-thread-3] INFO  containerlaunch.ContainerLaunchService - reInitializing container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,383 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2] INFO  impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER for Container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,433 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from UPGRADING to READY on BECOME_READY event
2018-04-24 19:03:39,141 [AMRM Callback Handler Thread] WARN  service.ServiceScheduler - Nodes updated info: 
eyang-4.openstacklocal:41636, state = UNHEALTHY, healthDiagnostics = Linux Container Executor reached unrecoverable exception

2018-04-24 19:03:39,143 [Component  dispatcher] INFO  component.Component - [COMPONENT ping] Transitioned from UPGRADING to FLEXING on CONTAINER_COMPLETED event.
2018-04-24 19:03:39,144 [Component  dispatcher] INFO  component.Component - [COMPONENT ping] Requesting for 1 container(s)
2018-04-24 19:03:39,144 [Component  dispatcher] INFO  component.Component - [COMPONENT ping] Submitting container request : Capability[<memory:256, vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null]
2018-04-24 19:03:39,147 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: container_1524594485317_0002_01_000002 completed. Reinsert back to pending list and requested a new container.
 exitStatus=-100, diagnostics=Container released on a *lost* node.
2018-04-24 19:03:39,147 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from READY to INIT on STOP event
2018-04-24 19:03:39,147 [Component  dispatcher] INFO  component.Component - [COMPONENT ping] Transitioned from FLEXING to UPGRADING on CHECK_STABLE event.
2018-04-24 19:03:39,147 [pool-5-thread-5] INFO  registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: Deleting registry path /users/hbase/services/yarn-service/abc/components/ctr-1524594485317-0002-01-000002
2018-04-24 19:03:41,151 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - 1 containers allocated. 
2018-04-24 19:03:41,151 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - [COMPONENT ping]: remove 1 outstanding container requests for allocateId 0
2018-04-24 19:03:41,153 [Component  dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CONTAINER_ALLOCATED at UPGRADING
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_ALLOCATED at UPGRADING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:838)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)
{code}

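This seems to be an error-handling issue in counting the number of containers that are in the READY state before the component transitions out of the FLEXING state. After the container on the lost node completed, the component went from FLEXING back to UPGRADING on CHECK_STABLE while a container request was still outstanding, so the subsequent CONTAINER_ALLOCATED event has no valid transition in the UPGRADING state.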
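To make the suspected ordering problem concrete, here is a minimal, hypothetical Java sketch that replays the event order from the log against a toy state machine. It is not the real {{Component}} code: the class {{UpgradeRaceSketch}}, the {{readyInstances}} counter and the {{decrementReadyOnCompletion}} toggle are assumptions made purely for illustration. With a stale ready count, CHECK_STABLE sends the component back to UPGRADING while a request is still outstanding, and the later CONTAINER_ALLOCATED has nowhere to go.

{code:java}
/**
 * Hypothetical, heavily simplified model of the component state machine in the
 * log above. It is NOT org.apache.hadoop.yarn.service.component.Component: the
 * state and event names come from the log, while the fields and transition
 * rules are assumptions made for illustration.
 */
public class UpgradeRaceSketch {

  enum State { FLEXING, UPGRADING }

  enum Event { CONTAINER_COMPLETED, CHECK_STABLE, CONTAINER_ALLOCATED }

  /** Stand-in for org.apache.hadoop.yarn.state.InvalidStateTransitionException. */
  static final class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(Event event, State state) {
      super("Invalid event: " + event + " at " + state);
    }
  }

  private final int desiredInstances = 1;
  // Hypothetical toggle: does a completed container stop counting as READY?
  private final boolean decrementReadyOnCompletion;
  private int readyInstances = 1;        // ping-0 was READY before its node was lost
  private int outstandingRequests = 0;   // container requests not yet satisfied
  private State state = State.UPGRADING; // the component is mid-upgrade

  UpgradeRaceSketch(boolean decrementReadyOnCompletion) {
    this.decrementReadyOnCompletion = decrementReadyOnCompletion;
  }

  State state() {
    return state;
  }

  int outstandingRequests() {
    return outstandingRequests;
  }

  void handle(Event event) {
    switch (state) {
      case UPGRADING:
        if (event != Event.CONTAINER_COMPLETED) {
          // Mirrors the log: nothing handles CONTAINER_ALLOCATED while UPGRADING.
          throw new InvalidTransitionException(event, state);
        }
        if (decrementReadyOnCompletion) {
          readyInstances--; // the lost container no longer counts as READY
        }
        outstandingRequests++; // "Requesting for 1 container(s)"
        state = State.FLEXING;
        break;
      case FLEXING:
        if (event == Event.CONTAINER_ALLOCATED) {
          outstandingRequests--; // allocated, but not READY yet
        } else if (event == Event.CHECK_STABLE
            && readyInstances == desiredInstances) {
          // Counts look stable, so the component resumes the upgrade.
          state = State.UPGRADING;
        }
        // Any other event leaves the sketch in FLEXING.
        break;
    }
  }

  /** Replays the event order from the log; returns the outcome as text. */
  public static String replay(boolean decrementReadyOnCompletion) {
    UpgradeRaceSketch component = new UpgradeRaceSketch(decrementReadyOnCompletion);
    try {
      component.handle(Event.CONTAINER_COMPLETED);
      component.handle(Event.CHECK_STABLE);
      component.handle(Event.CONTAINER_ALLOCATED);
      return "ok, final state " + component.state();
    } catch (InvalidTransitionException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println("stale ready count: " + replay(false));
    System.out.println("ready count updated: " + replay(true));
  }
}
{code}

Under these assumptions, {{replay(false)}} returns "Invalid event: CONTAINER_ALLOCATED at UPGRADING", the same message as the stack trace, while {{replay(true)}} returns "ok, final state FLEXING" because the component waits in FLEXING for the replacement container. The sketch only illustrates the ordering; whether the patch should fix the ready-count bookkeeping or also register a CONTAINER_ALLOCATED transition for UPGRADING is a separate design choice.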


> Yarn Service Upgrade: add support to upgrade a component instance 
> ------------------------------------------------------------------
>
>                 Key: YARN-7939
>                 URL: https://issues.apache.org/jira/browse/YARN-7939
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch, YARN-7939.004.patch, YARN-7939.005.patch, 
> YARN-7939.006.patch, YARN-7939.007.patch, YARN-7939.008.patch, 
> YARN-7939.009.patch, serviceam.log, upgrade_logs.tgz
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> We will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)