[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450448#comment-16450448 ]
Eric Yang commented on YARN-7939:
---------------------------------

[~csingh] Thank you for the patch. 12 of the 39 checkstyle issues can be addressed to improve readability.

When I try to upgrade an instance, the service AM log shows:
{code}
2018-04-24 19:03:36,353 [IPC Server handler 0 on 55808] INFO service.ClientAMService - Upgrade container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,354 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from READY to UPGRADING on UPGRADE event
2018-04-24 19:03:36,360 [pool-6-thread-3] INFO provider.ProviderUtils - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: Creating dir on hdfs: hdfs://eyang-1.openstacklocal:9000/user/hbase/.yarn/services/abc/components/v2/ping/ping-0
2018-04-24 19:03:36,378 [pool-6-thread-3] INFO containerlaunch.ContainerLaunchService - reInitializing container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,383 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #2] INFO impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER for Container container_1524594485317_0002_01_000002
2018-04-24 19:03:36,433 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from UPGRADING to READY on BECOME_READY event
2018-04-24 19:03:39,141 [AMRM Callback Handler Thread] WARN service.ServiceScheduler - Nodes updated info: eyang-4.openstacklocal:41636, state = UNHEALTHY, healthDiagnostics = Linux Container Executor reached unrecoverable exception
2018-04-24 19:03:39,143 [Component dispatcher] INFO component.Component - [COMPONENT ping] Transitioned from UPGRADING to FLEXING on CONTAINER_COMPLETED event.
2018-04-24 19:03:39,144 [Component dispatcher] INFO component.Component - [COMPONENT ping] Requesting for 1 container(s)
2018-04-24 19:03:39,144 [Component dispatcher] INFO component.Component - [COMPONENT ping] Submitting container request : Capability[<memory:256, vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[null]
2018-04-24 19:03:39,147 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: container_1524594485317_0002_01_000002 completed. Reinsert back to pending list and requested a new container. exitStatus=-100, diagnostics=Container released on a *lost* node.
2018-04-24 19:03:39,147 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002] Transitioned from READY to INIT on STOP event
2018-04-24 19:03:39,147 [Component dispatcher] INFO component.Component - [COMPONENT ping] Transitioned from FLEXING to UPGRADING on CHECK_STABLE event.
2018-04-24 19:03:39,147 [pool-5-thread-5] INFO registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-0 : container_1524594485317_0002_01_000002]: Deleting registry path /users/hbase/services/yarn-service/abc/components/ctr-1524594485317-0002-01-000002
2018-04-24 19:03:41,151 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 1 containers allocated.
2018-04-24 19:03:41,151 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - [COMPONENT ping]: remove 1 outstanding container requests for allocateId 0
2018-04-24 19:03:41,153 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CONTAINER_ALLOCATED at UPGRADING
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_ALLOCATED at UPGRADING
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:838)
    at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:573)
    at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:562)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:748)
{code}
This seems to be an error-handling issue in how the number of containers in the READY state is counted before the component transitions out of the FLEXING state. In the log, the component moved from FLEXING back to UPGRADING on CHECK_STABLE before the replacement container was allocated, so the later CONTAINER_ALLOCATED event had no valid transition.

> Yarn Service Upgrade: add support to upgrade a component instance
> ------------------------------------------------------------------
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch,
> YARN-7939.003.patch, YARN-7939.004.patch, YARN-7939.005.patch,
> YARN-7939.006.patch, YARN-7939.007.patch, YARN-7939.008.patch,
> YARN-7939.009.patch, serviceam.log, upgrade_logs.tgz
>
>
> Yarn core supports in-place upgrade of containers.
> A yarn service can leverage that to provide in-place upgrade of component instances.
> Please see YARN-7512 for details.
> Will add support to upgrade a single component instance first and then
> iteratively add other APIs and features.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
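
The event sequence from the service AM log in the comment above (CONTAINER_COMPLETED, then CHECK_STABLE, then CONTAINER_ALLOCATED) can be replayed with a toy state machine. This is only a minimal sketch and not the actual org.apache.hadoop.yarn.service.component.Component code: the class ComponentStateSketch, its two states, the ready counter, and the checkReadyCount flag are all hypothetical names introduced for illustration. The guarded CHECK_STABLE branch is one possible reading of the diagnosis in the comment, not the fix applied in any patch.

```java
class ComponentStateSketch {
    public enum State { UPGRADING, FLEXING }
    public enum Event { CONTAINER_COMPLETED, CHECK_STABLE, CONTAINER_ALLOCATED }

    private State state = State.UPGRADING;
    private final int desired;              // containers the component should have
    private int ready;                      // containers currently counted as ready
    private final boolean checkReadyCount;  // hypothetical guard on CHECK_STABLE

    public ComponentStateSketch(int desired, boolean checkReadyCount) {
        this.desired = desired;
        this.ready = desired;
        this.checkReadyCount = checkReadyCount;
    }

    public State getState() {
        return state;
    }

    // Applies one event, or throws if the current state has no transition for it.
    public State handle(Event event) {
        switch (state) {
            case UPGRADING:
                if (event == Event.CONTAINER_COMPLETED) {
                    ready--;                 // a container was lost mid-upgrade
                    state = State.FLEXING;   // a replacement is now outstanding
                    return state;
                }
                break;
            case FLEXING:
                if (event == Event.CHECK_STABLE) {
                    // Unguarded: leave FLEXING right away, as seen in the log.
                    // Guarded: stay in FLEXING until the ready count is restored.
                    if (!checkReadyCount || ready == desired) {
                        state = State.UPGRADING;
                    }
                    return state;
                }
                if (event == Event.CONTAINER_ALLOCATED) {
                    ready++;                 // simplification: allocated counts as ready
                    return state;
                }
                break;
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }

    public static void main(String[] args) {
        Event[] replay = {
            Event.CONTAINER_COMPLETED, Event.CHECK_STABLE, Event.CONTAINER_ALLOCATED
        };
        for (boolean guard : new boolean[] {false, true}) {
            ComponentStateSketch component = new ComponentStateSketch(1, guard);
            try {
                for (Event event : replay) {
                    component.handle(event);
                }
                System.out.println("guard=" + guard + ": ended in " + component.getState());
            } catch (IllegalStateException e) {
                System.out.println("guard=" + guard + ": " + e.getMessage());
            }
        }
    }
}
```

Without the guard, the replay fails on CONTAINER_ALLOCATED because the sketch has already left FLEXING, which mirrors the InvalidStateTransitionException in the log. With the guard, the sketch stays in FLEXING until the replacement container has been counted.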