[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471269#comment-16471269 ]
genericqa commented on YARN-8243:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 55s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 56s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8243 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922924/YARN-8243.02.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 589f60b31c85 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7369f41 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20692/testReport/ |
| Max. process+thread count | 890 (vs. ulimit of 10000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20692/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Flex down should first remove pending container requests (if any) and then kill running containers
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn-native-services
> Affects Versions: 3.1.0
> Reporter: Gour Saha
> Assignee: Gour Saha
> Priority: Major
> Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with anti-affinity component, to simulate
> pending container requests.
> It can also be simulated by other means (no resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
>     {
>       "name": "ping",
>       "number_of_containers": 2,
>       "resource": {
>         "cpus": 1,
>         "memory": "256"
>       },
>       "launch_command": "sleep 9000",
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "ping"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above
> service to one container more than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the serviceam log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component dispatcher] INFO component.Component - [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component dispatcher] INFO component.Component - [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Deleting registry path /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-000006
> 2018-05-03 20:17:38,476 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,578 [pool-5-thread-8] INFO instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Deleted component instance dir: hdfs://ctr-e138-1518143905142-280820-01-000003.example.site:8020/user/root/.yarn/services/simple-aa-1/components/1/ping/ping-4
> 2018-05-03 20:17:39,268 [AMRM Callback Handler Thread] WARN service.ServiceScheduler - Container container_1525297086734_0013_01_000006 Completed. No component instance exists. exitStatus=-100. diagnostics=Container released by application
> 2018-05-03 20:17:40,273 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 1 containers allocated.
> 2018-05-03 20:17:40,273 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - [COMPONENT ping]: remove 0 outstanding container requests for allocateId 0
> 2018-05-03 20:17:40,274 [Component dispatcher] INFO component.Component - [COMPONENT ping]: container_1525297086734_0013_01_000007 allocated, num pending component instances reduced to 0
> 2018-05-03 20:17:40,274 [Component dispatcher] INFO component.Component - [COMPONENT ping]: Assigned container_1525297086734_0013_01_000007 to component instance ping-5 and launch on host ctr-e138-1518143905142-280820-01-000008.example.site:25454
> 2018-05-03 20:17:40,277 [pool-6-thread-6] INFO provider.ProviderUtils - [COMPINSTANCE ping-5 : container_1525297086734_0013_01_000007]: Creating dir on hdfs: hdfs://ctr-e138-1518143905142-280820-01-000003.example.site:8020/user/root/.yarn/services/simple-aa-1/components/1/ping/ping-5
> 2018-05-03 20:17:40,316 [pool-6-thread-6] INFO containerlaunch.ContainerLaunchService - launching container container_1525297086734_0013_01_000007
> 2018-05-03 20:17:40,318 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #5] INFO impl.NMClientAsyncImpl - Processing Event EventType: START_CONTAINER for Container container_1525297086734_0013_01_000007
> 2018-05-03 20:17:40,338 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CONTAINER_STARTED at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_STARTED at STABLE
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Status response shows that only 4 containers are running and the service is
> not in STABLE state -
> {code:java}
> yarn app -status simple-aa-1
> {code}
> output -
> {code:java}
> {
>   "components": [
>     {
>       "configuration": {
>         "env": {},
>         "files": [],
>         "properties": {}
>       },
>       "containers": [
>         {
>           "bare_host": "ctr-e138-1518143905142-280820-01-000007.example.site",
>           "component_instance_name": "ping-1",
>           "hostname": "ctr-e138-1518143905142-280820-01-000007.example.site",
>           "id": "container_1525297086734_0013_01_000003",
>           "ip": "x.x.x.x",
>           "launch_time": 1525378141535,
>           "state": "READY"
>         },
>         {
>           "bare_host": "ctr-e138-1518143905142-280820-01-000006.example.site",
>           "component_instance_name": "ping-0",
>           "hostname": "ctr-e138-1518143905142-280820-01-000006.example.site",
>           "id": "container_1525297086734_0013_01_000002",
>           "ip": "x.x.x.x",
>           "launch_time": 1525378141513,
>           "state": "READY"
>         },
>         {
>           "bare_host": "ctr-e138-1518143905142-280820-01-000005.example.site",
>           "component_instance_name": "ping-3",
>           "hostname": "ctr-e138-1518143905142-280820-01-000005.example.site",
>           "id": "container_1525297086734_0013_01_000005",
>           "ip": "x.x.x.x",
>           "launch_time": 1525378303429,
>           "state": "READY"
>         },
>         {
>           "bare_host": "ctr-e138-1518143905142-280820-01-000004.example.site",
>           "component_instance_name": "ping-2",
>           "hostname": "ctr-e138-1518143905142-280820-01-000004.example.site",
>           "id": "container_1525297086734_0013_01_000004",
>           "ip": "x.x.x.x",
>           "launch_time": 1525378303425,
>           "state": "READY"
>         }
>       ],
>       "dependencies": [],
>       "launch_command": "sleep 9000",
>       "name": "ping",
>       "number_of_containers": 5,
>       "placement_policy": {
>         "constraints": [
>           {
>             "node_attributes": {},
>             "node_partitions": [],
>             "scope": "NODE",
>             "target_tags": [
>               "ping"
>             ],
>             "type": "ANTI_AFFINITY"
>           }
>         ]
>       },
>       "quicklinks": [],
>       "resource": {
>         "additional": {},
>         "cpus": 1,
>         "memory": "256"
>       },
>       "run_privileged_container": false,
>       "state": "FLEXING"
>     }
>   ],
>   "configuration": {
>     "env": {},
>     "files": [],
>     "properties": {}
>   },
>   "id": "application_1525297086734_0013",
>   "kerberos_principal": {},
>   "lifetime": -1,
>   "name": "simple-aa-1",
>   "quicklinks": {},
>   "state": "STARTED",
>   "version": "1"
> }
> {code}
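
The ordering the issue title asks for (cancel pending container requests first, kill running containers only for whatever excess remains) can be illustrated with a small stand-alone model. This is only a sketch in Python, not the code in hadoop-yarn-services-core and not the attached patch: `ComponentModel`, `flex_down` and the short container ids are hypothetical names, and killing the newest container first is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentModel:
    """Toy stand-in for a service component (hypothetical, not the YARN class)."""
    name: str
    running: List[str] = field(default_factory=list)  # ids of running containers
    pending: int = 0                                   # outstanding container requests

    @property
    def desired(self) -> int:
        # Desired count = containers already running + requests still pending.
        return len(self.running) + self.pending

    def flex_down(self, target: int) -> List[str]:
        """Scale down to `target`; return the ids of running containers killed."""
        if target < 0:
            raise ValueError("target must be non-negative")
        excess = self.desired - target
        if excess <= 0:
            return []
        # Step 1: cancel pending requests first; this costs no running work.
        cancelled = min(excess, self.pending)
        self.pending -= cancelled
        excess -= cancelled
        # Step 2: kill running containers only for the remaining excess
        # (newest first here, which is an assumption of this sketch).
        return [self.running.pop() for _ in range(excess)]


# Scenario from the report: 5 nodes, anti-affinity, flexed to 6, then back to 5.
ping = ComponentModel("ping", running=["c2", "c3", "c4", "c5", "c6"], pending=1)
killed = ping.flex_down(5)
print(killed, len(ping.running), ping.pending)
```

In this model, flexing from 6 to 5 only drops the one pending request and leaves all 5 running containers alone, whereas the log above shows a running instance (ping-4) being destroyed while the pending request was still outstanding.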