[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8243:
----------------------------
    Description: 
This is easy to reproduce on a service with an anti-affinity component, which 
makes it simple to create pending container requests. Pending requests can also 
be produced by other means (no resources left in the cluster, etc.).

Service yarnfile used to test this -
{code:java}
{
  "name": "sleeper-service",
  "version": "1",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "launch_command": "sleep 9000",
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [
              "ping"
            ]
          }
        ]
      }
    }
  ]
}
{code}
Launch a service with the above yarnfile as below -
{code:java}
yarn app -launch simple-aa-1 simple_AA.json
{code}
Let's assume there are only 5 nodes in this cluster. Now, flex the above 
service to 1 more container than the number of nodes (6 in my case).
{code:java}
yarn app -flex simple-aa-1 -component ping 6
{code}
Only 5 containers will be allocated and running for simple-aa-1; the request 
for the 6th container stays pending because of the anti-affinity constraint. 
At this point, flex it down to 5 containers -
{code:java}
yarn app -flex simple-aa-1 -component ping 5
{code}
This is what is seen in the service AM log at this point -
{noformat}
2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  service.ClientAMService - Flexing component ping to 5
2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
2018-05-03 20:17:38,470 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Flexed down by user, destroying.
2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Deleting registry path /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-000006
2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)
2018-05-03 20:17:38,480 [Component  dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)
2018-05-03 20:17:38,578 [pool-5-thread-8] INFO  instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_000006]: Deleted component instance dir: hdfs://ctr-e138-1518143905142-280820-01-000003.example.site:8020/user/root/.yarn/services/simple-aa-1/components/1/ping/ping-4
2018-05-03 20:17:39,268 [AMRM Callback Handler Thread] WARN  service.ServiceScheduler - Container container_1525297086734_0013_01_000006 Completed. No component instance exists. exitStatus=-100. diagnostics=Container released by application
2018-05-03 20:17:40,273 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - 1 containers allocated.
2018-05-03 20:17:40,273 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - [COMPONENT ping]: remove 0 outstanding container requests for allocateId 0
2018-05-03 20:17:40,274 [Component  dispatcher] INFO  component.Component - [COMPONENT ping]: container_1525297086734_0013_01_000007 allocated, num pending component instances reduced to 0
2018-05-03 20:17:40,274 [Component  dispatcher] INFO  component.Component - [COMPONENT ping]: Assigned container_1525297086734_0013_01_000007 to component instance ping-5 and launch on host ctr-e138-1518143905142-280820-01-000008.example.site:25454
2018-05-03 20:17:40,277 [pool-6-thread-6] INFO  provider.ProviderUtils - [COMPINSTANCE ping-5 : container_1525297086734_0013_01_000007]: Creating dir on hdfs: hdfs://ctr-e138-1518143905142-280820-01-000003.example.site:8020/user/root/.yarn/services/simple-aa-1/components/1/ping/ping-5
2018-05-03 20:17:40,316 [pool-6-thread-6] INFO  containerlaunch.ContainerLaunchService - launching container container_1525297086734_0013_01_000007
2018-05-03 20:17:40,318 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #5] INFO  impl.NMClientAsyncImpl - Processing Event EventType: START_CONTAINER for Container container_1525297086734_0013_01_000007
2018-05-03 20:17:40,338 [Component  dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CONTAINER_STARTED at STABLE
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CONTAINER_STARTED at STABLE
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
        at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
The status response shows that only 4 containers are running, the component 
is left in FLEXING state, and the service is in STARTED rather than STABLE 
state -
{code:java}
yarn app -status simple-aa-1
{code}
output -
{code:java}
{
    "components": [
        {
            "configuration": {
                "env": {},
                "files": [],
                "properties": {}
            },
            "containers": [
                {
                    "bare_host": "ctr-e138-1518143905142-280820-01-000007.example.site",
                    "component_instance_name": "ping-1",
                    "hostname": "ctr-e138-1518143905142-280820-01-000007.example.site",
                    "id": "container_1525297086734_0013_01_000003",
                    "ip": "x.x.x.x",
                    "launch_time": 1525378141535,
                    "state": "READY"
                },
                {
                    "bare_host": "ctr-e138-1518143905142-280820-01-000006.example.site",
                    "component_instance_name": "ping-0",
                    "hostname": "ctr-e138-1518143905142-280820-01-000006.example.site",
                    "id": "container_1525297086734_0013_01_000002",
                    "ip": "x.x.x.x",
                    "launch_time": 1525378141513,
                    "state": "READY"
                },
                {
                    "bare_host": "ctr-e138-1518143905142-280820-01-000005.example.site",
                    "component_instance_name": "ping-3",
                    "hostname": "ctr-e138-1518143905142-280820-01-000005.example.site",
                    "id": "container_1525297086734_0013_01_000005",
                    "ip": "x.x.x.x",
                    "launch_time": 1525378303429,
                    "state": "READY"
                },
                {
                    "bare_host": "ctr-e138-1518143905142-280820-01-000004.example.site",
                    "component_instance_name": "ping-2",
                    "hostname": "ctr-e138-1518143905142-280820-01-000004.example.site",
                    "id": "container_1525297086734_0013_01_000004",
                    "ip": "x.x.x.x",
                    "launch_time": 1525378303425,
                    "state": "READY"
                }
            ],
            "dependencies": [],
            "launch_command": "sleep 9000",
            "name": "ping",
            "number_of_containers": 5,
            "placement_policy": {
                "constraints": [
                    {
                        "node_attributes": {},
                        "node_partitions": [],
                        "scope": "NODE",
                        "target_tags": [
                            "ping"
                        ],
                        "type": "ANTI_AFFINITY"
                    }
                ]
            },
            "quicklinks": [],
            "resource": {
                "additional": {},
                "cpus": 1,
                "memory": "256"
            },
            "run_privileged_container": false,
            "state": "FLEXING"
        }
    ],
    "configuration": {
        "env": {},
        "files": [],
        "properties": {}
    },
    "id": "application_1525297086734_0013",
    "kerberos_principal": {},
    "lifetime": -1,
    "name": "simple-aa-1",
    "quicklinks": {},
    "state": "STARTED",
    "version": "1"
}
{code}
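
The ordering requested in this issue (cancel pending container requests first, 
and only then stop running containers) can be sketched roughly as below. This 
is an illustrative sketch only, not the actual fix: FlexDownPlan and its plan 
method are hypothetical names that do not exist in the YARN service code, and 
the sketch assumes the component already knows how many containers are running 
and how many requests are still outstanding.

```java
// Hypothetical sketch: FlexDownPlan is NOT a class in Hadoop YARN.
// It only illustrates the ordering "cancel pending requests first,
// then stop running containers" when a component is flexed down.
final class FlexDownPlan {
    final int pendingToCancel; // outstanding container requests to remove
    final int runningToStop;   // running containers to destroy

    private FlexDownPlan(int pendingToCancel, int runningToStop) {
        this.pendingToCancel = pendingToCancel;
        this.runningToStop = runningToStop;
    }

    /**
     * @param running number of containers currently running
     * @param pending number of outstanding (unsatisfied) container requests
     * @param target  desired number of containers after the flex
     */
    static FlexDownPlan plan(int running, int pending, int target) {
        if (running < 0 || pending < 0 || target < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Total number of instances that have to go away.
        int excess = Math.max(0, running + pending - target);
        // Satisfy as much of the reduction as possible from pending requests.
        int cancel = Math.min(pending, excess);
        // Only the remainder is taken from running containers.
        int stop = excess - cancel;
        return new FlexDownPlan(cancel, stop);
    }

    public static void main(String[] args) {
        // Scenario from this issue: 5 running, 1 pending, flex down to 5.
        FlexDownPlan p = plan(5, 1, 5);
        System.out.println("cancel pending=" + p.pendingToCancel
            + ", stop running=" + p.runningToStop);
    }
}
```

For the scenario above (5 running, 1 pending, target 5) this yields 1 pending 
request to cancel and 0 running containers to stop, whereas the log shows the 
running instance ping-4 being destroyed instead.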



> Flex down should first remove pending container requests (if any) and then 
> kill running containers
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8243
>                 URL: https://issues.apache.org/jira/browse/YARN-8243
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Gour Saha
>            Priority: Major
>


