Yesha Vora created YARN-8579:
--------------------------------

             Summary: New AM attempt could not retrieve previous attempt component data
                 Key: YARN-8579
                 URL: https://issues.apache.org/jira/browse/YARN-8579
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.1.1
            Reporter: Yesha Vora


Steps:
1) Launch the httpd-docker service
2) Wait for the app to reach the STABLE state
3) Run validation for the app (takes around 3 minutes)
4) Stop all ZooKeeper servers
5) Wait 60 seconds
6) Kill the AM
7) Wait 30 seconds
8) Start all ZooKeeper servers
9) Wait for the application to finish
10) Validate the expected containers of the app

Expected behavior:
A new AM attempt should start, and the Docker containers launched by the first 
attempt should be recovered by the new attempt.

Actual behavior:
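The new AM attempt starts, but it cannot recover the first attempt's Docker 
containers because it cannot read the component details from ZooKeeper (the 
components path is missing from the registry). As a result, it releases the 
container from the previous attempt and requests new containers for all 
component instances.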

{code}
2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering appattempt_1531977563978_0015_000002, fault-test-zkrm-httpd-docker into registry
2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 containers from previous attempt.
2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_000003 from previous attempt
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_000003 from previous attempt, releasing
2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering initial evaluation of component httpd
2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT httpd]: 2 instances.
2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] Requesting for 2 container(s)
{code}
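The log implies the following recovery decision: the new attempt receives the containers that survived from the previous attempt, looks each one up in the registry, releases any container that has no record, and then requests enough containers to reach the component's desired instance count. A minimal sketch of that logic is shown here, assuming the registry lookup has already been reduced to a container-ID-to-component map (empty when the components path returns NoNode). The names `recover_previous_attempt` and `RecoveryResult` are hypothetical; this is a simplified model of the behavior seen in the log, not the actual `ServiceScheduler` code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoveryResult:
    recovered: List[str] = field(default_factory=list)        # containers kept from the previous attempt
    released: List[str] = field(default_factory=list)         # containers released for lack of a registry record
    to_request: Dict[str, int] = field(default_factory=dict)  # new containers needed per component


def recover_previous_attempt(
    previous_containers: List[str],
    registry_records: Dict[str, str],
    desired_instances: Dict[str, int],
) -> RecoveryResult:
    """Hypothetical model of the recovery decision seen in the AM log.

    previous_containers: container IDs reported as surviving from the earlier attempt.
    registry_records:    container ID -> component name, as read from the registry;
                         empty when the components path cannot be read (NoNode).
    desired_instances:   component name -> desired number of instances.
    """
    result = RecoveryResult()
    running: Dict[str, int] = {}
    for cid in previous_containers:
        component = registry_records.get(cid)
        if component is None:
            # Mirrors "Record not found in registry ... releasing"
            result.released.append(cid)
        else:
            result.recovered.append(cid)
            running[component] = running.get(component, 0) + 1
    # Request whatever is still missing to reach each component's desired count.
    for name, desired in desired_instances.items():
        missing = desired - running.get(name, 0)
        if missing > 0:
            result.to_request[name] = missing
    return result


# Failure case from the log: the components path is missing, so no records are found.
r = recover_previous_attempt(
    ["container_e08_1531977563978_0015_01_000003"], {}, {"httpd": 2}
)
print(r.released, r.to_request)
```

With an empty record map (the NoNode case), the single container from attempt 1 is released and 2 `httpd` containers are requested, matching the last log lines. If the registry record had been readable, the container would have been kept and only 1 new container requested, which is the expected behavior described above.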




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)