[ https://issues.apache.org/jira/browse/YARN-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602834#comment-17602834 ]

ASF GitHub Bot commented on YARN-11251:
---------------------------------------

Samrat002 opened a new pull request, #4877:
URL: https://github.com/apache/hadoop/pull/4877

   
   ### Description of PR
   Use separate thread pools for AMLauncher launch and cleanup events, as proposed in [YARN-11251](https://issues.apache.org/jira/browse/YARN-11251).
   
   ### How was this patch tested?
   Tested on an EMR cluster.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Separate ThreadPool for AMLauncher Launch and Clean Events
> ----------------------------------------------------------
>
>                 Key: YARN-11251
>                 URL: https://issues.apache.org/jira/browse/YARN-11251
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Samrat Deb
>            Priority: Major
>
> We have seen many AM launch failures caused by expired tokens or container
> liveliness expiry while the AM launcher threads are busy retrying connections
> to AM hosts (e.g. Spot Instances) that are down. Using separate thread pools
> for the launch and cleanup events should reduce these AM launch failures.
> *Token Expired*
> {code}
> 2022-07-19 14:56:33,486 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  (IPC Server handler 39 on 8041): Unauthorized request to start container.
> This token is expired. current time is 1658242593486 found 1658242289457
> Note: System times on machines may be out of sync. Check system time and time 
> zones.
> {code}
> *Container Liveliness Expiry*
> {code}
> 2022-07-19 16:06:48,663 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (ResourceManager Event Processor): container_1656573205571_2357731_01_000001 
> Container Transitioned from ACQUIRED to EXPIRED
> 2022-07-19 16:10:08,663 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor (Ping Checker): 
> Expired:<container=container_1656573205571_2357773_01_000001, increase=false> 
> Timed out after 600 secs
> {code}
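
The idea behind this improvement can be illustrated with a minimal Java sketch. It is not the code from PR #4877 and does not use the real ResourceManager classes: the class `SeparatePoolLauncher`, its `EventType` enum, and the pool sizes are hypothetical. It only shows the routing principle, namely that launch and cleanup work are submitted to two independent fixed-size pools, so a cleanup stuck retrying an unreachable host cannot occupy the threads needed to launch new AMs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the YARN implementation): routes AM launch and
 * cleanup tasks to two independent thread pools.
 */
public class SeparatePoolLauncher implements AutoCloseable {

  /** Hypothetical event types mirroring the launch/cleanup split. */
  public enum EventType { LAUNCH, CLEANUP }

  private final ExecutorService launchPool;
  private final ExecutorService cleanupPool;

  public SeparatePoolLauncher(int launchThreads, int cleanupThreads) {
    this.launchPool = Executors.newFixedThreadPool(launchThreads);
    this.cleanupPool = Executors.newFixedThreadPool(cleanupThreads);
  }

  /** Submits the task to the pool that matches its event type. */
  public Future<?> handle(EventType type, Runnable task) {
    switch (type) {
      case LAUNCH:
        return launchPool.submit(task);
      case CLEANUP:
        return cleanupPool.submit(task);
      default:
        throw new IllegalArgumentException("Unknown event type: " + type);
    }
  }

  /** Stops accepting new work and waits briefly for running tasks. */
  @Override
  public void close() throws InterruptedException {
    launchPool.shutdown();
    cleanupPool.shutdown();
    launchPool.awaitTermination(10, TimeUnit.SECONDS);
    cleanupPool.awaitTermination(10, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    try (SeparatePoolLauncher launcher = new SeparatePoolLauncher(1, 1)) {
      CountDownLatch release = new CountDownLatch(1);
      // Simulate a cleanup stuck retrying a host that is down.
      launcher.handle(EventType.CLEANUP, () -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      // The launch still completes promptly because it has its own pool.
      Future<?> launch = launcher.handle(EventType.LAUNCH,
          () -> System.out.println("AM launched"));
      launch.get(5, TimeUnit.SECONDS);
      // Unblock the simulated cleanup so close() can shut down cleanly.
      release.countDown();
    }
  }
}
```

With a single shared pool of one thread, the launch task in `main` would wait behind the blocked cleanup; with the split, it runs immediately. In a real deployment the two pool sizes would presumably be tuned independently, which is an assumption of this sketch rather than something stated in the issue.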



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
