[ 
https://issues.apache.org/jira/browse/YARN-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903283#comment-17903283
 ] 

ASF GitHub Bot commented on YARN-10058:
---------------------------------------

TaoYang526 commented on PR #7129:
URL: https://github.com/apache/hadoop/pull/7129#issuecomment-2519801904

   > @TaoYang526 - What are the real-life cases in which the async thread 
could crash? Just curious to know.
   
   @shameersss1  The async thread may crash when a runtime exception such as 
an NPE is thrown inside the scheduling process. This PR guarantees that the RM 
will recover in those scenarios; otherwise the RM hangs without any obvious 
error, so users have to dig the ERROR info out of a large volume of RM logs. 
I am not aware of any known cases, but there may still be unresolved issues, 
or new problems may be introduced in the future.
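
   As a rough illustration of the failure mode described above (a sketch only, 
not the code in this PR): a worker thread that dies from an uncaught runtime 
exception otherwise leaves little trace, but a `Thread.UncaughtExceptionHandler` 
lets the owner notice the crash and react. The class `CrashAwareThreadDemo`, the 
method `startMonitored`, and the `onCrash` callback are hypothetical names; in 
the RM the reaction would be whatever recovery the PR implements.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical illustration; none of these names come from YARN or the PR. */
class CrashAwareThreadDemo {

  /**
   * Starts a daemon thread running {@code task}. If the thread dies from an
   * uncaught exception, {@code onCrash} is invoked with the cause instead of
   * the failure only appearing as a stack trace in the logs.
   */
  static Thread startMonitored(String name, Runnable task,
      Consumer<Throwable> onCrash) {
    Thread t = new Thread(task, name);
    t.setDaemon(true);
    // The JVM calls this handler on the dying thread just before it exits.
    t.setUncaughtExceptionHandler((thread, error) -> onCrash.accept(error));
    t.start();
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch crashed = new CountDownLatch(1);
    AtomicReference<Throwable> cause = new AtomicReference<>();

    // Simulate a scheduling loop that hits an NPE.
    startMonitored("async-schedule-demo", () -> {
      throw new NullPointerException("simulated scheduling failure");
    }, error -> {
      cause.set(error);
      crashed.countDown();
    });

    // The owner waits for the crash signal and can then trigger recovery.
    if (crashed.await(5, TimeUnit.SECONDS)) {
      System.out.println("scheduling thread crashed: " + cause.get());
    }
  }
}
```

   The handler only reports the crash. What to do next (restart the thread, 
or give up the active role in HA mode) is a separate decision left to the caller.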




> Capacity Scheduler dispatcher hang when async thread crash
> ----------------------------------------------------------
>
>                 Key: YARN-10058
>                 URL: https://issues.apache.org/jira/browse/YARN-10058
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: tuyu
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 0001-global-scheduling-standby-hang.patch
>
>
> When the capacity scheduler enables the global scheduler and the global 
> scheduler's AsyncScheduleThread crashes, the capacity scheduler dispatcher 
> hangs for a long time. This behavior is unreasonable. 
> If this happens in HA mode, the current RM should transition to standby.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
