[ 
https://issues.apache.org/jira/browse/YARN-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7052:
-----------------------------
    Target Version/s: 2.8.2
            Priority: Critical  (was: Major)

bq. I suggest that another solution would be to handle other throwables, log 
them, and either re-throw or cancel the thread.
After an offline discussion with [~jlowe], I think it would be better to catch 
throwables, log them, and skip only the failed invocation. Skipping is safe 
because preemption keeps no persistent structures across invocations and does 
not modify any existing leaf queue structures.
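
To make the catch, log, and skip idea concrete, here is a minimal sketch using only java.util.concurrent. It is not the actual patch: the class {{GuardedMonitorSketch}}, the {{guarded}} helper, the simulated failure, and the use of {{scheduleAtFixedRate}} with a 10 ms period are all assumptions made for illustration, and a real change would log through the RM's logger rather than {{System.err}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the YARN-7052 patch.
class GuardedMonitorSketch {

  /** Wraps a task so that a failure is logged and only that invocation is skipped. */
  static Runnable guarded(Runnable task) {
    return () -> {
      try {
        task.run();
      } catch (Throwable t) {
        // Swallow after logging: the executor sees a normal return,
        // so the periodic schedule stays alive.
        System.err.println("Monitor invocation failed, skipping it: " + t);
      }
    };
  }

  /** Schedules a task that fails on its first call; returns the number of calls made. */
  static int runDemo() {
    AtomicInteger calls = new AtomicInteger();
    CountDownLatch threeCalls = new CountDownLatch(3);
    Runnable flaky = () -> {
      int n = calls.incrementAndGet();
      threeCalls.countDown();
      if (n == 1) {
        throw new IllegalStateException("simulated failure");
      }
    };
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    try {
      ses.scheduleAtFixedRate(guarded(flaky), 0, 10, TimeUnit.MILLISECONDS);
      // Wait until the task has run at least three times (or give up after 5 s).
      threeCalls.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      ses.shutdownNow();
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("invocations: " + runDemo());
  }
}
```

Although the first call throws, the task keeps being invoked, which is the behavior proposed above. Catching {{Throwable}} also catches {{Error}} subclasses, so whether those should be re-thrown after logging is a separate design choice.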

Since preemption can be an important productivity feature for certain use 
cases, I am marking this critical for 2.8.2.

> RM SchedulingMonitor gives no indication why the spawned thread crashed.
> ------------------------------------------------------------------------
>
>                 Key: YARN-7052
>                 URL: https://issues.apache.org/jira/browse/YARN-7052
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Critical
>
> In YARN-7051, we ran into a case where the preemption monitor thread hung 
> with no indication of why.
> The preemption monitor is started by the {{ScheduledExecutorService}} from 
> {{SchedulingMonitor#serviceStart}}. Once an uncaught throwable happens, 
> nothing ever gets the result of the future, the thread running the preemption 
> monitor never dies, and the monitor never gets rescheduled.
> If {{HadoopExecutors}} were used, it would at least provide a 
> {{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.
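
The failure mode described above can be reproduced with the plain JDK. The sketch below is illustrative only: the class {{SilentFailureSketch}}, the 10 ms period, and the simulated exception are assumptions, not code from {{SchedulingMonitor}}. It relies on the documented behavior of {{ScheduledExecutorService#scheduleAtFixedRate}}: if an execution throws, subsequent executions are suppressed, and the exception is only visible to a caller of {{Future#get}}.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the silent failure, not code from SchedulingMonitor.
class SilentFailureSketch {

  /** Returns {number of invocations, 1 if get() surfaced the failure, else 0}. */
  static int[] runDemo() {
    AtomicInteger calls = new AtomicInteger();
    int surfaced = 0;
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    try {
      ScheduledFuture<?> future = ses.scheduleAtFixedRate(() -> {
        calls.incrementAndGet();
        throw new IllegalStateException("simulated monitor failure");
      }, 0, 10, TimeUnit.MILLISECONDS);
      try {
        // Only a caller of get() ever sees the throwable; nothing is logged otherwise.
        future.get(5, TimeUnit.SECONDS);
      } catch (ExecutionException e) {
        surfaced = 1; // e.getCause() is the IllegalStateException
      } catch (TimeoutException | CancellationException e) {
        surfaced = 0;
      }
      // Leave time for further runs; none happen because the task was suppressed.
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      ses.shutdownNow();
    }
    return new int[] {calls.get(), surfaced};
  }

  public static void main(String[] args) {
    int[] result = runDemo();
    System.out.println("invocations=" + result[0] + ", failureSeenViaGet=" + result[1]);
  }
}
```

In this sketch the task body runs once and is never rescheduled, while the pool's worker thread stays alive until shutdown. If nothing calls {{get}} on the returned future, the exception is never reported anywhere.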



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
