[
https://issues.apache.org/jira/browse/YARN-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Payne updated YARN-7052:
-----------------------------
Description:
In YARN-7051, we ran into a case where the preemption monitor thread hung with
no indication of why.
The preemption monitor is started by the {{ScheduledExecutorService}} from
{{SchedulingMonitor#serviceStart}}. Once an uncaught throwable happens, nothing
ever gets the result of the future, so the throwable is never reported. The
thread running the preemption monitor never dies, and the task never gets
rescheduled.
If {{HadoopExecutors}} were used, it would at least provide a
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.
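A minimal sketch of the behavior described above, using only the JDK. The class and method names ({{SilentFailureDemo}}, {{runsBeforeSuppression}}) are hypothetical and not part of YARN. The demo calls {{get()}} on the future only to observe the failure, which is the step the description says nothing performs for the preemption monitor:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not YARN code.
class SilentFailureDemo {
  /** Schedules a task that throws on its second run; returns how many runs happened. */
  static int runsBeforeSuppression() {
    AtomicInteger runs = new AtomicInteger();
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    try {
      ScheduledFuture<?> handler = ses.scheduleAtFixedRate(() -> {
        if (runs.incrementAndGet() == 2) {
          throw new IllegalStateException("simulated failure");
        }
      }, 0, 10, TimeUnit.MILLISECONDS);
      try {
        // The throwable is stored in the future; get() is the only way to see it.
        handler.get();
      } catch (ExecutionException e) {
        System.err.println("Periodic task failed: " + e.getCause());
      }
      // Wait several periods: the executor suppresses all further executions.
      Thread.sleep(100);
      return runs.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      ses.shutdownNow();
    }
  }
}
```

Without the {{get()}} call nothing is printed at all, and the worker thread stays alive but idle until the executor is shut down.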
was:
In YARN-7051, we ran into a case where the preemption monitor thread hung with
no indication of why. This was because the preemption monitor is started by the
{{SchedulingExecutorService}} from {{SchedulingMonitor#serviceStart}}, and then
nothing ever gets the result of the future or allows it to throw an exception
if needed.
At least with {{HadoopExecutor}}, it will provide a
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.
Summary: RM SchedulingMonitor gives no indication why the spawned
thread crashed. (was: RM SchedulingMonitor should use HadoopExecutors when
creating ScheduledExecutorService)
Using {{HadoopExecutors}} may not be feasible.
The following is used to launch the preemption thread:
{code:title=SchedulingMonitor#serviceStart}
ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
...
handler = ses.scheduleAtFixedRate(new PreemptionChecker(),
0, monitorInterval, TimeUnit.MILLISECONDS);
{code}
{{HadoopExecutors}} provides a {{newSingleThreadScheduledExecutor}} method,
but it just delegates to {{Executors#newSingleThreadScheduledExecutor}}. Its
return value is not wrapped in a {{HadoopScheduledThreadPoolExecutor}}, so you
don't get the logging benefits if you use
{{HadoopExecutors#newSingleThreadScheduledExecutor}}.
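For illustration, the kind of logging wrapper meant here can be sketched with the JDK alone by overriding {{ThreadPoolExecutor#afterExecute}}. This is an assumption about the general technique, not the actual {{HadoopScheduledThreadPoolExecutor}} source; {{LoggingScheduledExecutor}}, its {{failures}} queue, and {{demo}} are hypothetical names:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper, not the Hadoop class.
class LoggingScheduledExecutor extends ScheduledThreadPoolExecutor {
  // Reported failures, kept so that callers and tests can inspect them.
  final BlockingQueue<Throwable> failures = new LinkedBlockingQueue<>();

  LoggingScheduledExecutor(int corePoolSize) {
    super(corePoolSize);
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    super.afterExecute(r, t);
    // Scheduled tasks are wrapped in a Future that captures the throwable,
    // so t is null here and the failure has to be pulled out of the Future.
    // isDone() is false after a successful periodic run, so get() cannot block.
    if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
      try {
        ((Future<?>) r).get();
      } catch (ExecutionException e) {
        t = e.getCause();
      } catch (CancellationException e) {
        // A cancelled task is not treated as a failure in this sketch.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    if (t != null) {
      System.err.println("Scheduled task threw: " + t);
      failures.add(t);
    }
  }

  /** Runs a task that always fails; returns the message of the reported failure. */
  static String demo() {
    LoggingScheduledExecutor ses = new LoggingScheduledExecutor(1);
    try {
      ses.scheduleAtFixedRate(() -> {
        throw new IllegalStateException("simulated failure");
      }, 0, 10, TimeUnit.MILLISECONDS);
      Throwable seen = ses.failures.poll(2, TimeUnit.SECONDS);
      return seen == null ? null : seen.getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {
      ses.shutdownNow();
    }
  }
}
```

A wrapper like this only reports the failure. The periodic task is still not rescheduled after it throws.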
Alternatively, we could have the thread itself catch and handle throwables.
The thread launched by {{SchedulingMonitor#serviceStart}} calls
{{PreemptionChecker#run}}, which only handles {{YarnRuntimeException}}.
Anything else will cause the thread to hang and not get rescheduled.
Another solution would be to handle other throwables, log them, and either
re-throw or cancel the thread.
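One possible shape for that suggestion, as a sketch only: a small {{Runnable}} wrapper that logs any throwable and then re-throws it. {{LoggingRunnable}} and its {{logger}} callback are hypothetical names, and the actual change to {{PreemptionChecker#run}} may look different:

```java
import java.util.function.Consumer;

// Hypothetical wrapper, not YARN code.
class LoggingRunnable implements Runnable {
  private final Runnable delegate;
  private final Consumer<Throwable> logger;

  LoggingRunnable(Runnable delegate, Consumer<Throwable> logger) {
    this.delegate = delegate;
    this.logger = logger;
  }

  @Override
  public void run() {
    try {
      delegate.run();
    } catch (Throwable t) {
      // Report the failure before the executor swallows it.
      logger.accept(t);
      // Re-throw so the executor still stops rescheduling the failed task.
      // Dropping this line would instead let the next scheduled run proceed.
      throw t;
    }
  }
}
```

It could be passed to {{scheduleAtFixedRate}} in place of the bare task, for example {{new LoggingRunnable(task, t -> System.err.println("Task failed: " + t))}}.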
> RM SchedulingMonitor gives no indication why the spawned thread crashed.
> ------------------------------------------------------------------------
>
> Key: YARN-7052
> URL: https://issues.apache.org/jira/browse/YARN-7052
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Eric Payne
> Assignee: Eric Payne