imback82 commented on pull request #28435:
URL: https://github.com/apache/spark/pull/28435#issuecomment-623570107
The shutdown hook is eventually called from
`org.apache.hadoop.util.ShutdownHookManager`:
```JAVA
for (HookEntry entry: getShutdownHooksInOrder()) {
  Future<?> future = EXECUTOR.submit(entry.getHook());
  try {
    future.get(entry.getTimeout(), entry.getTimeUnit());
  } catch (TimeoutException ex) {
```
where `EXECUTOR` is
```JAVA
private static final ExecutorService EXECUTOR =
    HadoopExecutors.newSingleThreadExecutor(new ThreadFactoryBuilder()
        .setDaemon(true)
        .setNameFormat("shutdown-hook-%01d")
        .build());
```
What I noticed is that when a timeout occurs, the hook is submitted for
execution but doesn't run at all before the timeout is reached, presumably
because the single executor thread is still busy with an earlier hook. By
contrast, in Hadoop <= 2.7, each hook ran in the same thread that executed
the loop above, but without any timeout functionality.
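To make that failure mode concrete, here is a minimal standalone sketch (my
own illustration, not Hadoop's code) of how a single-thread executor lets one
overrunning hook make the next queued hook burn its entire timeout without
ever starting:
```JAVA
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ShutdownHookTimeoutDemo {
  public static void main(String[] args) throws Exception {
    // Single worker thread, mirroring ShutdownHookManager's EXECUTOR.
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // The first "hook" overruns its 1s budget and keeps the lone thread busy.
    Future<?> first = executor.submit(() -> sleepQuietly(5_000));
    try {
      first.get(1, TimeUnit.SECONDS);
    } catch (TimeoutException ex) {
      System.out.println("hook 1 timed out (still running)");
    }

    // The second "hook" is submitted right away, but it cannot start until
    // the worker thread frees up, so its whole timeout elapses while it is
    // still sitting in the queue.
    Future<?> second = executor.submit(() -> System.out.println("hook 2 ran"));
    try {
      second.get(1, TimeUnit.SECONDS);
    } catch (TimeoutException ex) {
      System.out.println("hook 2 timed out without ever starting");
    }
    executor.shutdownNow();
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```
Running this prints both timeout messages; the second task never executes
even though it was submitted promptly, because the lone worker thread is
still stuck in the first one.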
So I don't think we should rely on the shutdown hook to unregister after a
successful run.
This applies to both client and cluster mode (since the hook is registered
in `ApplicationMaster.run`), although my main observation comes from
cluster-mode runs.
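For illustration, this is the kind of pattern I have in mind (all names here
are hypothetical placeholders, not Spark's actual code): unregister eagerly
on the success path, and keep the shutdown hook only as an idempotent safety
net for abnormal exits:
```JAVA
public class EagerUnregisterSketch {
  private static boolean unregistered = false;

  public static void main(String[] args) {
    // Safety net for abnormal exits only; this becomes a no-op when the
    // success path has already unregistered, so a delayed or timed-out
    // hook no longer matters for the successful case.
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> unregister("FAILED")));

    doWork();                // stand-in for the application's normal work
    unregister("SUCCEEDED"); // eager unregistration on the success path
  }

  private static void doWork() {
    System.out.println("application work finished");
  }

  // Idempotent stand-in for the real "unregister from the RM" call.
  private static synchronized void unregister(String finalStatus) {
    if (!unregistered) {
      unregistered = true;
      System.out.println("unregistered with final status " + finalStatus);
    }
  }
}
```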