mridulm commented on code in PR #37624:
URL: https://github.com/apache/spark/pull/37624#discussion_r954250369


##########
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##########
@@ -795,13 +796,34 @@ public void registerExecutor(String appId, 
ExecutorShuffleInfo executorInfo) {
   }
 
   /**
-   * Close the DB during shutdown
+   * Shutdown mergedShuffleCleaner and close the DB during shutdown
    */
   @Override
   public void close() {
+    if (!mergedShuffleCleaner.isShutdown()) {
+      // Use two phases shutdown refer to
+      // 
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html
+      try {
+        mergedShuffleCleaner.shutdown();
+        // Wait a while for existing tasks to terminate
+        if (!mergedShuffleCleaner.awaitTermination(10L, TimeUnit.SECONDS)) {

Review Comment:
   Let us do that as a follow up then, based on what behavior we see in 
production.
   Can we log the number of `Runnable`'s which did not complete gracefully in 
that case (returned in `shutdownNow`) ? It will be a proxy for whether this 
needs to be tuned.
   
   Note that a higher value does not necessarily mean we wait that long - just 
that we wait upto that in case some of the tasks are taking longer - given 
that, let us bump it up to 60s.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to