LuciferYang commented on code in PR #37624:
URL: https://github.com/apache/spark/pull/37624#discussion_r956560789


##########
resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala:
##########
@@ -770,6 +771,40 @@ abstract class YarnShuffleServiceSuite extends 
SparkFunSuite with Matchers {
     s1.stop()
   }
 
+  test("SPARK-40186: shuffleMergeManager should have been shutdown before db 
closed") {
+    val maxId = 100
+    s1 = createYarnShuffleService()
+    val resolver = s1.shuffleMergeManager.asInstanceOf[RemoteBlockPushResolver]
+    (0 to maxId).foreach { id =>
+      val appId = ApplicationId.newInstance(0, id)
+      val appInfo = makeAppInfo("user", appId)
+      s1.initializeApplication(appInfo)
+      val mergedShuffleInfo =
+        new ExecutorShuffleInfo(
+          Array(new File(tempDir, "foo/foo").getAbsolutePath,
+            new File(tempDir, "bar/bar").getAbsolutePath), 3,
+          SORT_MANAGER_WITH_MERGE_SHUFFLE_META_WithAttemptID1)
+      resolver.registerExecutor(appId.toString, mergedShuffleInfo)
+    }
+
+    (0 until maxId).foreach { id =>
+      val appId = ApplicationId.newInstance(0, id)
+      resolver.applicationRemoved(appId.toString, true)
+    }
+
+    s1.stop()
+
+    assert(ShuffleTestAccessor.isMergedShuffleCleanerShutdown(resolver))
+    assert(ShuffleTestAccessor.mergeManagerLevelDB(resolver) == null)
+
+    val message = intercept[RejectedExecutionException] {

Review Comment:
   > It is better to keep this `final` - any particular reason to change this ?
   
   Due to this test scene.
   
   If not reset `db=null` after db close, then directly invoke ` resolver 
.applicationRemoved` will still cause `delete data from a closed db instance`(I 
manual tested RocksDB. If keep final and  not set  `db = null` after db closed, 
it will still crash.).
   
   
https://github.com/apache/spark/blob/4b0c3bab1ab082565a051990bf45774f15962deb/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java#L373-L394
   
   As above code, both `removeAppAttemptPathInfoFromDB` and 
`removeOldApplicationAttemptsFromDb` are synchronous operations. These 
operations will be rejected only when db is `null`.
   
   Therefore, for safety, I suggest removing the final of `db` and  leaving 
reset `db` to null.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to