siying commented on code in PR #47875:
URL: https://github.com/apache/spark/pull/47875#discussion_r1768891593


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
##########
@@ -150,29 +149,7 @@ class RocksDBFileManager(
   private var minSeenVersion = 1L
 
   @volatile private var rootDirChecked: Boolean = false
-  @volatile private var fileMappings = RocksDBFileMappings(
-    new ConcurrentHashMap[Long, Seq[RocksDBImmutableFile]],
-    new ConcurrentHashMap[String, RocksDBImmutableFile]
-  )
-
-  /**
-   * Make a deep copy of versionToRocksDBFiles and localFilesToDfsFiles to 
avoid
-   * current task thread from overwriting the file mapping whenever background 
maintenance
-   * thread attempts to upload a snapshot
-   */
-  def copyFileMapping() : Unit = {
-    val newVersionToRocksDBFiles = new ConcurrentHashMap[Long, 
Seq[RocksDBImmutableFile]]
-    val newLocalFilesToDfsFiles = new ConcurrentHashMap[String, 
RocksDBImmutableFile]
-
-    newVersionToRocksDBFiles.putAll(fileMappings.versionToRocksDBFiles)
-    newLocalFilesToDfsFiles.putAll(fileMappings.localFilesToDfsFiles)
-
-    fileMappings = RocksDBFileMappings(newVersionToRocksDBFiles, 
newLocalFilesToDfsFiles)
-  }
-
-  def captureFileMapReference(): RocksDBFileMappings = {
-    fileMappings
-  }
+  private val versionToRocksDBFiles = new ConcurrentHashMap[Long, 
Seq[RocksDBImmutableFile]]

Review Comment:
   Thanks for the change of making uploadSnapshot static. It will make the 
maintenance thread doesn't access to most variables in RocksDB, which is great. 
I see we are still sharing the whole RocksDBFileManager between task thread and 
maintenance thread. And RocksDBFileManager contains more than basic file access 
variables such as dfsRoot, etc.  versionToRocksDBFiles can be accessed by both 
threads. What's our shared thread access model for it? Is it shared?  Also 
minSeenVersion and maxSeenVersion. At least we need to add a comment to explain 
it. Ideally the classes are organized way so that it is more self-evident. But 
I can accept that we leave it as it is with some comments for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to