micheal-o commented on code in PR #47875:
URL: https://github.com/apache/spark/pull/47875#discussion_r1772311074


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
##########
@@ -150,29 +149,7 @@ class RocksDBFileManager(
   private var minSeenVersion = 1L
 
   @volatile private var rootDirChecked: Boolean = false
-  @volatile private var fileMappings = RocksDBFileMappings(
-    new ConcurrentHashMap[Long, Seq[RocksDBImmutableFile]],
-    new ConcurrentHashMap[String, RocksDBImmutableFile]
-  )
-
-  /**
-   * Make a deep copy of versionToRocksDBFiles and localFilesToDfsFiles to 
avoid
-   * current task thread from overwriting the file mapping whenever background 
maintenance
-   * thread attempts to upload a snapshot
-   */
-  def copyFileMapping() : Unit = {
-    val newVersionToRocksDBFiles = new ConcurrentHashMap[Long, 
Seq[RocksDBImmutableFile]]
-    val newLocalFilesToDfsFiles = new ConcurrentHashMap[String, 
RocksDBImmutableFile]
-
-    newVersionToRocksDBFiles.putAll(fileMappings.versionToRocksDBFiles)
-    newLocalFilesToDfsFiles.putAll(fileMappings.localFilesToDfsFiles)
-
-    fileMappings = RocksDBFileMappings(newVersionToRocksDBFiles, 
newLocalFilesToDfsFiles)
-  }
-
-  def captureFileMapReference(): RocksDBFileMappings = {
-    fileMappings
-  }
+  private val versionToRocksDBFiles = new ConcurrentHashMap[Long, 
Seq[RocksDBImmutableFile]]

Review Comment:
   In the class doc for RocksDBFileManager, it says the class is thread-safe. 
So seems the intention is to make it usable by multiple threads. I think the 
min/maxSeenVersion were recently added, so i have added comments to them to 
explain how they are used.
   
   Yes, there is still room to refactor a lot of these RocksDB related code, 
but will leave this as it is for now, to reduce the amount of refactoring in 
this PR. I will evaluate this as part of my subsequent hardening work and 
weight the benefits.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to