siying commented on code in PR #47875:
URL: https://github.com/apache/spark/pull/47875#discussion_r1768891593
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
##########
@@ -150,29 +149,7 @@ class RocksDBFileManager(
private var minSeenVersion = 1L
@volatile private var rootDirChecked: Boolean = false
- @volatile private var fileMappings = RocksDBFileMappings(
- new ConcurrentHashMap[Long, Seq[RocksDBImmutableFile]],
- new ConcurrentHashMap[String, RocksDBImmutableFile]
- )
-
- /**
- * Make a deep copy of versionToRocksDBFiles and localFilesToDfsFiles to
avoid
- * current task thread from overwriting the file mapping whenever background
maintenance
- * thread attempts to upload a snapshot
- */
- def copyFileMapping() : Unit = {
- val newVersionToRocksDBFiles = new ConcurrentHashMap[Long,
Seq[RocksDBImmutableFile]]
- val newLocalFilesToDfsFiles = new ConcurrentHashMap[String,
RocksDBImmutableFile]
-
- newVersionToRocksDBFiles.putAll(fileMappings.versionToRocksDBFiles)
- newLocalFilesToDfsFiles.putAll(fileMappings.localFilesToDfsFiles)
-
- fileMappings = RocksDBFileMappings(newVersionToRocksDBFiles,
newLocalFilesToDfsFiles)
- }
-
- def captureFileMapReference(): RocksDBFileMappings = {
- fileMappings
- }
+ private val versionToRocksDBFiles = new ConcurrentHashMap[Long,
Seq[RocksDBImmutableFile]]
Review Comment:
Thanks for the change of making uploadSnapshot static. It will make the
maintenance thread doesn't access to most variables in RocksDB, which is great.
I see we are still sharing the whole RocksDBFileManager between task thread and
maintenance thread. And RocksDBFileManager contains more than basic file access
variables such as dfsRoot, etc. versionToRocksDBFiles can be accessed by both
threads. What's our shared thread access model for it? Is it shared? Also
minSeenVersion and maxSeenVersion. At least we need to add a comment to explain
it. Ideally the classes are organized way so that it is more self-evident. But
I can accept that we leave it as it is with some comments for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]