HeartSaVioR commented on code in PR #50512:
URL: https://github.com/apache/spark/pull/50512#discussion_r2033148344


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
##########
@@ -320,6 +322,10 @@ class RocksDBFileManager(
     }
     logFilesInDir(localDir, log"Loaded checkpoint files " +
       log"for version ${MDC(LogKeys.VERSION_NUM, version)}")
+    logInfo(log"RocksDB file mapping after loading checkpoint version " +

Review Comment:
   Ditto regarding the length of the log message.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -1378,19 +1399,26 @@ class RocksDBFileMapping {
    */
   private def getDfsFileForSave(
       fileManager: RocksDBFileManager,
-      localFileName: String,
+      localFile: File,
       versionToSave: Long): Option[RocksDBImmutableFile] = {
-    getDfsFileWithVersionCheck(fileManager, localFileName, _ >= versionToSave)
+    getDfsFileWithIncompatibilityCheck(
+      fileManager,
+      localFile.getName,
+      (dfsFileVersion, dfsFile) =>
+        // The DFS file is not the same as the file we want to save, either if
+        // the DFS file was added in the same or higher version, or the file size is different
+        dfsFileVersion >= versionToSave || dfsFile.sizeBytes != localFile.length()

Review Comment:
   Just to be clear: would `dfsFile.sizeBytes` make another call to the remote 
FS, or does it already hold the info?
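
For reference, the predicate in this hunk can be sketched in isolation. This is a minimal sketch and not the PR's code: `DfsFileInfo`, `IncompatibilityCheckSketch`, and `isIncompatible` are hypothetical names, and it assumes `sizeBytes` is a plain field recorded when the mapping entry was created, which is the point the question above asks to confirm.

```scala
// Hypothetical stand-in for RocksDBImmutableFile. Assumption: the size is a
// locally held field, so reading it does not touch the remote FS.
final case class DfsFileInfo(dfsFileName: String, sizeBytes: Long)

object IncompatibilityCheckSketch {
  // Returns true when the mapped DFS file must not be reused for this save:
  // it was added in the same or a higher version, or its size differs from
  // the local file's length.
  def isIncompatible(
      dfsFileVersion: Long,
      dfsFile: DfsFileInfo,
      versionToSave: Long,
      localFileLength: Long): Boolean = {
    dfsFileVersion >= versionToSave || dfsFile.sizeBytes != localFileLength
  }
}
```

Under that assumption the only I/O in the check is `localFile.length()`, which is a local file system call.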



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -885,6 +885,10 @@ class RocksDB(
 
       val (dfsFileSuffix, immutableFileMapping) = rocksDBFileMapping.createSnapshotFileMapping(
         fileManager, checkpointDir, version)
+      logInfo(log"RocksDB file mapping after creating snapshot file mapping for version " +

Review Comment:
   @micheal-o 
   How large is this message in general? We had a PR that reduced the logging 
for the RocksDB state store provider, and I wouldn't want to reintroduce the 
issue. If the message is considerably large, we'd need to log it at debug level.
   cc. @anishshri-db 
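
One possible way to keep the info-level line bounded is to log only a summary there and leave the full mapping for debug level. The following is a rough sketch under stated assumptions: `MappingLogSketch`, `summary`, and `full` are hypothetical helpers, and the mapping is modeled as a plain `Map[String, String]` from local file name to DFS file name rather than the PR's actual data structure.

```scala
object MappingLogSketch {
  // Hypothetical: a constant-size line that would be safe at info level.
  def summary(mapping: Map[String, String]): String =
    s"RocksDB file mapping has ${mapping.size} entries"

  // Hypothetical: the full dump, whose length grows linearly with the number
  // of mapped files, so it would be better suited to debug level.
  def full(mapping: Map[String, String]): String =
    mapping.toSeq
      .sortBy(_._1)
      .map { case (local, dfs) => s"$local -> $dfs" }
      .mkString(", ")
}
```

With this split, the cost of the info-level message no longer depends on how many SST files the state store holds.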



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

