Re: [PR] [SPARK-48796][SS] Load Column Family Id from RocksDBCheckpointMetadata for VCF when restarting [spark]

via GitHub Mon, 19 Aug 2024 06:51:39 -0700


HeartSaVioR commented on code in PR #47778:
URL: https://github.com/apache/spark/pull/47778#discussion_r1721437164



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -550,6 +641,12 @@ class RocksDB(
           } finally {
             changelogWriter = None
           }
+          // If we have changed the columnFamilyId mapping, we have set a new

Review Comment:
   Shall we cancel the changelog writer when we upload a snapshot, instead of
writing both? It is not harmful if we succeed in committing the changelog but
fail to upload the snapshot, assuming the batch will be marked as failed, but
committing the changelog is still redundant.
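   To make the suggestion concrete, here is a minimal Scala sketch of the commit
decision under a simplified model. `CommitArtifact`, `SketchChangelogWriter`,
`CommitSketch.commitVersion` and the `columnFamilyMappingChanged` flag are
hypothetical names for illustration only, not Spark's actual `RocksDB` or
changelog writer API. The point is that each version produces exactly one
artifact: a snapshot when the column family id mapping changed (with the
changelog cancelled), otherwise the changelog.

```scala
// Hypothetical types for illustration: these are NOT Spark's classes.
sealed trait CommitArtifact
final case class ChangelogFile(version: Long) extends CommitArtifact
final case class SnapshotFile(version: Long) extends CommitArtifact

// Hypothetical stand-in for a changelog writer that is either committed or cancelled.
final class SketchChangelogWriter(val version: Long) {
  private var state: String = "open"

  def commit(): ChangelogFile = {
    require(state == "open", s"writer already $state")
    state = "committed"
    ChangelogFile(version)
  }

  def abort(): Unit = {
    require(state == "open", s"writer already $state")
    state = "aborted"
  }

  def status: String = state
}

object CommitSketch {
  // Writes exactly one artifact per version instead of both.
  def commitVersion(
      version: Long,
      writer: SketchChangelogWriter,
      columnFamilyMappingChanged: Boolean): CommitArtifact = {
    if (columnFamilyMappingChanged) {
      // The changelog does not carry the new column family id mapping,
      // so cancel it and rely on the snapshot alone.
      writer.abort()
      SnapshotFile(version) // stands in for uploading the snapshot
    } else {
      writer.commit()
    }
  }
}
```

If the snapshot upload then fails, nothing was committed for that version, so the
batch fails cleanly without leaving a redundant changelog behind.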
   
   Also, the changelog and the snapshot are not exactly equivalent (the snapshot
is not a pure replacement for the changelog), so having both a changelog and a
snapshot for the same version causes more confusion. In the worst case, if the
snapshot file is lost for any reason, the query will load the changelog and lose
the column family update information, instead of failing the query (which is
the proper expectation).
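   A similarly hypothetical sketch of that failure mode: `RecoverySketch.load` is
an illustrative loader, not Spark's, assuming it prefers a snapshot for a
version and otherwise falls back to that version's changelog. The real load path
replays changelogs on top of an earlier snapshot; this only models the choice
made for a single version.

```scala
// Hypothetical, simplified recovery rule: prefer the snapshot for the version,
// otherwise fall back to that version's changelog, otherwise fail.
object RecoverySketch {
  sealed trait LoadSource
  case object FromSnapshot extends LoadSource
  case object FromChangelog extends LoadSource

  def load(version: Long, snapshots: Set[Long], changelogs: Set[Long]): LoadSource =
    if (snapshots.contains(version)) FromSnapshot
    else if (changelogs.contains(version)) FromChangelog
    else throw new IllegalStateException(s"No snapshot or changelog found for version $version")
}
```

If both files were written and the snapshot is lost, `load(5L, Set.empty, Set(5L))`
returns `FromChangelog`, silently dropping the column family update. If only the
snapshot was written, the same loss makes `load(5L, Set.empty, Set.empty)` throw,
which is the failure the query should surface.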



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

