chaoqin-li1123 commented on code in PR #41099:
URL: https://github.com/apache/spark/pull/41099#discussion_r1189175657
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -334,25 +373,59 @@ class RocksDB(
loadedVersion = -1 // invalidate loaded version
throw t
} finally {
- if (rocksDBBackgroundThreadPaused) db.continueBackgroundWork()
- silentDeleteRecursively(checkpointDir, s"committing $newVersion")
// reset resources as either 1) we already pushed the changes and it has been committed or
// 2) commit has failed and the current version is "invalidated".
release()
}
}
+ private def shouldCreateSnapshot(): Boolean = {
+ if (enableChangelogCheckpointing) {
+ assert(changelogWriter.isDefined)
+ val newVersion = loadedVersion + 1
+ newVersion - fileManager.getLastUploadedSnapshotVersion() >= conf.minDeltasForSnapshot ||
+ changelogWriter.get.size > 1000
Review Comment:
This does not trigger a snapshot upload; it simply flushes and creates a local
checkpoint. I can increase it to 10K, which guarantees that the changelog replay
between any two snapshots is < 50k records.
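
For readers following the thread, the condition under discussion can be sketched in isolation. This is only an illustration, not the code in the PR: `SnapshotDecisionSketch`, `State`, and the `maxChangelogRecords` parameter are hypothetical names standing in for the `RocksDB` class fields and the hard-coded threshold (1000 in the diff, 10K as proposed above).

```scala
// Hypothetical, self-contained model of the predicate in shouldCreateSnapshot().
// It mirrors the shape of the condition in the diff; it is not Spark's class.
object SnapshotDecisionSketch {
  // Assumed stand-ins for loadedVersion, fileManager.getLastUploadedSnapshotVersion()
  // and changelogWriter.get.size.
  final case class State(
      loadedVersion: Long,
      lastUploadedSnapshotVersion: Long,
      changelogSize: Long)

  def shouldCreateSnapshot(
      state: State,
      minDeltasForSnapshot: Int,
      maxChangelogRecords: Long): Boolean = {
    val newVersion = state.loadedVersion + 1
    // Create a local checkpoint when enough versions have passed since the
    // last uploaded snapshot, or when the current changelog has grown too large.
    newVersion - state.lastUploadedSnapshotVersion >= minDeltasForSnapshot ||
      state.changelogSize > maxChangelogRecords
  }

  def main(args: Array[String]): Unit = {
    val small = State(loadedVersion = 4, lastUploadedSnapshotVersion = 0, changelogSize = 10)
    val large = small.copy(changelogSize = 10001)
    println(shouldCreateSnapshot(small, minDeltasForSnapshot = 10, maxChangelogRecords = 10000L))
    println(shouldCreateSnapshot(large, minDeltasForSnapshot = 10, maxChangelogRecords = 10000L))
  }
}
```

Passing the record threshold as a parameter, rather than hard-coding it, makes it easy to compare the 1000 and 10K settings without touching the predicate.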