dongjoon-hyun commented on code in PR #44173:
URL: https://github.com/apache/spark/pull/44173#discussion_r1414983766


##########
core/src/main/scala/org/apache/spark/deploy/master/RecoveryModeFactory.scala:
##########
@@ -67,6 +67,25 @@ private[master] class FileSystemRecoveryModeFactory(conf: SparkConf, serializer:
   }
 }
 
+/**
+ * LeaderAgent in this case is a no-op: the leader is forever the leader, and the
+ * actual recovery is done by restoring from RocksDB.
+ */
+private[master] class RocksDBRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
+  extends StandaloneRecoveryModeFactory(conf, serializer) with Logging {
+
+  val recoveryDir = conf.get(RECOVERY_DIRECTORY)

Review Comment:
   Thank you for the review, @yaooqinn and @LuciferYang.
   
   I tried following your advice, but it seems to make the configuration namespace a little awkward, because the existing configuration, `spark.deploy.recoveryDirectory`, is a general one.
   
   
https://github.com/apache/spark/blob/1d80d80a56c418f841e282ad753fad6671c3baae/core/src/main/scala/org/apache/spark/internal/config/Deploy.scala#L54
   
   If we introduce a new configuration for `RocksDBPersistenceEngine` like the following, users will ask why they cannot use `spark.deploy.recoveryDirectory` for `ROCKSDB` mode. The AS-IS design lets users switch backends by changing the single configuration `spark.deploy.recoveryMode`, as the following example shows.
   
   **BEFORE**
   ```
   spark.deploy.recoveryMode=FILESYSTEM
   spark.deploy.recoveryDirectory=/opt/master
   ```
   
   **AFTER**
   ```
   spark.deploy.recoveryMode=ROCKSDB
   spark.deploy.recoveryDirectory=/opt/master
   ```
   
   If we add a new configuration, the setup would look like the following, where `spark.deploy.recoveryDirectory` becomes a no-op. A sketch of the corresponding config definition follows the example.
   ```
   spark.deploy.recoveryMode=ROCKSDB
   spark.deploy.recoveryDirectory=/opt/master
   spark.deploy.recoveryRocksDBDirectory=/opt/db
   ```
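   
   For illustration only, the new entry in `Deploy.scala` might look like the sketch below; the doc text and version are hypothetical, and this is the alternative I would prefer to avoid, not code from this PR. It essentially duplicates the existing `RECOVERY_DIRECTORY` definition with a narrower scope.
   ```scala
   // Hypothetical addition to org.apache.spark.internal.config.Deploy.
   val RECOVERY_ROCKSDB_DIRECTORY = ConfigBuilder("spark.deploy.recoveryRocksDBDirectory")
     .doc("Directory used only when spark.deploy.recoveryMode=ROCKSDB; " +
       "spark.deploy.recoveryDirectory would become a no-op in that mode.")
     .version("4.0.0") // illustrative version, not decided in this PR
     .stringConf
     .createWithDefault("")
   ```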
   
   In addition, we already assume that users clear the recovery location properly when they change settings like the following:
   - `spark.deploy.recoverySerializer`
   - `spark.deploy.recoveryCompressionCodec`
   
   To sum up, if you guys don't mind, I'd like to use the existing general 
configuration name for this backend.
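   
   For reference, here is a minimal sketch of how the factory consumes the shared configuration under the AS-IS design, modeled on `FileSystemRecoveryModeFactory`; the `RocksDBPersistenceEngine` constructor and the log message are assumptions for illustration, not quoted from the PR.
   ```scala
   private[master] class RocksDBRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
     extends StandaloneRecoveryModeFactory(conf, serializer) with Logging {
   
     // The general spark.deploy.recoveryDirectory is shared across FILESYSTEM and ROCKSDB modes.
     val recoveryDir = conf.get(RECOVERY_DIRECTORY)
   
     def createPersistenceEngine(): PersistenceEngine = {
       logInfo("Persisting recovery state to RocksDB directory: " + recoveryDir)
       new RocksDBPersistenceEngine(recoveryDir, serializer) // constructor signature assumed
     }
   
     // The leader never changes in this mode, so a monarchy-style agent suffices.
     def createLeaderElectionAgent(master: LeaderElectable): LeaderElectionAgent =
       new MonarchyLeaderAgent(master)
   }
   ```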


