Re: [PR] [SPIP-IN-PROGRESS][DO-NOT-MERGE][SS] Add base support for new arbitrary state management operator, single valueState type, multiple state variables and underlying support for column families for RocksDBStateStoreProvider with/without changelog checkpointing [spark]

via GitHub Tue, 02 Jan 2024 01:35:46 -0800


anishshri-db commented on code in PR #43961:
URL: https://github.com/apache/spark/pull/43961#discussion_r1439278954



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -50,13 +51,15 @@ import org.apache.spark.util.{NextIterator, Utils}
  * @param localRootDir Root directory in local disk that is used to working and checkpointing dirs
  * @param hadoopConf   Hadoop configuration for talking to the remote file system
  * @param loggingId    Id that will be prepended in logs for isolating concurrent RocksDBs
+ * @param useColumnFamilies Used to determine whether a single or multiple column families are used
  */
 class RocksDB(
     dfsRootDir: String,
     val conf: RocksDBConf,
     localRootDir: File = Utils.createTempDir(),
     hadoopConf: Configuration = new Configuration,
-    loggingId: String = "") extends Logging {
+    loggingId: String = "",
+    useColumnFamilies: Boolean = false) extends Logging {

Review Comment:
   I did consider this, but I added the flag for two reasons:
   - First, to isolate users of the flag. In the current implementation it is set to true only for the `transformWithState` operator. We are not touching any other operators, so this limits the impact surface.
   - Second, to identify which changelog writer format to use.
   
   If we distinguished old vs. new based only on the `default` column family name, then either we would not be able to use the `default` column family with the new operator, or we would not be able to identify which writers/formats to use.
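
   To make the ambiguity concrete, here is a minimal sketch. All names in it (`ChangelogFormat`, `formatFromFlag`, `formatFromName`, and the two format objects) are hypothetical and are not the actual classes in this PR; it only illustrates why a name-based check cannot serve both purposes at once.

```scala
// Hypothetical sketch: these types and functions are illustrative only,
// not the real Spark changelog writer classes.
sealed trait ChangelogFormat
case object SingleColumnFamilyFormat extends ChangelogFormat // existing record layout
case object MultiColumnFamilyFormat extends ChangelogFormat  // layout that also carries the column family

object ChangelogFormatSelection {
  val DefaultColFamily: String = "default"

  // Explicit flag: the operator decides, independent of which column family is written.
  def formatFromFlag(useColumnFamilies: Boolean): ChangelogFormat =
    if (useColumnFamilies) MultiColumnFamilyFormat else SingleColumnFamilyFormat

  // Name-based inference: a write to `default` always looks like the old format,
  // even when it comes from the new operator.
  def formatFromName(colFamilyName: String): ChangelogFormat =
    if (colFamilyName == DefaultColFamily) SingleColumnFamilyFormat
    else MultiColumnFamilyFormat
}
```

   Under these assumptions, a new-operator write to `default` gets `MultiColumnFamilyFormat` from the flag but `SingleColumnFamilyFormat` from the name check, which is the conflict described above.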



