HeartSaVioR commented on a change in pull request #33683:
URL: https://github.com/apache/spark/pull/33683#discussion_r684946042
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1814,6 +1814,23 @@ Specifically for built-in HDFS state store provider, users can check the state s
it is best if cache missing count is minimized that means Spark won't waste too much time on loading checkpointed state.
User can increase Spark locality waiting configurations to avoid loading state store providers in different executors across batches.
+### RocksDB state store implementation
+
+As of Spark 3.2, we add a new built-in state store implementation, RocksDB state store provider.
+
+The current built-in HDFS state store provider has two major drawbacks:
+
+* The amount of state that can be maintained is limited by the heap size of the executors
+* State expiration by watermark and/or timeouts requires full scans over all the data
+
+The RocksDB-based State Store implementation can address these drawbacks:
+
+* RocksDB can serve data from the disk with a configurable amount of non-JVM memory.
+* Sorting keys using the appropriate column should avoid full scans to find the to-be-dropped keys.
Review comment:
Please correct me if I'm missing something; while this could be something we can evaluate and address, it is not true at least for now. We don't distinguish the event-time field in the state store. Prefix scan is the only thing we leverage sorted keys for at the moment.
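
For readers following the quoted doc text: switching to the RocksDB provider is configuration-only. Below is a minimal sketch, assuming Spark 3.2+; the config key `spark.sql.streaming.stateStore.providerClass` and the `RocksDBStateStoreProvider` class are the ones this section refers to, while the app name, rate source, output sink, and checkpoint path are purely illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: enable the RocksDB state store provider for all stateful operators
// of this session (Spark 3.2+). Local master is used only to keep the example
// self-contained.
val spark = SparkSession.builder()
  .appName("rocksdb-state-store-sketch")
  .master("local[2]")
  .config(
    "spark.sql.streaming.stateStore.providerClass",
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider")
  .getOrCreate()

// Any stateful operation (aggregation, dropDuplicates, mapGroupsWithState, ...)
// now keeps its state in RocksDB on the executors' local disks rather than on
// the JVM heap, which is what addresses the first drawback listed above.
val counts = spark.readStream
  .format("rate")          // built-in test source, used here only for illustration
  .load()
  .groupBy("value")
  .count()

val query = counts.writeStream
  .format("console")
  .outputMode("complete")
  .option("checkpointLocation", "/tmp/rocksdb-sketch-checkpoint") // hypothetical path
  .start()

query.awaitTermination()
```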
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]