[GitHub] [spark] HeartSaVioR commented on a change in pull request #30789: [SPARK-33797][SS][DOCS] Update SS doc about State Store and task locality

GitBox Wed, 16 Dec 2020 15:30:13 -0800


HeartSaVioR commented on a change in pull request #30789:
URL: https://github.com/apache/spark/pull/30789#discussion_r544698190




##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1689,6 +1689,25 @@ hence the number is not same as the number of original input rows. You'd like to
 There's a known workaround: split your streaming query into multiple queries per stateful operator, and ensure
 end-to-end exactly once per query. Ensuring end-to-end exactly once for the last query is optional.
 
+### State Store and task locality
+
+The stateful operations store states for events in state stores of executors. State stores occupy resources such as memory and disk space to store the states.
+So it is more efficient to keep a state store provider running in the same executor across different streaming batches.

Review comment:
       `batches` alone would also work; the following sentences seem to simply use `batches`.
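The efficiency argument in the proposed paragraph can be illustrated with a toy model. The sketch below is a hypothetical Python simulation (the function `simulate_state_loads` and the executor names are made up for illustration); it is not Spark's scheduler. It only counts how often a partition's state store would have to be loaded on an executor, once when each partition stays on the executor it used in the previous batch, and once when assignments rotate every batch.

```python
def simulate_state_loads(num_batches, partitions, executors, honor_locality):
    """Toy model (hypothetical): count how many times state store providers are loaded.

    A load happens whenever a partition's state store runs on a different
    executor than in the previous batch, including the very first batch,
    when no executor holds the state yet.
    """
    previous = {}  # partition id -> executor that ran its state store last batch
    loads = 0
    for batch in range(num_batches):
        current = {}
        for p in partitions:
            preferred = previous.get(p)
            if honor_locality and preferred is not None:
                # Keep the provider on the same executor; its in-memory state is reused.
                chosen = preferred
            else:
                # Locality-unaware placement: rotate partitions across executors each batch.
                chosen = executors[(p + batch) % len(executors)]
            if chosen != preferred:
                # State must be (re)loaded from the checkpoint on the new executor.
                loads += 1
            current[p] = chosen
        previous = current
    return loads


executors = ["executor-1", "executor-2"]
print(simulate_state_loads(3, range(4), executors, honor_locality=True))   # 4
print(simulate_state_loads(3, range(4), executors, honor_locality=False))  # 12
```

With two executors and four state partitions over three batches, keeping each partition on its previous executor loads every state store only once (4 loads), while rotating assignments reloads every store in every batch (12 loads). In Spark the executor choice is a task preference rather than a hard requirement, so real behavior can fall between these two extremes.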



