srowen commented on a change in pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming
URL: https://github.com/apache/spark/pull/22238#discussion_r241964639
 
 

 ##########
 File path: docs/structured-streaming-programming-guide.md
 ##########
 @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
 
 # Additional Information
 
+**Notes**
+
+- There're couple of configurations which are not modifiable once you run the query. If you really want to make changes for these configurations, you have to discard checkpoint and start a new query.
+  - `spark.sql.shuffle.partitions`
+    - This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged.
+    - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning.
 
 Review comment:
   less -> fewer
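
   For context on the quoted note about state partitioning, here is a rough sketch of why the partition count has to stay fixed. It is plain Python, not Spark code: `partition_for` is a hypothetical helper, and the modulo on integer keys is only a stand-in for whatever hash function Spark actually applies to the key.

```python
def partition_for(key: int, num_partitions: int) -> int:
    # Stand-in for "hash(key) mod number of partitions"; this is NOT Spark's hash.
    return key % num_partitions


keys = list(range(20))  # hypothetical integer grouping keys

# State written while the query ran with 4 partitions:
# each key's state lives in the partition its hash mapped to at that time.
state_by_partition = {}
for key in keys:
    state_by_partition.setdefault(partition_for(key, 4), {})[key] = "state"

# After a hypothetical restart with 5 partitions, each key is looked up
# in the partition its hash maps to now, which is often a different one.
orphaned = [
    key for key in keys
    if key not in state_by_partition.get(partition_for(key, 5), {})
]

print(f"{len(orphaned)} of {len(keys)} keys would not find their existing state")
```

   Under these assumptions most keys land in a different partition once the count changes, which is consistent with the doc's advice to discard the checkpoint rather than change the setting on a running query.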
