HeartSaVioR commented on a change in pull request #24890: 
[SPARK-28074][DOC][SS] Document caveats on using multiple stateful operations 
in single query
URL: https://github.com/apache/spark/pull/24890#discussion_r295061415
 
 

 ##########
 File path: docs/structured-streaming-programming-guide.md
 ##########
 @@ -3146,6 +3146,17 @@ See [Input Sources](#input-sources) and [Output 
Sinks](#output-sinks) sections f
       - After `coalesce`, the number of (reduced) tasks will be kept unless 
another shuffle happens.
   - `spark.sql.streaming.stateStore.providerClass`: To read the previous state 
of the query properly, the class of state store provider should be unchanged.
   - `spark.sql.streaming.multipleWatermarkPolicy`: Modification of this would 
lead inconsistent watermark value when query contains multiple watermarks, 
hence the policy should be unchanged.
+- Structured Streaming uses `global watermark` which might impact on query 
having multiple stateful operations.
 
 Review comment:
   Will fix per the review comments on the two lines above.
   
   > How would it impact the query?
   
   The answer is in the line below: `Fail to answer above questions might lead to 
incorrect outputs - e.g. intermediate outputs being discarded.` I'm not sure the 
flow reads naturally, so you might want to suggest a revision to the format/flow of the content.
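
To make the quoted caveat concrete, here is a rough plain-Python sketch (not Spark code) of how a single shared watermark could cause intermediate outputs to be discarded. It assumes a simplified model: an append-mode windowed aggregation emits a window once the watermark reaches the window end, the emitted row carries the window end as its event time, and a downstream stateful operator drops rows at or behind the same watermark. The function names and values are hypothetical.

```python
def finalized_windows(open_windows, watermark):
    """Window ends an append-mode aggregation would emit at this watermark (assumed model)."""
    return sorted(end for end in open_windows if end <= watermark)


def is_late(event_time, watermark):
    """Assumed late-row rule for a downstream stateful operator sharing the same watermark."""
    return event_time <= watermark


# Hypothetical window end times (seconds) held in the first operator's state.
open_windows = {10, 20, 30}

# One watermark value shared by every stateful operator in the query.
global_watermark = 20

# The first operator emits only windows the watermark has already passed...
emitted = finalized_windows(open_windows, global_watermark)

# ...so the second operator, checking against the same watermark, sees them as late.
surviving = [end for end in emitted if not is_late(end, global_watermark)]
discarded = [end for end in emitted if is_late(end, global_watermark)]

print("emitted:", emitted)
print("surviving downstream:", surviving)
print("discarded downstream:", discarded)
```

Under these assumptions, every window the first operator emits is already late for the second one, so none of the intermediate outputs survive.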

