srowen commented on a change in pull request #24890: [SPARK-28074][SS] Log warn 
message on possible correctness issue for multiple stateful operations in 
single query
URL: https://github.com/apache/spark/pull/24890#discussion_r329316644
 
 

 ##########
 File path: docs/structured-streaming-programming-guide.md
 ##########
 @@ -1647,6 +1648,26 @@ For example, sorting on the input stream is not 
supported, as it requires keepin
 track of all the data received in the stream. This is therefore fundamentally 
hard to execute 
 efficiently.
 
+### Limitation of global watermark
+
+In Append mode, some stateful operations could emit rows older than current 
watermark plus allowed late record delay,
+which are "late rows" in downstream stateful operations (as Spark uses global 
watermark) and these rows can be discarded.
 
 Review comment:
   "can be discarded" implies that it's OK to discard them. Do you mean "may 
be discarded"?
   If so, I'd say: "if a stateful operation emits rows older ... note that 
these rows may be discarded".
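
To make the "may be discarded" wording concrete, here is a toy model in plain Python. It is not Spark code and uses no Spark API: `Row`, `split_by_watermark`, and the timestamps are hypothetical names and values chosen for illustration, and the strict less-than comparison is an assumption about how a late row is detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    event_time: int  # event time in seconds (illustrative unit)


def split_by_watermark(rows, global_watermark):
    """Partition rows into (kept, dropped) for a downstream stateful operator.

    Toy assumption: a row is "late" when its event time is strictly older
    than the single global watermark shared by all operators in the query.
    """
    kept = [r for r in rows if r.event_time >= global_watermark]
    dropped = [r for r in rows if r.event_time < global_watermark]
    return kept, dropped


# Rows emitted by an upstream stateful operator in Append mode.
# Row "a" carries an event time older than the global watermark.
emitted = [Row("a", 100), Row("b", 130)]

kept, dropped = split_by_watermark(emitted, global_watermark=120)
print("kept:", kept)
print("dropped:", dropped)
```

In this sketch row `a` is silently lost downstream even though the upstream operator emitted it correctly, which is why "may be discarded" reads as a warning rather than as permission.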

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
