HeartSaVioR commented on a change in pull request #24890: [SPARK-28074][SS] Log warn message on possible correctness issue for multiple stateful operations in single query
URL: https://github.com/apache/spark/pull/24890#discussion_r329249252
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1647,6 +1647,27 @@ For example, sorting on the input stream is not supported, as it requires keepin
 track of all the data received in the stream. This is therefore fundamentally hard to execute efficiently.

+### Limitation of global watermark
+
+In some circumstance, some stateful operations could emit rows older than current watermark (with allowed delay),
+which are "late rows" in downstream stateful operations (as Spark uses global watermark) and these rows can be discarded.
+This could bring correctness issue.
+
+This is a limitation of global watermark and operator-wise watermark is not yet supported. Before Spark will support

Review comment:
Yes, it describes a potential future change. I left the content in because operator-wise watermark is an area where Spark lags behind, so I thought it might be better to offer some promise. But it is not even an ongoing effort (it may require a new SPIP), so omitting it is better to avoid sending the wrong signal. I'll omit it.
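
For readers unfamiliar with the limitation the quoted doc text describes, here is a minimal toy model in plain Python. It is not Spark code and uses no Spark API. The names `Row`, `windowed_sum`, and `downstream_stateful`, the 10-second window, and the choice to stamp emitted rows with the window start are all assumptions made for illustration. It only shows how an upstream stateful operator can emit rows older than a single shared (global) watermark, which a downstream stateful operator then treats as late and discards.

```python
from dataclasses import dataclass


@dataclass
class Row:
    event_time: int  # event time in seconds (hypothetical unit)
    value: int


def is_late(row, watermark):
    # A row is "late" when its event time is older than the watermark.
    return row.event_time < watermark


def windowed_sum(rows, watermark, window=10):
    """Toy upstream stateful operator (append-style output).

    A window's sum is emitted only once the watermark has reached the
    window end, so every emitted row is already older than the watermark.
    Assumption: the emitted row carries the window start as its event time.
    """
    sums = {}
    for r in rows:
        start = (r.event_time // window) * window
        sums[start] = sums.get(start, 0) + r.value
    return [
        Row(start, total)
        for start, total in sorted(sums.items())
        if start + window <= watermark
    ]


def downstream_stateful(rows, watermark):
    """Toy downstream stateful operator sharing the same global watermark."""
    kept = [r for r in rows if not is_late(r, watermark)]
    dropped = [r for r in rows if is_late(r, watermark)]
    return kept, dropped


global_watermark = 20  # one watermark shared by both operators
inputs = [Row(3, 1), Row(7, 2), Row(12, 5), Row(25, 9)]

emitted = windowed_sum(inputs, global_watermark)
kept, dropped = downstream_stateful(emitted, global_watermark)

print("emitted:", emitted)
print("kept:", kept)
print("dropped:", dropped)
```

In this toy run, both closed windows are emitted upstream and then dropped downstream, because the downstream operator compares them against the same watermark that allowed them to be emitted. An operator-wise watermark, which the comment notes is not an ongoing effort, would let the downstream operator use a different threshold.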
