HeartSaVioR commented on a change in pull request #29461:
URL: https://github.com/apache/spark/pull/29461#discussion_r483430511
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -861,6 +861,10 @@ isStreaming(df)
</div>
</div>
+You may want to check the logical plan of the query, as Spark converts the
operation into another operation, which includes adding streaming aggregation.
(e.g. count, distinct, union, etc.)
Review comment:
We could probably reword this as well to simplify it, for example:
> You may want to check the query plan of the query, as Spark could inject
stateful operations while interpreting a SQL statement against a streaming
dataset. Once stateful operations are injected into the query plan, you may
need to review your query with the considerations for stateful operations in
mind (e.g. output mode, watermark, state store size maintenance, etc.).
If the reworded sentences sound better, I can update.
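
To illustrate what "checking the query plan" could look like in practice, here is a minimal sketch that scans the text of a physical plan (for example, the text printed by `explain()`) for operator names that indicate state is being kept. The operator list, the `find_stateful_operators` helper, and the sample plan are assumptions made for illustration: the list may not be exhaustive, and the sample is hand-written rather than real Spark output.

```python
import re

# Operator names assumed to mark stateful nodes in a physical plan printout.
# This list is an illustrative assumption and may not be exhaustive.
STATEFUL_OPERATORS = (
    "StateStoreSave",
    "StateStoreRestore",
    "StreamingDeduplicate",
    "StreamingSymmetricHashJoin",
    "FlatMapGroupsWithState",
)


def find_stateful_operators(plan_text):
    """Return the assumed stateful operator names found in plan_text,
    in the order they first appear."""
    found = []
    for line in plan_text.splitlines():
        for name in STATEFUL_OPERATORS:
            # Match whole words only, so longer identifiers are not counted.
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, line) and name not in found:
                found.append(name)
    return found


# Hand-written, abbreviated stand-in for a plan printout; not real Spark output.
SAMPLE_PLAN = """\
HashAggregate(keys=[], functions=[count(1)])
+- StateStoreSave
   +- HashAggregate(keys=[], functions=[merge_count(1)])
      +- StateStoreRestore
         +- StreamingRelation rate
"""

print(find_stateful_operators(SAMPLE_PLAN))
```

If the helper reports any operators, that is the cue to revisit the stateful considerations listed above (output mode, watermark, state store size).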
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.