HeartSaVioR commented on a change in pull request #29461:
URL: https://github.com/apache/spark/pull/29461#discussion_r483420929
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -861,6 +861,10 @@ isStreaming(df)
</div>
</div>
+You may want to check the logical plan of the query, as Spark converts the
operation into another operation, which includes adding streaming aggregation.
(e.g. count, distinct, union, etc.)
Review comment:
The point is whether Spark injects a streaming aggregation whose state end
users then have to maintain, and that can be checked by looking at the logical
plan, right? I didn't mean that they need to find the distinct in the logical
plan and work out how Spark rewrites the operation. They just need to check
for stateful operations.
SQL distinct and Dataset dropDuplicates aren't the only difference. SQL union
and Dataset union also differ. The set of such cases can grow or shrink as the
Spark Catalyst rules change, so we can't guarantee that the doc stays in sync.
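
To make "check for stateful operations" concrete, here is a minimal sketch of
a helper that scans the text printed by `df.explain(True)` for operator names
that suggest state is being kept. The helper name `find_stateful_operators`
and the list `STATEFUL_OPERATOR_HINTS` are hypothetical, and the operator
names in the list are an assumption about how the physical plan prints; the
real set depends on the Spark version and is not exhaustive.

```python
# Assumed, non-exhaustive operator names that hint at state in explain output.
# These are illustrative; verify them against the Spark version in use.
STATEFUL_OPERATOR_HINTS = (
    "StateStoreSave",
    "StateStoreRestore",
    "StreamingDeduplicate",
    "StreamingSymmetricHashJoin",
    "FlatMapGroupsWithState",
)


def find_stateful_operators(plan_text):
    """Return the hinted operator names that occur in plan_text.

    The result keeps the order of STATEFUL_OPERATOR_HINTS and lists each
    name at most once, no matter how often it appears in the plan.
    """
    return [name for name in STATEFUL_OPERATOR_HINTS if name in plan_text]
```

An empty result only means that none of the listed names appeared, not that
the query is guaranteed to be stateless, so it is a quick first check rather
than a replacement for reading the plan.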
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.