[GitHub] [spark] HeartSaVioR commented on a change in pull request #29461: [SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset

GitBox Fri, 04 Sep 2020 00:12:18 -0700


HeartSaVioR commented on a change in pull request #29461:
URL: https://github.com/apache/spark/pull/29461#discussion_r483430511




##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -861,6 +861,10 @@ isStreaming(df)
 </div>
 </div>
 
+You may want to check the logical plan of the query, as Spark converts the 
operation into another operation, which includes adding streaming aggregation. 
(e.g. count, distinct, union, etc.)

Review comment:
       We could probably reword this as well to simplify it, for example:
   
   > You may want to check the query plan of the query, as Spark could inject 
stateful operations while interpreting a SQL statement against a streaming 
Dataset. Once stateful operations are injected into the query plan, you may need 
to review your query against the considerations for stateful operations (e.g. 
output mode, watermark, state store size maintenance, etc.).
   
   If the reworded sentences sound better, I can update the doc.
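
To illustrate what "checking the query plan" could look like in practice, here is a minimal sketch that scans plan text for stateful operators. The helper `find_stateful_operators`, the operator name list, and `SAMPLE_PLAN` are all hypothetical and only for illustration: the list is an assumption about how stateful operators are named in the physical plan and may be incomplete or differ between Spark versions, and the sample plan is hand-written, not real Spark output.

```python
import re

# Assumption: names under which stateful operators tend to appear in the
# physical plan of a streaming query. Not an official or exhaustive list.
STATEFUL_OPERATOR_NAMES = (
    "StateStoreSave",
    "StateStoreRestore",
    "StreamingDeduplicate",
    "StreamingSymmetricHashJoin",
    "FlatMapGroupsWithState",
)


def find_stateful_operators(plan_text):
    """Return the known stateful operator names found in plan_text.

    Names are returned in the order of STATEFUL_OPERATOR_NAMES, each at most once.
    """
    return [
        name
        for name in STATEFUL_OPERATOR_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", plan_text)
    ]


# Hand-written, heavily abbreviated stand-in for a plan of something like
# "select count(*) from updates". This is NOT real Spark output.
SAMPLE_PLAN = """
== Physical Plan ==
HashAggregate(keys=[], functions=[count(1)])
+- StateStoreSave [], Complete
   +- HashAggregate(keys=[], functions=[merge_count(1)])
      +- StateStoreRestore []
         +- StreamingRelation rate
"""

if __name__ == "__main__":
    # Lists whichever stateful operators the sample plan mentions.
    print(find_stateful_operators(SAMPLE_PLAN))
```

With a real streaming Dataset, the plan text would come from Spark itself, for example by registering the Dataset with `createOrReplaceTempView`, running the statement through `spark.sql`, and calling `explain()` on the result. If any stateful operator shows up, the usual considerations (output mode, watermark, state store size) apply to the query.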




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
