HeartSaVioR commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-628970434


   Thanks for the great input, @xccui.
   
   Basically I agree with your input - that's the same as my understanding as I 
commented before 
(https://github.com/apache/spark/pull/28523#discussion_r424845811).
   
   To summarize my previous comment, I also don't know how the streaming output 
mode was designed, but from my understanding it's effective only on result 
table for stateful aggregation operators. It's not even applied for all 
stateful operators, e.g. the mode doesn't affect stream-stream join. It doesn't 
guarantee the final output is respecting the semantic, and then there's no 
meaning of applying the same on the sink side.
   
   Another concern comes into my mind is complete mode. The complete mode is 
also effective on the result table. It may sound making sense to support 
complete mode in sink as truncate and insert, but it leads to data loss for the 
case the result table is being union to other stream which is not creating 
"result table". (I haven't had such query but it's technically possible.) The 
complete mode will not care about the other stream and in every batch the 
previous output from the other stream will be lost. I think complete mode is 
weird one for streaming and better to discontinue supporting; I wouldn't expect 
any production query to use this mode, but please let me know if there is.
   
   Anyway I think the streaming update mode technically doesn't couple with the 
availability of sink. It should be left as it is, though we'll probably have to 
fix guide doc as the guide doc says it's for result table "as well as" for the 
sink. Description of the streaming output mode in sink should be corrected as 
well - they're not dependent on streaming output mode, and as of now only 
append is possible.
   
   ps. We may need to revisit the operators and streaming output modes to see 
any flaw, similarly I went through via [discussion 
thread](https://lists.apache.org/thread.html/cc6489a19316e7382661d305fabd8c21915e5faf6a928b4869ac2b4a@%3Cdev.spark.apache.org%3E)
 and #24890. One thing would be flatMapGroupsWithState with append mode.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to