HeartSaVioR commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-628970434
Thanks for the great input, @xccui. Basically I agree with your input - that's the same as my understanding as I commented before (https://github.com/apache/spark/pull/28523#discussion_r424845811). To summarize my previous comment, I also don't know how the streaming output mode was designed, but from my understanding it's effective only on result table for stateful aggregation operators. It's not even applied for all stateful operators, e.g. the mode doesn't affect stream-stream join. It doesn't guarantee the final output is respecting the semantic, and then there's no meaning of applying the same on the sink side. Another concern comes into my mind is complete mode. The complete mode is also effective on the result table. It may sound making sense to support complete mode in sink as truncate and insert, but it leads to data loss for the case the result table is being union to other stream which is not creating "result table". (I haven't had such query but it's technically possible.) The complete mode will not care about the other stream and in every batch the previous output from the other stream will be lost. I think complete mode is weird one for streaming and better to discontinue supporting; I wouldn't expect any production query to use this mode, but please let me know if there is. Anyway I think the streaming update mode technically doesn't couple with the availability of sink. It should be left as it is, though we'll probably have to fix guide doc as the guide doc says it's for result table "as well as" for the sink. Description of the streaming output mode in sink should be corrected as well - they're not dependent on streaming output mode, and as of now only append is possible. ps. We may need to revisit the operators and streaming output modes to see any flaw, similarly I went through via [discussion thread](https://lists.apache.org/thread.html/cc6489a19316e7382661d305fabd8c21915e5faf6a928b4869ac2b4a@%3Cdev.spark.apache.org%3E) and #24890. One thing would be flatMapGroupsWithState with append mode. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
