HeartSaVioR commented on a change in pull request #31700:
URL: https://github.com/apache/spark/pull/31700#discussion_r587970348
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##########
@@ -166,6 +166,52 @@ object OverwritePartitionsDynamic {
}
}
+case class AppendMicroBatch(
Review comment:
Streaming write semantics are not the same as batch semantics. The semantics
are bound to the stateful operation: there should only be `append`, `update`
(not the same as overwrite), and `truncate and append` (complete). For
`update` we haven't yet constructed a proper way to define it.
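
For context, here is a minimal sketch (not part of this PR) of how these three
modes surface in the user-facing API, written spark-shell style; the `rate`
source, `console` sink, and the grouping expression are just placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

val spark = SparkSession.builder().appName("output-modes").master("local[2]").getOrCreate()
import spark.implicits._

// A stateful aggregation: the group key becomes the logical "key" of the result.
val counts = spark.readStream
  .format("rate")                    // built-in test source
  .load()
  .groupBy(($"value" % 10).as("key"))
  .count()

// Append: only finalized rows (requires a watermark for aggregations).
// Update: rows changed since the last trigger; downstream must upsert by key.
// Complete: the full result table every trigger (truncate and append).
val query = counts.writeStream
  .outputMode(OutputMode.Update())   // or Append() / Complete()
  .format("console")
  .start()
```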
The major concern is that the group keys of the stateful operation must be
used as the keys in update mode. That is currently not possible (though there
are some sketched ideas on it), so Spark has been dealing with update by
taking a huge risk: it does the same thing as append and delegates the risk to
the sink (or the user), which then has to reflect the appended output as an
"upsert". That's why I renamed `SupportsStreamingUpdate` to
`SupportsStreamingUpdateAsAppend`, to clarify the behavior.
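
To illustrate the burden this delegates, here is a hypothetical sketch of what
a sink must do under "update as append": Spark emits updated rows as plain
appends, and the sink applies them as upserts keyed by the group keys.
`KeyedSink` is a made-up name for illustration, not any Spark API:

```scala
import scala.collection.mutable

// `KeyedSink` is a hypothetical illustration, not part of Spark.
final class KeyedSink[K, V] {
  private val table = mutable.Map.empty[K, V]

  // Each "appended" row overwrites any prior row with the same key: an upsert.
  def append(key: K, value: V): Unit = table.update(key, value)

  def snapshot: Map[K, V] = table.toMap
}

object UpdateAsAppendDemo extends App {
  val sink = new KeyedSink[String, Long]
  sink.append("user-1", 3L) // micro-batch 0
  sink.append("user-1", 5L) // micro-batch 1: upsert, not a duplicate row
  assert(sink.snapshot == Map("user-1" -> 5L))
}
```

If the sink instead treated each row as a genuine append, the two batches
would produce duplicate rows for the same key, which is exactly the mismatch
the renamed interface makes explicit.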
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]