[GitHub] [spark] HeartSaVioR commented on a change in pull request #31700: [SPARK-34183][SS] DataSource V2: Support required distribution and ordering in SS

GitBox Thu, 04 Mar 2021 18:00:24 -0800


HeartSaVioR commented on a change in pull request #31700:
URL: https://github.com/apache/spark/pull/31700#discussion_r587970348




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##########
@@ -166,6 +166,52 @@ object OverwritePartitionsDynamic {
   }
 }
 
+case class AppendMicroBatch(

Review comment:
   Streaming write semantics are not the same as batch write semantics. They
are bound to the stateful operation: there should be only append, update, and
truncate-and-append (complete), and for update we have not yet constructed a
proper way to define it.

   The major concern is that the group keys of the stateful operation must be
used as the keys in update mode. That is currently not possible, so Spark has
been handling update with a huge risk: it does the same thing as append and
delegates the risk to the sink (or user). The sink or user has to deal with
reflecting the appended output as an "upsert". That's why I renamed
`SupportsStreamingUpdate` to `SupportsStreamingUpdateAsAppend`, to clarify the
behavior.
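
   To make the gap concrete, here is a minimal sketch of the difference
between appending update-mode output and what a sink would have to do on its
own to treat it as an upsert. Everything in it (`UpsertSketch`, `Row`,
`appendBatch`, `upsertBatch`) is hypothetical and for illustration only; none
of these names are Spark APIs, and the "table" is just an in-memory
collection.

```scala
// Hypothetical illustration only; these are not Spark classes or methods.
object UpsertSketch {
  // One output row of a streaming aggregation: the group key and its latest aggregate value.
  final case class Row(key: String, count: Long)

  // "Update as append": every micro-batch is simply appended, so a key that
  // was updated in a later batch shows up more than once in the table.
  def appendBatch(table: Vector[Row], batch: Seq[Row]): Vector[Row] =
    table ++ batch

  // What the sink (or user) has to do itself to get update semantics:
  // treat the group key as the primary key and overwrite the previous row.
  def upsertBatch(table: Map[String, Row], batch: Seq[Row]): Map[String, Row] =
    batch.foldLeft(table)((acc, row) => acc.updated(row.key, row))
}
```

   With a first batch containing keys `a` and `b` and a second batch that
updates only `a`, `appendBatch` leaves two rows for `a`, while `upsertBatch`
keeps one row per key. The sink can only do the latter if it knows which
columns are the group keys, which is exactly the information that is not
passed today.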




