cloud-fan commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991003957


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -107,15 +107,17 @@ message Join {
   }
 }
 
-// Relation of type [[Union]], at least one input must be set.
-message Union {
+// Relation of type [[SetOperation]]
+message SetOperation {

Review Comment:
   Note: Spark has a subtle optimization for long chains of `Union`, e.g. 
`df1.union(df2).union(df3).union...`. Instead of building a left-deep `Union` 
tree (which makes the analyzer/optimizer run very slowly), Spark builds a single 
`Union` with many children. I'm not sure whether we want to do this optimization 
in Spark Connect clients. But if we do, we need a special `Union` proto 
definition that takes a Seq of children, not just left and right.
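
To illustrate the flattening idea, here is a minimal sketch in Python. It is not Spark's actual implementation or the Spark Connect client API: `Leaf`, `BinaryUnion`, `NaryUnion`, and `flatten_union` are hypothetical names made up for this example. It only shows how a left-deep chain of binary unions could be collapsed into one node with an ordered list of children, which is the shape a proto with a Seq of children would need to carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for any non-union relation (e.g. a table scan)."""
    name: str


@dataclass(frozen=True)
class BinaryUnion:
    """Hypothetical union with exactly two inputs (left and right)."""
    left: object
    right: object


@dataclass(frozen=True)
class NaryUnion:
    """Hypothetical union holding all inputs in a single ordered sequence."""
    children: tuple


def flatten_union(plan):
    """Collapse nested BinaryUnion nodes into one NaryUnion, keeping input order.

    Uses an explicit stack instead of recursion so that very long chains
    do not hit Python's recursion limit.
    """
    if not isinstance(plan, BinaryUnion):
        return plan
    children = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, BinaryUnion):
            # Push right first so that left is processed first (preserves order).
            stack.append(node.right)
            stack.append(node.left)
        else:
            children.append(node)
    return NaryUnion(tuple(children))


# Build the left-deep tree that df1.union(df2).union(df3).union(df4) would give.
chain = Leaf("df1")
for name in ["df2", "df3", "df4"]:
    chain = BinaryUnion(chain, Leaf(name))

flat = flatten_union(chain)
print([child.name for child in flat.children])
```

With a flat node like this, the tree depth stays constant no matter how many inputs are unioned, whereas the left-deep form grows by one level per `union` call.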



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

