amaliujia commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991855071
##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
}
}
+ test("Test union, except, intersect") {
+ for (isAll <- Seq(true, false)) {
+ val connectPlan = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.except(connectTestRelation, isAll))
+ }
+ val sparkPlan = sparkTestRelation.except(sparkTestRelation, isAll)
+ comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+
+ val connectPlan2 = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.intersect(connectTestRelation, isAll))
+ }
+ val sparkPlan2 = sparkTestRelation.intersect(sparkTestRelation, isAll)
+ comparePlans(connectPlan2.analyze, sparkPlan2.analyze, false)
+ }
+
+ val connectPlan3 = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.union(connectTestRelation))
Review Comment:
Another reason to follow Catalyst: our clients should, on a best-effort basis,
mirror the existing DataFrame API, so that "changing the imports leaves the rest
of the code working".
So it's expected that users will write union().distinct() to construct the case
above. Asking clients to be smart enough to recognize that these two API calls
can be coalesced into one would demand too much of them. Instead, a client simply
converts each API call into a plan node; as long as the Connect proto follows
Catalyst, it will work.
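To illustrate the point, here is a minimal self-contained sketch (hypothetical `Plan`, `Df`, and node classes, not the actual Spark Connect proto or Catalyst types): when each DataFrame API call maps one-to-one to a plan node, `union().distinct()` naturally produces `Distinct(Union(...))` with no client-side coalescing logic needed.

```scala
// Hypothetical plan-node ADT standing in for the Connect proto / Catalyst plans.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Union(left: Plan, right: Plan) extends Plan
case class Distinct(child: Plan) extends Plan

// Hypothetical client-side DataFrame facade: each API call emits exactly one node.
case class Df(plan: Plan) {
  def union(other: Df): Df = Df(Union(plan, other.plan)) // one call -> one Union node
  def distinct(): Df = Df(Distinct(plan))                // one call -> one Distinct node
}

object PlanSketch {
  def main(args: Array[String]): Unit = {
    val a = Df(Relation("a"))
    val b = Df(Relation("b"))
    // The user-facing idiom from the comment above:
    val deduped = a.union(b).distinct()
    println(deduped.plan) // Distinct(Union(Relation(a),Relation(b)))
  }
}
```

The design choice is that the client stays a dumb translator: it never needs to recognize that `union` followed by `distinct` could be one "union distinct" operation, because the server-side planner (following Catalyst) already understands the nested shape.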
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]