amaliujia commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991828578
##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
     }
   }
+  test("Test union, except, intersect") {
+    for (isAll <- Seq(true, false)) {
+      val connectPlan = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.except(connectTestRelation, isAll))
+      }
+      val sparkPlan = sparkTestRelation.except(sparkTestRelation, isAll)
+      comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+
+      val connectPlan2 = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.intersect(connectTestRelation, isAll))
+      }
+      val sparkPlan2 = sparkTestRelation.intersect(sparkTestRelation, isAll)
+      comparePlans(connectPlan2.analyze, sparkPlan2.analyze, false)
+    }
+
+    val connectPlan3 = {
+      import org.apache.spark.sql.connect.dsl.plans._
+      transform(connectTestRelation.union(connectTestRelation))
Review Comment:
We are doing Union with isAll because it is in the same message as
intersect and except, and they share the same fields.
However, I don't agree with it. Generally speaking, if there is
something DataFrame decides to do, I think figuring out the context will be
useful. People working on Spark are smart. They could of course have just done
union with isAll, but they decided not to. There must be a valid reason, and
that reason might also apply to Connect.
So we can choose not to follow what Spark DataFrame does, but I guess we
should at least understand the context and then understand our decision.
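For context, here is a minimal sketch of how the existing Dataset/DataFrame API exposes these set operations. It assumes a local SparkSession; the object name and sample data are hypothetical and only for illustration. The point is that `union` takes no `isAll` flag (it keeps duplicates, and the distinct form is spelled out with `.distinct()`), while `intersect` and `except` each have a separate `...All` variant.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; illustrative sketch only.
object SetOpsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("set-ops-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with duplicates so the two semantics differ.
    val left = Seq(1, 1, 2, 3).toDF("id")
    val right = Seq(1, 1, 3).toDF("id")

    // union keeps duplicates (UNION ALL semantics); there is no isAll argument.
    val unionAll = left.union(right)
    // The distinct form is expressed explicitly by the caller.
    val unionDistinct = left.union(right).distinct()

    // intersect / except deduplicate; the *All variants preserve duplicates.
    val intersectDistinct = left.intersect(right)
    val intersectAll = left.intersectAll(right)
    val exceptDistinct = left.except(right)
    val exceptAll = left.exceptAll(right)

    Seq(
      "union" -> unionAll,
      "union.distinct" -> unionDistinct,
      "intersect" -> intersectDistinct,
      "intersectAll" -> intersectAll,
      "except" -> exceptDistinct,
      "exceptAll" -> exceptAll
    ).foreach { case (name, df) =>
      println(s"$name: ${df.count()} rows")
    }

    spark.stop()
  }
}
```

This asymmetry in the public API is the design decision the comment above asks to understand before Connect gives `union` an `isAll` parameter.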