cloud-fan commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991874904
##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
}
}
+  test("Test union, except, intersect") {
+    for (isAll <- Seq(true, false)) {
+      val connectPlan = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.except(connectTestRelation, isAll))
+      }
+      val sparkPlan = sparkTestRelation.except(sparkTestRelation, isAll)
+      comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+
+      val connectPlan2 = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.intersect(connectTestRelation, isAll))
+      }
+      val sparkPlan2 = sparkTestRelation.intersect(sparkTestRelation, isAll)
+      comparePlans(connectPlan2.analyze, sparkPlan2.analyze, false)
+    }
+
+    val connectPlan3 = {
+      import org.apache.spark.sql.connect.dsl.plans._
+      transform(connectTestRelation.union(connectTestRelation))
Review Comment:
Spark Connect may have many clients, both built-in and third-party, so I think
it's better to make the query plan proto definition general and standard rather
than tailored to the current DataFrame API.
It was kind of a mistake that we forgot to add DataFrame APIs for users to
specify `ALL | DISTINCT` for set operations. We added the `isAll` flag to the
plan later, because the SQL parser started to support it.
The `byName` flag is currently supported only by Union, but I think the other
two set operators (Intersect and Except) should support it as well.
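To make the by-name vs. by-position distinction concrete, here is a toy sketch in plain Scala (no Spark dependency) for the union case. `Rel`, `SetOpResolution`, `byPosition`, and `byName` are hypothetical names used only for illustration; Spark's real behavior lives in `Dataset.unionByName`, which this does not reproduce (no type checks, no case-insensitive matching, no `allowMissingColumns`).

```scala
// Hypothetical, simplified relation: ordered column names plus rows.
// This is NOT Spark's API; it only illustrates positional vs. by-name resolution.
final case class Rel(columns: Seq[String], rows: Seq[Seq[Any]])

object SetOpResolution {
  // Positional resolution: the i-th column of `right` lines up with the
  // i-th column of `left`, whatever the names are.
  def byPosition(left: Rel, right: Rel): Rel = {
    require(left.columns.size == right.columns.size, "column count mismatch")
    Rel(left.columns, left.rows ++ right.rows)
  }

  // By-name resolution: reorder `right`'s columns to match `left`'s names
  // before concatenating rows (bag semantics, like UNION ALL).
  def byName(left: Rel, right: Rel): Rel = {
    require(left.columns.size == right.columns.size, "column count mismatch")
    val idx = left.columns.map { c =>
      val i = right.columns.indexOf(c)
      require(i >= 0, s"column '$c' not found in right relation")
      i
    }
    Rel(left.columns, left.rows ++ right.rows.map(r => idx.map(i => r(i))))
  }
}
```

With `left` columns `(a, b)` and `right` columns `(b, a)`, positional union silently mixes `a` and `b` values, while by-name union realigns them; the same question arises for intersect and except.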
To summarize my proposal:
1. Add a `byName` flag to the `SetOperation` proto definition.
2. The server translates `Union(..., isAll = false)` to `Distinct(Union(...))`.
3. For now, the server fails if `byName = true` and the set type is intersect
or except.
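A minimal sketch of how the server side of this proposal could look, assuming a Scala mirror of the proposed proto fields. `SetOperation`, `SetOpType`, the `*Node` plan classes, and `SetOperationPlanner.toLogicalPlan` are hypothetical stand-ins, not the actual Spark Connect proto or Catalyst classes. The one real fact it relies on is that Catalyst's `Union` has UNION ALL (bag) semantics, so UNION DISTINCT is expressed as `Distinct(Union(...))`, while Catalyst's `Intersect` and `Except` carry their own `isAll` flag.

```scala
// Toy logical plan nodes standing in for Catalyst's (hypothetical).
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class UnionNode(children: Seq[Plan], byName: Boolean) extends Plan
final case class DistinctNode(child: Plan) extends Plan
final case class IntersectNode(left: Plan, right: Plan, isAll: Boolean) extends Plan
final case class ExceptNode(left: Plan, right: Plan, isAll: Boolean) extends Plan

// Hypothetical mirror of the proposed SetOperation proto message.
sealed trait SetOpType
object SetOpType {
  case object Union extends SetOpType
  case object Intersect extends SetOpType
  case object Except extends SetOpType
}

final case class SetOperation(
    left: Plan,
    right: Plan,
    setType: SetOpType,
    isAll: Boolean,
    byName: Boolean)

object SetOperationPlanner {
  def toLogicalPlan(op: SetOperation): Plan = op.setType match {
    case SetOpType.Union =>
      val union = UnionNode(Seq(op.left, op.right), op.byName)
      // Union is bag-based, so UNION DISTINCT becomes Distinct(Union(...)).
      if (op.isAll) union else DistinctNode(union)
    case SetOpType.Intersect | SetOpType.Except if op.byName =>
      // Proposal item 3: reject byName for intersect/except for now.
      throw new UnsupportedOperationException(
        s"byName is not supported for ${op.setType}")
    case SetOpType.Intersect =>
      IntersectNode(op.left, op.right, op.isAll)
    case SetOpType.Except =>
      ExceptNode(op.left, op.right, op.isAll)
  }
}
```

Rejecting `byName` for intersect/except in the planner, rather than leaving the field out of the proto, keeps the message format stable if those operators gain by-name support later; only the server-side check would need to change.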
--
This is an automated message from the Apache Git Service.