amaliujia commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991855071
##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
}
}
+ test("Test union, except, intersect") {
+ for (isAll <- Seq(true, false)) {
+ val connectPlan = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.except(connectTestRelation, isAll))
+ }
+ val sparkPlan = sparkTestRelation.except(sparkTestRelation, isAll)
+ comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+
+ val connectPlan2 = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.intersect(connectTestRelation, isAll))
+ }
+ val sparkPlan2 = sparkTestRelation.intersect(sparkTestRelation, isAll)
+ comparePlans(connectPlan2.analyze, sparkPlan2.analyze, false)
+ }
+
+ val connectPlan3 = {
+ import org.apache.spark.sql.connect.dsl.plans._
+ transform(connectTestRelation.union(connectTestRelation))
Review Comment:
Another reason to follow Catalyst: our clients should, on a best-effort basis,
mirror the existing DataFrame API, so that "changing the imports leaves the rest
of the code working".
So it's expected that users will write union().distinct() to construct the case
above. Asking clients to be smart enough to recognize that these two API calls
can be coalesced into one would demand too much of them. Instead, a client simply
converts each API call into a plan node; as long as the Connect proto follows
Catalyst, it will work.
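To illustrate the point, here is a minimal self-contained sketch (hypothetical `Plan`, `Df`, and node classes, not the actual Spark Connect proto or Catalyst types): when each DataFrame API call maps one-to-one to a plan node, `union().distinct()` naturally produces `Distinct(Union(...))` with no client-side coalescing logic needed.

```scala
// Hypothetical plan-node ADT standing in for the Connect proto / Catalyst plans.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Union(left: Plan, right: Plan) extends Plan
case class Distinct(child: Plan) extends Plan

// Hypothetical client-side DataFrame facade: each API call emits exactly one node.
case class Df(plan: Plan) {
  def union(other: Df): Df = Df(Union(plan, other.plan)) // one call -> one Union node
  def distinct(): Df = Df(Distinct(plan))                // one call -> one Distinct node
}

object PlanSketch {
  def main(args: Array[String]): Unit = {
    val a = Df(Relation("a"))
    val b = Df(Relation("b"))
    // The user-facing idiom from the comment above:
    val deduped = a.union(b).distinct()
    println(deduped.plan) // Distinct(Union(Relation(a),Relation(b)))
  }
}
```

The design choice is that the client stays a dumb translator: it never needs to recognize that `union` followed by `distinct` could be one "union distinct" operation, because the server-side planner (following Catalyst) already understands the nested shape.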
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]