[GitHub] [spark] amaliujia commented on a diff in pull request #38166: [SPARK-40713][CONNECT] Improve SET operation support in the proto and the server

GitBox Mon, 10 Oct 2022 22:31:00 -0700


amaliujia commented on code in PR #38166:
URL: https://github.com/apache/spark/pull/38166#discussion_r991828578



##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
     }
   }
 
+  test("Test union, except, intersect") {
+    for (isAll <- Seq(true, false)) {
+      val connectPlan = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.except(connectTestRelation, isAll))
+      }
+      val sparkPlan = sparkTestRelation.except(sparkTestRelation, isAll)
+      comparePlans(connectPlan.analyze, sparkPlan.analyze, false)
+
+      val connectPlan2 = {
+        import org.apache.spark.sql.connect.dsl.plans._
+        transform(connectTestRelation.intersect(connectTestRelation, isAll))
+      }
+      val sparkPlan2 = sparkTestRelation.intersect(sparkTestRelation, isAll)
+      comparePlans(connectPlan2.analyze, sparkPlan2.analyze, false)
+    }
+
+    val connectPlan3 = {
+      import org.apache.spark.sql.connect.dsl.plans._
+      transform(connectTestRelation.union(connectTestRelation))

Review Comment:
   Union carries isAll because it lives in the same proto message as 
intersect and except, so all three share the same fields.
   
   I don't fully agree with that, though. Generally speaking, when the DataFrame 
API makes a particular choice, it is worth figuring out the context behind it. 
People working on Spark are smart. They could easily have given union an isAll 
flag, but they decided not to. There must be a valid reason, and that reason 
might also apply to Connect.
   
   We can still choose not to follow what the Spark DataFrame API does, but I 
think we should at least understand the context first, so that our decision is 
an informed one.
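
   For context on the semantics in question: in the DataFrame API, union keeps 
duplicates (UNION ALL behaviour) and a de-duplicating union is written as 
union followed by distinct, while intersect and except have separate 
intersectAll and exceptAll variants. The sketch below models those semantics 
with plain Scala collections only. It is an illustration under that assumption, 
not the Spark Connect proto or planner code, and the object and method names 
are hypothetical.

```scala
// Plain Scala Seqs stand in for relations; this is NOT the Spark or Connect API.
// The object name and helper names are hypothetical, chosen for illustration.
object SetOpSemantics {
  // UNION ALL: keep every row from both sides, duplicates included.
  def unionAll[A](left: Seq[A], right: Seq[A]): Seq[A] = left ++ right

  // UNION (distinct): UNION ALL followed by de-duplication.
  def unionDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] =
    (left ++ right).distinct

  // INTERSECT ALL: each row survives min(count in left, count in right) times.
  // Seq.intersect is a multiset intersection, which matches this rule.
  def intersectAll[A](left: Seq[A], right: Seq[A]): Seq[A] =
    left.intersect(right)

  // EXCEPT ALL: each right occurrence cancels one matching left occurrence.
  // Seq.diff is a multiset difference, which matches this rule.
  def exceptAll[A](left: Seq[A], right: Seq[A]): Seq[A] =
    left.diff(right)
}
```

   Here union with isAll = false would correspond to unionDistinct, which is 
already expressible as unionAll plus distinct. That may be one reason a 
separate flag was not needed on the DataFrame side, though that is a guess 
rather than something confirmed by the Spark history.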



