Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21529#discussion_r195100665
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -882,4 +882,15 @@ class JoinSuite extends QueryTest with SharedSQLContext {
checkAnswer(df, Row(3, 8, 7, 2) :: Row(3, 8, 4, 2) :: Nil)
}
}
+
+ test("SPARK-24495: EnsureRequirements can return wrong plan when reusing the same key in join") {
+ withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "1",
+ SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false",
+ SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+ val df1 = spark.range(0, 100, 1, 2)
+ val df2 = spark.range(100).select($"id".as("b1"), (- $"id").as("b2"))
+ val res = df1.join(df2, $"id" === $"b1" && $"id" === $"b2")
--- End diff --
One difference between this test and the code in the JIRA ticket: the
ticket's code has a Project above the join, which is what triggers the
double-transformation issue. We should add a Project here and make sure this
test **does** fail without this patch.
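
For illustration, a rough standalone sketch of what adding the Project might look like. It is not the exact code from the JIRA ticket: the object name `Spark24495Sketch`, the local `SparkSession` setup, and the trailing `select` columns are assumptions made for this example, and the config strings are the keys behind the three `SQLConf` entries used in the test above. The expected single row follows from the join condition alone (`id === b1` and `id === -id` hold only for `id = 0`), so the assertion should pass only on a build that plans the join correctly.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical standalone reproduction; in JoinSuite the same body would sit
// inside withSQLConf(...) and use checkAnswer instead of assert.
object Spark24495Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-24495 sketch")
      // Same settings as the withSQLConf block in the test under review.
      .config("spark.sql.shuffle.partitions", "1")
      .config("spark.sql.constraintPropagation.enabled", "false")
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    val df1 = spark.range(0, 100, 1, 2)
    val df2 = spark.range(100).select($"id".as("b1"), (- $"id").as("b2"))

    // The trailing select puts a Project above the join, as the review
    // comment asks. The column list here is an assumption.
    val res = df1
      .join(df2, $"id" === $"b1" && $"id" === $"b2")
      .select($"b1", $"b2", $"id")

    // Only id = 0 satisfies both id === b1 and id === b2 (b2 is -id).
    assert(res.collect().toSeq == Seq(Row(0L, 0L, 0L)))

    spark.stop()
  }
}
```

To confirm the test is meaningful, the same body should be run once against a build without the patch and observed to fail.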
---