Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/23057#discussion_r234409212
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
---
@@ -1280,4 +1281,34 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
       assert(subqueries.length == 1)
     }
   }
+
+  test("SPARK-26078: deduplicate fake self joins for IN subqueries") {
+    withTempView("a", "b") {
+      val a = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))),
+        StructType(Seq(StructField("id", StringType), StructField("num", IntegerType))))
+      val b = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))),
+        StructType(Seq(StructField("id", StringType), StructField("num", IntegerType))))
--- End diff --
The two schemas are identical. Can we define the schema once and reuse it for both DataFrames?
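Here is a minimal sketch of the suggested change. It assumes Spark SQL is on the classpath. The object name `SharedSchemaSketch` and its local `SparkSession` are hypothetical stand-ins. Inside `SubquerySuite`, the shared `spark` from `SharedSQLContext` would be used instead. The schema is declared once and passed to both `createDataFrame` calls.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SharedSchemaSketch {
  // Hypothetical local session for illustration only; the real test suite
  // gets `spark` from SharedSQLContext.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("shared-schema-sketch")
    .getOrCreate()

  // Declared once and reused by both DataFrames below.
  val schema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("num", IntegerType)))

  lazy val a: DataFrame = spark.createDataFrame(
    spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))), schema)

  lazy val b: DataFrame = spark.createDataFrame(
    spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))), schema)
}
```

The change only removes the duplicated `StructType` literal. Both DataFrames still get their own RDD, so the rows are unaffected.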