szehon-ho commented on code in PR #41614:
URL: https://github.com/apache/spark/pull/41614#discussion_r1231247766
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2175,6 +2175,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN =
Review Comment:
I wasn't entirely sure whether this is the right approach, but the existing
flag is explicitly about FULL OUTER JOIN, and in discussions such as
https://github.com/apache/spark/pull/41398#discussion_r1214000680,
"build-side outer join" is taken to mean that only one side is OUTER.
##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1313,78 +1313,82 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
test("SPARK-36612: Support left outer join build left or right outer join build right in " +
"shuffled hash join") {
- val inputDFs = Seq(
- // Test unique join key
- (spark.range(10).selectExpr("id as k1"),
- spark.range(30).selectExpr("id as k2"),
- $"k1" === $"k2"),
- // Test non-unique join key
- (spark.range(10).selectExpr("id % 5 as k1"),
- spark.range(30).selectExpr("id % 5 as k2"),
- $"k1" === $"k2"),
- // Test empty build side
- (spark.range(10).selectExpr("id as k1").filter("k1 < -1"),
- spark.range(30).selectExpr("id as k2"),
- $"k1" === $"k2"),
- // Test empty stream side
- (spark.range(10).selectExpr("id as k1"),
- spark.range(30).selectExpr("id as k2").filter("k2 < -1"),
- $"k1" === $"k2"),
- // Test empty build and stream side
- (spark.range(10).selectExpr("id as k1").filter("k1 < -1"),
- spark.range(30).selectExpr("id as k2").filter("k2 < -1"),
- $"k1" === $"k2"),
- // Test string join key
- (spark.range(10).selectExpr("cast(id * 3 as string) as k1"),
- spark.range(30).selectExpr("cast(id as string) as k2"),
- $"k1" === $"k2"),
- // Test build side at right
- (spark.range(30).selectExpr("cast(id / 3 as string) as k1"),
- spark.range(10).selectExpr("cast(id as string) as k2"),
- $"k1" === $"k2"),
- // Test NULL join key
- (spark.range(10).map(i => if (i % 2 == 0) i else null).selectExpr("value as k1"),
- spark.range(30).map(i => if (i % 4 == 0) i else null).selectExpr("value as k2"),
- $"k1" === $"k2"),
- (spark.range(10).map(i => if (i % 3 == 0) i else null).selectExpr("value as k1"),
- spark.range(30).map(i => if (i % 5 == 0) i else null).selectExpr("value as k2"),
- $"k1" === $"k2"),
- // Test multiple join keys
- (spark.range(10).map(i => if (i % 2 == 0) i else null).selectExpr(
- "value as k1", "cast(value % 5 as short) as k2", "cast(value * 3 as long) as k3"),
- spark.range(30).map(i => if (i % 3 == 0) i else null).selectExpr(
- "value as k4", "cast(value % 5 as short) as k5", "cast(value * 3 as long) as k6"),
- $"k1" === $"k4" && $"k2" === $"k5" && $"k3" === $"k6")
- )
+ withSQLConf(SQLConf.ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN.key -> "false") {
Review Comment:
Note: this is not strictly necessary, but I wanted to keep some coverage of
the non-codegen code path.
That said, the "SPARK-32399: Full outer shuffled hash join" test does not do
this, which leaves the non-codegen code path untested there.
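For illustration only, here is a minimal, Spark-free sketch of the pattern
discussed above: scope a config override around a test body, restore the old
value afterwards, and run the body once per setting so both paths are
exercised. `ConfScope`, `withConf`, `CodegenKey`, `pathTaken` and
`coveredPaths` are hypothetical names, not Spark APIs. The key string is a
made-up stand-in for
SQLConf.ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN.key, and the
default-on behaviour is an assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; this is not Spark's SQLConf.
object ConfScope {
  private val settings = mutable.Map[String, String]()

  def get(key: String): Option[String] = settings.get(key)

  // Apply the overrides, run the body, then restore the previous values,
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> settings.get(k) }
    pairs.foreach { case (k, v) => settings(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => settings(k) = old
      case (k, None) => settings.remove(k)
    }
  }

  // Made-up key used only in this sketch.
  val CodegenKey = "example.buildSideOuterJoin.codegen"

  // Toy check reporting which path the current settings select.
  // Assumption: the flag is treated as enabled when unset.
  def pathTaken(): String =
    if (get(CodegenKey).forall(_ == "true")) "codegen" else "interpreted"

  // Run the same check once per setting so both paths are covered.
  def coveredPaths(): Seq[String] =
    Seq("true", "false").map { enabled =>
      withConf(CodegenKey -> enabled) { pathTaken() }
    }
}
```

In a real suite the loop body would hold the join assertions, with the
override applied through withSQLConf as in the diff above.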