c21 commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467674611
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##########
@@ -1188,4 +1188,42 @@ class JoinSuite extends QueryTest with
SharedSparkSession with AdaptiveSparkPlan
classOf[BroadcastNestedLoopJoinExec]))
}
}
+
+ test("SPARK-32399: Full outer shuffled hash join") {
Review comment:
> Do you mean that until
https://issues.apache.org/jira/browse/SPARK-32577 is fixed we don't have very
high confidence in this optimization truly producing the same results as
without it ?
No. I am saying that there are existing tests for comparing BHJ/SHJ/SMJ per
my last comment link, but it does not trigger SHJ as expected due to wrong
config. But that can be fixed separately as that is to fix a regression in unit
test.
To enable/disable SHJ for bulk of test queries is not very easy, as SHJ
depending on [carefully tuning of threshold
config](https://github.com/apache/spark/pull/29342#discussion_r467167531),
there's no way to disable/enable SHJ for a query with true/false config. So to
validate SHJ for all test queries under `sql-tests`, I need to tune the config
for every join queries, which I think it would involve too much work. In
addition, I noticed for newly added features, we rarely use `sql-tests` for
validation by enabling/disabling feature, but mostly adding unit test.
Wondering how do you think the added unit test in `JoinSuite.scala`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]