[GitHub] [spark] c21 commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

GitBox Sun, 09 Aug 2020 20:38:51 -0700


c21 commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467674611




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##########
@@ -1188,4 +1188,42 @@ class JoinSuite extends QueryTest with 
SharedSparkSession with AdaptiveSparkPlan
         classOf[BroadcastNestedLoopJoinExec]))
     }
   }
+
+  test("SPARK-32399: Full outer shuffled hash join") {

Review comment:
       > Do you mean that until 
https://issues.apache.org/jira/browse/SPARK-32577 is fixed we don't have very 
high confidence in this optimization truly producing the same results as 
without it ?
   
   No. I am saying that there are existing tests for comparing BHJ/SHJ/SMJ per 
my last comment link, but it does not trigger SHJ as expected due to wrong 
config. But that can be fixed separately as that is to fix a regression in unit 
test.
   
   To enable/disable SHJ for bulk of test queries is not very easy, as SHJ 
depending on [carefully tuning of threshold 
config](https://github.com/apache/spark/pull/29342#discussion_r467167531), 
there's no way to disable/enable SHJ for a query with true/false config. So to 
validate SHJ for all test queries under `sql-tests`, I need to tune the config 
for every join queries, which I think it would involve too much work. In 
addition, I noticed for newly added features, we rarely use `sql-tests` for 
validation by enabling/disabling feature, but mostly adding unit test. 
Wondering how do you think the added unit test in `JoinSuite.scala`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] c21 commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

Reply via email to