ulysses-you commented on code in PR #33522:
URL: https://github.com/apache/spark/pull/33522#discussion_r898610674
##########
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##########
@@ -1057,7 +1057,7 @@ class JoinSuite extends QueryTest with SharedSparkSession
with AdaptiveSparkPlan
val pythonEvals = collect(joinNode.get) {
case p: BatchEvalPythonExec => p
}
- assert(pythonEvals.size == 2)
+ assert(pythonEvals.size == 4)
Review Comment:
> Increase complex join key runs from 1 to 2 for BHJ.
We can check whether the pulled-out side can be broadcast, so this should not be a
blocker?
> It may increase the data size of shuffle. For example: the join key is:
concat(col1, col2, col3, col4 ...).
This is really a trade-off. One conservative option: only pull out the complex
keys whose inner attributes are not part of the final output. That way we avoid
the extra shuffle data as far as possible, for example:
```sql
SELECT a FROM t1 JOIN t2 on t1.a = t2.x + 1;
```
A config should also be introduced so this can be enabled or disabled easily.
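
To make the conservative option concrete, here is a minimal sketch of the check. It is a toy model and not Catalyst code: `Expr`, `Attr`, `Lit`, `Add`, `PullOutHeuristic`, and the `enabled` flag (standing in for the proposed config) are all hypothetical names, and attributes are plain qualified strings.

```scala
// Toy expression model (hypothetical, NOT Spark's Catalyst classes).
sealed trait Expr { def references: Set[String] }
final case class Attr(name: String) extends Expr {
  def references: Set[String] = Set(name)
}
final case class Lit(value: Int) extends Expr {
  def references: Set[String] = Set.empty
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def references: Set[String] = left.references ++ right.references
}

object PullOutHeuristic {
  // A key is "complex" here if it is anything other than a bare attribute or literal.
  def isComplex(key: Expr): Boolean = key match {
    case _: Attr | _: Lit => false
    case _                => true
  }

  // Pull out a complex key only when none of the attributes it references
  // is needed in the final output. `enabled` stands in for the proposed config.
  def shouldPullOut(key: Expr, finalOutput: Set[String], enabled: Boolean = true): Boolean =
    enabled && isComplex(key) && key.references.intersect(finalOutput).isEmpty
}
```

For the query above, the key `t2.x + 1` would be `Add(Attr("t2.x"), Lit(1))` with final output `Set("t1.a")`, so the check passes. If the query also selected `t2.x`, the check fails, because the shuffle would then have to carry both `x` and `x + 1`.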
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]