GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22326
Some thoughts:
1. This rule is a little tricky because it only handles a Python UDF that
accesses attributes from both sides of the join. If the UDF accesses only one
side, we assume it will be pushed down later. In general, an analyzer rule
should not depend on optimizer rules. My proposal: move this rule to the
optimizer, as the last batch (but before the `UpdateAttributeReferences`
batch). Since the rule would then run after filter pushdown, we can simply
pull out any Python UDF in the join condition. Also add this rule to
`Optimizer.nonExcludableRules`, since it is a special optimizer rule that
cannot be turned off.
2. About cross join: I don't think we need to handle it specially. My only
concern is that we keep the behavior the same as before.
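
To make the "pull out any Python UDF in the join condition" step from point 1 concrete, here is a minimal toy sketch. It is not Catalyst code: `Expr`, `Join`, `Filter`, and `pull_out_python_udf` are hypothetical stand-ins invented for illustration, and the sketch assumes an inner join, where evaluating a predicate in a filter above the join gives the same result as evaluating it in the join condition.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Toy plan nodes; illustrative stand-ins, not Spark/Catalyst classes.
@dataclass(frozen=True)
class Expr:
    text: str
    is_python_udf: bool = False  # assumption: UDF-ness is known per predicate


@dataclass(frozen=True)
class Join:
    left: str
    right: str
    conjuncts: Tuple[Expr, ...]  # join condition as an AND of predicates


@dataclass(frozen=True)
class Filter:
    conjuncts: Tuple[Expr, ...]
    child: Join


def pull_out_python_udf(join: Join) -> Union[Join, Filter]:
    """Move every conjunct that contains a Python UDF into a Filter above the join.

    Assumes an inner join; other join types would need separate handling.
    """
    udf = tuple(c for c in join.conjuncts if c.is_python_udf)
    rest = tuple(c for c in join.conjuncts if not c.is_python_udf)
    if not udf:
        return join  # nothing to pull out
    return Filter(udf, Join(join.left, join.right, rest))


# Mixed condition: the equality stays in the join, the UDF moves to a Filter.
mixed = pull_out_python_udf(
    Join("a", "b", (Expr("a.id = b.id"), Expr("py_udf(a.x, b.y)", True)))
)

# UDF-only condition: the remaining join has no condition at all.
udf_only = pull_out_python_udf(
    Join("a", "b", (Expr("py_udf(a.x, b.y)", True),))
)
```

In the second case the join that remains has an empty condition, which is presumably where the cross join question in point 2 comes from.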