[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

cloud-fan Wed, 26 Sep 2018 03:59:00 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r220516732
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -152,3 +153,51 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
           if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
       }
     }
    +
    +/**
    + * Correctly handle PythonUDF which need access both side of join side by changing the new join
    + * type to Cross.
    + */
    +object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    +  def hasPythonUDF(expression: Expression): Boolean = {
    +    expression.collectFirst { case udf: PythonUDF => udf }.isDefined
    --- End diff --
    
    This doesn't matter. We can't evaluate a Python UDF in the join condition, so we need to pull it out; that's all.
    
    A Python UDF that accesses attributes from only one side would normally be pushed down by other rules. If it isn't (e.g. the user disables the filter pushdown rule), we need to pull it out here, too. Either way, it's orthogonal to this rule.
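
    To make the point concrete, here is a rough sketch of the idea on a toy expression tree. It is not the code in this PR and does not use Catalyst: `Expr`, `Attr`, `Eq`, `And`, `PyUdf`, `splitConjuncts` and `pullOut` are all hypothetical stand-ins. It only shows that the check is "does this conjunct contain a Python UDF at all", regardless of which side of the join its attributes come from.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Eq(left: Expr, right: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class PyUdf(name: String, args: Seq[Expr]) extends Expr

object PullOutSketch {
  // True if any node in the expression tree is a Python UDF.
  def hasPythonUDF(e: Expr): Boolean = e match {
    case _: PyUdf  => true
    case Eq(l, r)  => hasPythonUDF(l) || hasPythonUDF(r)
    case And(l, r) => hasPythonUDF(l) || hasPythonUDF(r)
    case _: Attr   => false
  }

  // Split a condition into its top-level AND-ed conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Returns (conjuncts kept in the join condition,
  //          conjuncts pulled out to be evaluated above the join).
  // Note: no check of which join side the UDF's attributes belong to.
  def pullOut(condition: Expr): (Seq[Expr], Seq[Expr]) = {
    val (withUdf, withoutUdf) = splitConjuncts(condition).partition(hasPythonUDF)
    (withoutUdf, withUdf)
  }
}
```

    In this sketch a UDF that reads only left-side attributes, such as `PyUdf("f", Seq(Attr("l.a")))`, is pulled out exactly like one that reads both sides, which is the behaviour described above when pushdown has not already moved it.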



