[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...

xuanyuanking Tue, 27 Nov 2018 08:30:25 -0800

Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23153#discussion_r236743539
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    - * PythonUDF in join condition can not be evaluated, this rule will detect the PythonUDF
    - * and pull them out from join condition. For python udf accessing attributes from only one side,
    - * they are pushed down by operation push down rules. If not (e.g. user disables filter push
    - * down rules), we need to pull them out in this rule too.
    + * PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides.
    + * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable PythonUDF and pull them
    + * out from join condition.
      */
     object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    -  def hasPythonUDF(expression: Expression): Boolean = {
    -    expression.collectFirst { case udf: PythonUDF => udf }.isDefined
    +
    +  private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean = {
    +    expr.find { e =>
    +      PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && !canEvaluate(e, j.right)
    +    }.isDefined
       }
     
       override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    case j @ Join(_, _, joinType, condition)
    -        if condition.isDefined && hasPythonUDF(condition.get) =>
    +    case j @ Join(_, _, joinType, Some(cond)) if hasUnevaluablePythonUDF(cond, j) =>
    --- End diff --
    
    Following these rule changes, we need to modify the tests in 
`PullOutPythonUDFInJoinConditionSuite`: they should also construct the dummy 
Python UDF so that it references attributes from both sides of the join.
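
To illustrate why the suite needs a UDF that touches both sides, here is a minimal, self-contained sketch of the new predicate. It does not use Spark: `Expr`, `Attr`, `DummyPythonUdf`, `And` and the `Set[String]` outputs are hypothetical stand-ins for Catalyst's `Expression`, attributes, `PythonUDF` and plan output, and `canEvaluate` is assumed to mean "all referenced attributes come from the given side".

```scala
// Simplified stand-ins for Catalyst types; every name below is hypothetical.
object UnevaluableUdfSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // All attribute names referenced anywhere under this node.
    def references: Set[String] = this match {
      case Attr(name) => Set(name)
      case other      => other.children.flatMap(_.references).toSet
    }
    // Depth-first search for the first node satisfying `p`.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
  }
  case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class DummyPythonUdf(children: Seq[Expr]) extends Expr
  case class And(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Assumption: an expression is evaluable on a side if that side outputs
  // every attribute the expression references.
  def canEvaluate(e: Expr, sideOutput: Set[String]): Boolean =
    e.references.subsetOf(sideOutput)

  // Mirrors the shape of the predicate in the diff: a UDF is unevaluable
  // only if neither join side alone can evaluate it.
  def hasUnevaluablePythonUDF(cond: Expr, left: Set[String], right: Set[String]): Boolean =
    cond.find { e =>
      e.isInstanceOf[DummyPythonUdf] && !canEvaluate(e, left) && !canEvaluate(e, right)
    }.isDefined

  def main(args: Array[String]): Unit = {
    val left  = Set("a", "b")
    val right = Set("c", "d")
    val oneSide   = DummyPythonUdf(Seq(Attr("a"), Attr("b"))) // left side only
    val bothSides = DummyPythonUdf(Seq(Attr("a"), Attr("c"))) // spans both sides

    println(hasUnevaluablePythonUDF(oneSide, left, right))                 // false
    println(hasUnevaluablePythonUDF(bothSides, left, right))               // true
    println(hasUnevaluablePythonUDF(And(oneSide, bothSides), left, right)) // true
  }
}
```

Under these assumptions, a dummy UDF built from one side's attributes no longer matches the rule, so tests that previously relied on it would need a UDF like `bothSides` to exercise the pull-out path.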



