[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

xuanyuanking Sat, 22 Sep 2018 08:53:25 -0700

Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r219675105
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -995,7 +995,8 @@ class Dataset[T] private[sql](
         // After the cloning, left and right side will have distinct expression ids.
         val plan = withPlan(
           Join(logicalPlan, right.logicalPlan, JoinType(joinType), Some(joinExprs.expr)))
    -      .queryExecution.analyzed.asInstanceOf[Join]
    +      .queryExecution.analyzed
    +    val joinPlan = plan.collectFirst { case j: Join => j }.get
    --- End diff --
    
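    For reviewers: we need this change because the rule
    `HandlePythonUDFInJoinCondition` breaks the assumption that analyzing the
    join plan always returns a `Join` as the root node. Once the rule for
    handling Python UDFs is added, a Filter or Project node may be placed on
    top of the Join, so the `asInstanceOf[Join]` cast on the analyzed plan is
    no longer safe and we look up the Join with `collectFirst` instead.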
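
    To illustrate the point above, here is a minimal, self-contained sketch. It does not use Spark: `Plan`, `Relation`, `Join`, `Filter`, and `JoinLookupSketch` are hypothetical toy stand-ins for Catalyst's plan nodes, and the toy `collectFirst` is only modeled on the pre-order search that the diff relies on. It shows why casting the root of the plan fails once a node is wrapped around the Join, while searching the tree still finds it.

```scala
import scala.util.Try

// Toy stand-ins for Catalyst logical plan nodes (hypothetical, not Spark's classes).
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order search: return the result for the first node (this node first,
  // then its children, left to right) on which the partial function is defined.
  def collectFirst[B](pf: PartialFunction[Plan, B]): Option[B] =
    pf.lift(this).orElse(
      children.iterator.map(_.collectFirst(pf)).collectFirst { case Some(b) => b })
}

final case class Relation(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}

final case class Join(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
}

final case class Filter(condition: String, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

object JoinLookupSketch {
  val join: Join = Join(Relation("left"), Relation("right"))

  // Assumed shape of the analyzed plan once a rule wraps the Join in a Filter.
  val analyzed: Plan = Filter("pythonUdf(a, b)", join)

  // Old approach: cast the root. Yields None when the root is not a Join.
  def castRoot(plan: Plan): Option[Join] =
    Try(plan.asInstanceOf[Join]).toOption

  // New approach: search the tree for the first Join.
  def findJoin(plan: Plan): Option[Join] =
    plan.collectFirst { case j: Join => j }

  def main(args: Array[String]): Unit = {
    println(s"cast root:     ${castRoot(analyzed)}")
    println(s"collect first: ${findJoin(analyzed)}")
  }
}
```

    In this sketch `castRoot(analyzed)` is empty because the root is a `Filter`, while `findJoin(analyzed)` returns the wrapped `Join`. Note that calling `.get` on the result, as the diff does, assumes a Join is always present somewhere in the plan.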



