Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23248#discussion_r239925749
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
    @@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
         expressions.flatMap(collectEvaluableUDFs)
       }
     
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    case plan: LogicalPlan => extract(plan)
    +  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    +    // SPARK-26293: A subquery will be rewritten into join later, and will go through this rule
    +    // eventually. Here we skip subquery, as Python UDF only needs to be extracted once.
    +    case _: Subquery => plan
    --- End diff --
    
    I see. If it's common for other rules to skip Subquery, I guess it's OK to
    do the same here. But it would definitely be helpful to establish some kind
    of guidance, maybe something like "All optimizer rules should skip Subquery
    because OptimizeSubqueries will execute them anyway"?
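
    To make the proposed guidance concrete, here is a minimal sketch of the
    "skip Subquery at the root" pattern. It is a toy model, not Spark code:
    `Plan`, `Relation`, `Project`, this `Subquery`, `extract`, and
    `transformUp` are hypothetical stand-ins for the Catalyst classes, and the
    fallback branch is an assumption, since the quoted diff is cut off after
    the `Subquery` case.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(exprs: Seq[String], child: Plan) extends Plan
case class Subquery(child: Plan) extends Plan

object SkipSubqueryDemo {
  // Hypothetical rewrite: mark expressions that look like Python UDF calls.
  def extract(plan: Plan): Plan = plan match {
    case Project(exprs, child) =>
      Project(exprs.map(e => if (e.startsWith("pyudf(")) s"extracted:$e" else e), child)
    case other => other
  }

  // Bottom-up traversal: rewrite the children first, then the node itself.
  def transformUp(plan: Plan)(rule: Plan => Plan): Plan = {
    val withNewChildren = plan match {
      case Project(exprs, child) => Project(exprs, transformUp(child)(rule))
      case Subquery(child)       => Subquery(transformUp(child)(rule))
      case leaf                  => leaf
    }
    rule(withNewChildren)
  }

  // The pattern under discussion: a plan rooted at Subquery is returned
  // untouched, on the assumption that it passes through the rule again later.
  def apply(plan: Plan): Plan = plan match {
    case _: Subquery => plan
    case _           => transformUp(plan)(extract) // assumed fallback branch
  }

  def main(args: Array[String]): Unit = {
    val query = Project(Seq("pyudf(a)", "b"), Relation("t"))
    println(apply(query))           // UDF expression is rewritten
    println(apply(Subquery(query))) // returned unchanged
  }
}
```

    In this sketch the skip is a single leading `case`, which is why a shared
    convention (or a helper that every rule uses) seems easier to maintain
    than repeating the same comment in each rule.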

