Github user cloud-fan commented on a diff in the pull request:
    --- Diff: 
    @@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
    -  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    case plan: LogicalPlan => extract(plan)
    +  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    +    // SPARK-26293: A subquery will be rewritten into join later, and will go through this rule
    +    // eventually. Here we skip subquery, as Python UDF only needs to be extracted once.
    +    case _: Subquery => plan
    --- End diff --
    I agree it's a bit confusing, but that's how `Subquery` is designed to work. See how `RemoveRedundantAliases` catches `Subquery`.
    Making `ExtractPythonUDFs` idempotent is sufficient on its own. Skipping `Subquery` is just an extra safeguard, and it may give a small performance improvement, since this rule will run fewer times.
    In general, I think we should skip `Subquery` here. This is why we create `Subquery`: we expect rules that don't want to be executed on a subquery to skip it. I'll check more rules later and see if they need to skip `Subquery` too.
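
To illustrate the two properties discussed above (idempotence and skipping `Subquery`), here is a minimal toy sketch. It does not use Spark: `Plan`, `Relation`, `Filter`, `EvalUDF`, `ExtractToyUDFs`, and the `udf:` / `col:` string convention are all hypothetical stand-ins, and only the shape of the `match` mirrors the diff.

```scala
// Toy plan nodes; these are NOT Catalyst classes, just stand-ins for illustration.
sealed trait Plan
case class Relation(name: String) extends Plan
// A condition of the form "udf:<name>" marks a UDF call that has not been extracted yet.
case class Filter(condition: String, child: Plan) extends Plan
case class EvalUDF(udf: String, child: Plan) extends Plan
case class Subquery(child: Plan) extends Plan

object ExtractToyUDFs {
  def apply(plan: Plan): Plan = plan match {
    // Skip subqueries: in this toy model we assume they are rewritten later
    // and will pass through the rule again at that point.
    case _: Subquery => plan
    // Extract the UDF into its own node and reference its result column instead.
    case Filter(cond, child) if cond.startsWith("udf:") =>
      val name = cond.stripPrefix("udf:")
      Filter(s"col:${name}_result", EvalUDF(name, apply(child)))
    // Nothing to extract at this node; just recurse into the child.
    case Filter(cond, child) => Filter(cond, apply(child))
    case EvalUDF(udf, child) => EvalUDF(udf, apply(child))
    case r: Relation => r
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val plan = Filter("udf:f", Relation("t"))
    val once = ExtractToyUDFs(plan)
    val twice = ExtractToyUDFs(once)
    // Idempotent: a second run finds nothing left to extract.
    assert(once == twice)
    // A plan wrapped in Subquery is returned untouched.
    val sub = Subquery(plan)
    assert(ExtractToyUDFs(sub) == sub)
  }
}
```

In this sketch the rule stays correct even without the `Subquery` case, because the rewritten `Filter` no longer matches the extraction pattern; the skip only avoids redundant work.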

