Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/23248#discussion_r239565253
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
expressions.flatMap(collectEvaluableUDFs)
}
- def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
- case plan: LogicalPlan => extract(plan)
+ def apply(plan: LogicalPlan): LogicalPlan = plan match {
+ // SPARK-26293: A subquery will be rewritten into join later, and will go through this rule
+ // eventually. Here we skip subquery, as Python UDF only needs to be extracted once.
+ case _: Subquery => plan
--- End diff --
Personally, I find it a bit confusing when two seemingly unrelated things
(Subquery and ExtractPythonUDFs) are put together.
I wonder whether it would be sufficient to make ExtractPythonUDFs idempotent?
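
To illustrate what "idempotent" would mean here, a rough sketch follows. It uses toy plan and expression types rather than Spark's actual Catalyst classes, so every name in it (`PyUdf`, `Col`, `EvalPython`, `ExtractUdfsToy`, the `_result` naming) is hypothetical and this is not how the real rule is written. The idea is only that, once a UDF has been pulled out of a projection, the rewritten projection no longer contains anything the rule matches, so running the rule a second time is a no-op.

```scala
// Toy stand-ins for Catalyst nodes; none of these are Spark's real classes.
sealed trait Expr
case class Col(name: String) extends Expr               // plain column reference
case class PyUdf(name: String, arg: Col) extends Expr   // Python UDF call not yet extracted

sealed trait Plan
case class Scan(cols: Seq[String]) extends Plan
case class Project(exprs: Seq[Expr], child: Plan) extends Plan
// Hypothetical node that evaluates the UDFs and exposes their results as columns.
case class EvalPython(udfs: Seq[PyUdf], resultCols: Seq[String], child: Plan) extends Plan

object ExtractUdfsToy {
  // Moves Python UDFs out of a Project into an EvalPython node below it.
  // After the rewrite the Project holds only Col references, so a second
  // pass finds no PyUdf to extract and returns an equal plan.
  def apply(plan: Plan): Plan = plan match {
    case Project(exprs, child) =>
      val newChild = apply(child)
      val udfs = exprs.collect { case u: PyUdf => u }
      if (udfs.isEmpty) {
        Project(exprs, newChild)
      } else {
        val resultCols = udfs.map(u => s"${u.name}_result")
        val rewritten = exprs.map {
          case u: PyUdf => Col(s"${u.name}_result")
          case other    => other
        }
        Project(rewritten, EvalPython(udfs, resultCols, newChild))
      }
    // Already-extracted UDFs are left alone; only the child is visited.
    case EvalPython(udfs, cols, child) => EvalPython(udfs, cols, apply(child))
    case s: Scan => s
  }
}
```

With a rule shaped like this, `ExtractUdfsToy(ExtractUdfsToy(p)) == ExtractUdfsToy(p)` for any plan `p`, so it would not matter whether a subquery is visited once before and once after being rewritten into a join.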