Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/12057#discussion_r57980715
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -59,45 +60,43 @@ private[spark] object ExtractPythonUDFs extends Rule[LogicalPlan] {
case plan: LogicalPlan if plan.resolved =>
// Extract any PythonUDFs from the current operator.
- val udfs = plan.expressions.flatMap(collectEvaluatableUDF)
+ val udfs = plan.expressions.flatMap(collectEvaluatableUDF).filter(_.resolved)
if (udfs.isEmpty) {
// If there aren't any, we are done.
plan
} else {
- // Pick the UDF we are going to evaluate (TODO: Support evaluating multiple UDFs at a time)
- // If there is more than one, we will add another evaluation operator in a subsequent pass.
- udfs.find(_.resolved) match {
- case Some(udf) =>
- var evaluation: EvaluatePython = null
-
- // Rewrite the child that has the input required for the UDF
- val newChildren = plan.children.map { child =>
- // Check to make sure that the UDF can be evaluated with only the input of this child.
- // Other cases are disallowed as they are ambiguous or would require a cartesian
- // product.
- if (udf.references.subsetOf(child.outputSet)) {
- evaluation = EvaluatePython(udf, child)
- evaluation
- } else if (udf.references.intersect(child.outputSet).nonEmpty) {
- sys.error(s"Invalid PythonUDF $udf, requires attributes from more than one child.")
- } else {
- child
- }
- }
-
- assert(evaluation != null, "Unable to evaluate PythonUDF. Missing input attributes.")
-
- // Trim away the new UDF value if it was only used for filtering or something.
- logical.Project(
- plan.output,
- plan.transformExpressions {
- case p: PythonUDF if p.fastEquals(udf) => evaluation.resultAttribute
--- End diff --
I guess we lost the `fastEquals` check since we're now relying on hash codes
and hash maps. Presumably this isn't a significant performance issue.
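
To make the trade-off concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `Expr`, `Attr`, `Udf`, and the `pythonUDF0` name are hypothetical stand-ins, and it assumes the new code looks UDFs up in a hash-based map. The `fastEquals` helper mirrors the idea of checking reference equality before falling back to structural equality.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an expression tree node; not Spark's PythonUDF.
sealed trait Expr {
  // Try cheap reference equality first; fall back to structural equality.
  def fastEquals(other: Expr): Boolean = this.eq(other) || this == other
}
case class Attr(name: String) extends Expr
case class Udf(name: String, children: Seq[Expr]) extends Expr

object FastEqualsSketch {
  val udf: Expr = Udf("f", Seq(Attr("a"), Attr("b")))
  val copy: Expr = Udf("f", Seq(Attr("a"), Attr("b")))

  // Old style: compare each candidate against the chosen UDF.
  val sameInstance: Boolean = udf.fastEquals(udf)   // short-circuits on eq
  val sameStructure: Boolean = udf.fastEquals(copy) // falls back to ==

  // New style (assumed): a hash map keyed by the expression. A lookup hashes
  // the key (a case class hashCode walks every field) and then confirms the
  // match with equals, so there is no reference-equality shortcut up front.
  val replacements: mutable.HashMap[Expr, Attr] =
    mutable.HashMap[Expr, Attr](udf -> Attr("pythonUDF0"))
  val viaMap: Option[Attr] = replacements.get(copy)
}
```

Both approaches find the same match; the difference is only that the map lookup always pays for a full hash of the expression, which is likely small for typical UDF expressions.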