peter-toth commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516528651
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with
PredicateHelper {
}
}
+ private def canonicalizeDeterministic(u: PythonUDF) = {
Review comment:
I see your point but if we can't rely on the deterministic flag then we
have serious issues in other rules as well. Please consider the example in the
description. I think column `d` should be equal to `c` no matter if the UDF is
deterministic or not. The `CollapseProject` rule duplicates the UDF call
(because it is flagged as deterministic) and that's why we have 2 instances
when `ExtractPythonUDFs` starts to run.
The following happens without this PR:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CollapseProject ===
!Project [a#7, b#8, c#12, c#12 AS d#16] Project [_1#2 AS a#7, _2#3
AS b#8, dummyUDF(_1#2) AS c#12, dummyUDF(_1#2) AS d#16]
!+- Project [a#7, b#8, dummyUDF(a#7) AS c#12] +- LocalRelation [_1#2, _2#3]
! +- Project [_1#2 AS a#7, _2#3 AS b#8]
! +- LocalRelation [_1#2, _2#3]
...
=== Applying Rule org.apache.spark.sql.execution.python.ExtractPythonUDFs ===
!Project [_1#2 AS a#7, _2#3 AS b#8, dummyUDF(_1#2) AS c#12, dummyUDF(_1#2)
AS d#16] Project [_1#2 AS a#7, _2#3 AS b#8, pythonUDF1#22 AS c#12,
pythonUDF1#22 AS d#16]
!+- LocalRelation [_1#2, _2#3]
+- BatchEvalPython [dummyUDF(_1#2), dummyUDF(_1#2)], [pythonUDF0#21,
pythonUDF1#22]
!
+- LocalRelation [_1#2, _2#3]
```
which can mean different values for `c` and `d` if the UDF is
non-deterministic but is flagged deterministic.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]