HyukjinKwon commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516423241
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with
PredicateHelper {
}
}
+ private def canonicalizeDeterministic(u: PythonUDF) = {
Review comment:
The fix itself is logically making sense. But my concern is that we set
this deterministic flag true by default for Python UDFs (and Scala UDFs too).
When users just use UDFs actually without knowing it (which I believe arguably
pretty common), they will get the same answer from UDFs users intended to work
as non-deterministically after this fix.
@gatorsmile and @cloud-fan, actually, shouldn't we set it as `false` by
default? - I am reading
https://github.com/apache/spark/commit/ebc24a9b7fde273ee4912f9bc1c5059703f7b31e.
It's an arbitrary user-defined function so I think it makes sense to have the
most loose condition.
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with
PredicateHelper {
}
}
+ private def canonicalizeDeterministic(u: PythonUDF) = {
Review comment:
The fix itself is logically making sense. But my concern is that we set
this deterministic flag true by default for Python UDFs (and Scala UDFs too).
When users just use UDFs actually without knowing it (which I believe is
arguably pretty common), they will get the same answer from UDFs users intended
to work as non-deterministically after this fix.
@gatorsmile and @cloud-fan, actually, shouldn't we set it as `false` by
default? - I am reading
https://github.com/apache/spark/commit/ebc24a9b7fde273ee4912f9bc1c5059703f7b31e.
It's an arbitrary user-defined function so I think it makes sense to have the
most loose condition by default.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]