[GitHub] [spark] peter-toth commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

GitBox Tue, 03 Nov 2020 09:32:14 -0800


peter-toth commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516836683




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
I think @cloud-fan meant that if we changed the default to non-deterministic, 
some of the optimization rules would no longer handle those UDF expressions and 
would leave them untouched. E.g. `PushDownPredicates` would not push them down, 
which could cause a performance regression.

IMHO, it is the user's responsibility to set the deterministic flag correctly, 
regardless of what the default is. And if a UDF is flagged as deterministic, we 
should apply the optimizations.
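
To illustrate the point about rules leaving non-deterministic expressions untouched, here is a minimal toy sketch in plain Python. It is not Spark's implementation: the `Udf`, `Scan`, `Project`, and `Filter` classes and the `push_down_filter` function are hypothetical names invented for this example, and the rule models only one simplified case (a filter sitting on top of a projection).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Udf:
    """Hypothetical stand-in for a UDF expression with a deterministic flag."""
    name: str
    deterministic: bool = True


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    child: "Plan"


@dataclass(frozen=True)
class Filter:
    condition: Udf
    child: "Plan"


Plan = Union[Scan, Project, Filter]


def push_down_filter(plan: Plan) -> Plan:
    """Toy rule: move a Filter below a Project only if its condition is deterministic.

    A non-deterministic condition is left where it is, because evaluating it
    at a different point in the plan could change the query result.
    """
    if (
        isinstance(plan, Filter)
        and isinstance(plan.child, Project)
        and plan.condition.deterministic
    ):
        project = plan.child
        return Project(Filter(plan.condition, project.child))
    return plan


# Deterministic UDF: the filter moves below the projection.
pushed = push_down_filter(Filter(Udf("is_valid"), Project(Scan("t"))))

# Non-deterministic UDF: the plan is returned unchanged.
original = Filter(Udf("is_valid", deterministic=False), Project(Scan("t")))
untouched = push_down_filter(original)
```

In this toy model, flipping the default of `deterministic` to `False` would make every unflagged UDF take the second path, which is the kind of regression described above.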




