[GitHub] [spark] peter-toth commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

GitBox Tue, 03 Nov 2020 09:31:13 -0800


peter-toth commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516836683




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
       I think @cloud-fan meant that if we changed the default to 
non-deterministic, some optimization rules would no longer handle those UDF 
expressions and would leave them untouched. E.g. `PushDownPredicates` would 
not push them down, which could cause a performance regression.
   
   IMHO, it is the user's responsibility to set the deterministic flag 
correctly, regardless of what the default is. And if a UDF is flagged 
deterministic, we should do the optimizations.
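
   To make the trade-off concrete, here is a toy sketch in plain Python. It is 
not Spark code: `Predicate` and `split_pushable` are hypothetical names, and the 
rule is a deliberately simplified stand-in for what a pushdown rule might do. 
It only illustrates that a predicate flagged non-deterministic is left where it 
is, while one flagged deterministic is eligible to move.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a filter condition that wraps a UDF call."""
    name: str
    # Assumption: deterministic is the default, as in the discussion above.
    deterministic: bool = True


def split_pushable(
    predicates: List[Predicate],
) -> Tuple[List[Predicate], List[Predicate]]:
    """Toy pushdown rule: only deterministic predicates may be moved."""
    pushed = [p for p in predicates if p.deterministic]
    kept = [p for p in predicates if not p.deterministic]
    return pushed, kept


filters = [
    Predicate("my_udf(a) > 1"),                           # uses the default flag
    Predicate("rand_udf(b) > 0.5", deterministic=False),  # explicitly flagged
]
pushed, kept = split_pushable(filters)
print("pushed down:", [p.name for p in pushed])
print("left in place:", [p.name for p in kept])
```

   In this model, flipping the default to `False` would move every unflagged 
predicate into the "left in place" list, which is the kind of regression 
described above.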



