[GitHub] [spark] srowen commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

GitBox Tue, 03 Nov 2020 06:59:49 -0800


srowen commented on a change in pull request #30203:
URL: https://github.com/apache/spark/pull/30203#discussion_r516730762




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with 
PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment:
       Indeed, the example in the PR should have c and d equal no matter what, 
ideally, as it shouldn't be evaluated multiple times. But then, being 
deterministic doesn't matter; how would changing the default be the right fix? 
I don't doubt it should be 'fixed' so that even non-deterministic UDFs aren't 
surprisingly reevaluated. Is the point that the current handling for 
non-deterministic UDFs slower, even though it evaluates them just once by 
design? that seems weird.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] srowen commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls

Reply via email to