eejbyfeldt commented on code in PR #47164:
URL: https://github.com/apache/spark/pull/47164#discussion_r1704998527


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala:
##########
@@ -58,6 +58,8 @@ case class ScalaUDF(
 
   override lazy val deterministic: Boolean = udfDeterministic && 
children.forall(_.deterministic)
 
+  override lazy val foldable: Boolean = deterministic && 
children.forall(_.foldable)

Review Comment:
   > Or, if the UDF runs something very heavy, it will stop the Catalyst 
optimizer.
   
   To me this argument sounds a bit backwards. The heavier the UDF is the more 
gain it is from executing it once during planning instead of executing it 
multiple times during execution (worst case we are going to execute it once per 
row).
   
   
   > If user defines a UDF that requires something in executors, that'd be a 
problem now as the UDF runs in driver.
   
   Ok, this does sound like somthing that could cause breakage. Not sure how 
common that would be in practice b though. 
   
   
   I changed the PR to make it opt in per UDF to make it foldable. I guess that 
should address most/all of the concerns you brought up with regards of it being 
a breaking change.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to