eejbyfeldt commented on code in PR #47164: URL: https://github.com/apache/spark/pull/47164#discussion_r1704998527
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala: ########## @@ -58,6 +58,8 @@ case class ScalaUDF( override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic) + override lazy val foldable: Boolean = deterministic && children.forall(_.foldable) Review Comment: > Or, if the UDF runs something very heavy, it will stop the Catalyst optimizer. To me this argument sounds a bit backwards. The heavier the UDF is the more gain it is from executing it once during planning instead of executing it multiple times during execution (worst case we are going to execute it once per row). > If user defines a UDF that requires something in executors, that'd be a problem now as the UDF runs in driver. Ok, this does sound like somthing that could cause breakage. Not sure how common that would be in practice b though. I changed the PR to make it opt in per UDF to make it foldable. I guess that should address most/all of the concerns you brought up with regards of it being a breaking change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
