eejbyfeldt commented on code in PR #47164: URL: https://github.com/apache/spark/pull/47164#discussion_r1694707375
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala:
##########

@@ -58,6 +58,8 @@ case class ScalaUDF(
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
+  override lazy val foldable: Boolean = deterministic && children.forall(_.foldable)

Review Comment:
@HyukjinKwon
> I think the problem is that we have made ScalaUDF deterministic by default so far, and we will have a lot of problems in existing workloads by executing the UDFs on the driver side.

To me, deterministic seems like the correct default: I would expect most UDFs to be deterministic, so changing the default seems like the more invasive change.

The concern you raise rests on the belief that people are using the existing API incorrectly. Do we have any more concrete evidence that this is the case? This argument could be made against a lot of potential improvements, and it does not seem like a rule we would want to follow in the long run.

My understanding is also that a UDF with an incorrect value for `deterministic` would already expose you to correctness issues from other optimizer rules, so perhaps that already leads people to declare determinism correctly for their UDFs.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
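For context, the semantics under discussion can be illustrated with a minimal, self-contained sketch. This is not the actual Catalyst code: `Expr`, `Literal`, and `Udf` here are simplified stand-ins, with `Udf.udfDeterministic` mirroring the flag a user can clear via `asNondeterministic()` on the real UDF API.

```scala
// Simplified sketch (hypothetical types, not Catalyst itself): an expression
// is foldable (safe to constant-fold at planning time on the driver) only if
// it is deterministic and all of its children are foldable.
sealed trait Expr {
  def children: Seq[Expr]
  def deterministic: Boolean = children.forall(_.deterministic)
  def foldable: Boolean = deterministic && children.forall(_.foldable)
}

// A literal has no children and is always foldable.
case class Literal(value: Any) extends Expr {
  val children: Seq[Expr] = Seq.empty
  override def foldable: Boolean = true
}

// Stand-in for ScalaUDF: both flags gate on udfDeterministic, matching the
// two overrides shown in the diff above.
case class Udf(udfDeterministic: Boolean, children: Seq[Expr]) extends Expr {
  override def deterministic: Boolean =
    udfDeterministic && children.forall(_.deterministic)
  override def foldable: Boolean =
    deterministic && children.forall(_.foldable)
}

// A deterministic UDF over only literals could be folded at planning time...
assert(Udf(udfDeterministic = true, Seq(Literal(1))).foldable)
// ...while a UDF marked nondeterministic never is, even over literals.
assert(!Udf(udfDeterministic = false, Seq(Literal(1))).foldable)
```

This also shows why the `foldable` override only changes behavior for UDFs the user has declared deterministic: a UDF marked via `asNondeterministic()` remains unfoldable regardless of its children.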
