eejbyfeldt commented on code in PR #47164:
URL: https://github.com/apache/spark/pull/47164#discussion_r1694707375


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala:
##########
@@ -58,6 +58,8 @@ case class ScalaUDF(
 
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
 
+  override lazy val foldable: Boolean = deterministic && children.forall(_.foldable)

Review Comment:
   @HyukjinKwon 
   
   > I think the problem is that we made ScalaUDF deterministic by default so far, and we will have a lot of problems in existing workloads by executing the UDFs on the driver side.
   
   To me, deterministic seems like the correct default; I would expect most UDFs to be deterministic.
   
   It seems the concerns you raise rest on the belief that people are using the existing API incorrectly. Do we have any concrete evidence that this is the case? This line of argument could be used against many potential improvements, and it does not seem like a rule we would want to follow in the long run.
   
   My understanding is also that having a UDF with an incorrect value for `deterministic` would already open you up to correctness issues through other optimizer rules. So perhaps that already pushes people to declare the correct determinism for their UDFs.
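   To illustrate what the new override in the diff buys: a `ScalaUDF` becomes a candidate for constant folding exactly when it is deterministic and all of its children are foldable. The sketch below is a hypothetical toy model, not Spark source; `Expr`, `Udf`, `Literal`, and `constantFold` are all illustrative stand-ins for Catalyst's actual classes and rules.

```scala
// Minimal sketch, NOT Spark source: toy expression tree showing how the
// foldable flag from the diff drives a constant-folding rule.
import scala.util.Random

sealed trait Expr {
  def deterministic: Boolean
  def foldable: Boolean
  def eval(): Int
}

// A literal is always deterministic and foldable.
case class Literal(value: Int) extends Expr {
  val deterministic = true
  val foldable = true
  def eval(): Int = value
}

// Stand-in for ScalaUDF: a Scala function applied to child expressions.
case class Udf(
    f: Seq[Int] => Int,
    children: Seq[Expr],
    udfDeterministic: Boolean = true) extends Expr {
  // Mirrors the existing line: deterministic only if the function and all children are.
  lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
  // The override under review: foldable = deterministic && all children foldable.
  lazy val foldable: Boolean = deterministic && children.forall(_.foldable)
  def eval(): Int = f(children.map(_.eval()))
}

// A constant-folding rule: a foldable expression can be evaluated once at
// planning time (i.e. on the driver) and replaced by a Literal.
def constantFold(e: Expr): Expr = e match {
  case u: Udf if u.foldable => Literal(u.eval())
  case u: Udf               => u.copy(children = u.children.map(constantFold))
  case other                => other
}

val sum  = Udf(_.sum, Seq(Literal(1), Literal(2)))
val rand = Udf(_ => Random.nextInt(), Seq.empty, udfDeterministic = false)

println(constantFold(sum))  // Literal(3): folded at planning time
println(constantFold(rand)) // unchanged: not deterministic, so not foldable
```

   This also makes the concern concrete: folding only ever evaluates the UDF on the driver when the UDF claims to be deterministic with constant inputs, which is exactly the contract `deterministic` already asserts.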
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

