eejbyfeldt commented on code in PR #47164:
URL: https://github.com/apache/spark/pull/47164#discussion_r1694707375


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala:
##########
@@ -58,6 +58,8 @@ case class ScalaUDF(
 
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
 
+  override lazy val foldable: Boolean = deterministic && children.forall(_.foldable)

Review Comment:
   @HyukjinKwon 
   
   > I think the problem we made ScalaUDF as deterministic by default so far, and we will have a lot of problems in existing workloads by executing the UDFs in driver side.
   
   To me, deterministic seems like the correct default; I would expect most UDFs to be deterministic. So changing the default seems like the more invasive change.
   
   The concerns you raise rest on the belief that people are using the existing API incorrectly. Do we have more concrete evidence that this is the case? This seems like an argument that could be used against a lot of potential improvements, and it does not seem like a rule we would want to follow in the long run.
   
   My understanding is also that a UDF with an incorrect value for `deterministic` would already expose you to correctness issues through other optimizer rules. So that may already push people to specify determinism correctly for their UDFs.
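   To illustrate the propagation rule in the diff, here is a minimal, self-contained sketch (a hypothetical `Expr`/`Lit`/`Udf` model, not Spark's actual classes) of how `foldable = deterministic && children.forall(_.foldable)` keeps a non-deterministic UDF from ever being treated as a constant:

```scala
// Minimal sketch with hypothetical Expr/Lit/Udf types (not Spark's classes),
// mirroring the proposed rule: foldable iff deterministic and all children foldable.
sealed trait Expr {
  def deterministic: Boolean
  def children: Seq[Expr]
  // A non-deterministic expression is never foldable, so an optimizer
  // following this rule would not evaluate it eagerly on the driver.
  def foldable: Boolean = deterministic && children.forall(_.foldable)
}

case class Lit(value: Int) extends Expr {
  val deterministic: Boolean = true
  val children: Seq[Expr] = Seq.empty
}

case class Udf(udfDeterministic: Boolean, children: Seq[Expr]) extends Expr {
  // Same shape as ScalaUDF.deterministic in the diff above.
  def deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
}
```

   Under this rule, `Udf(true, Seq(Lit(1))).foldable` is `true`, while `Udf(false, Seq(Lit(1))).foldable` is `false`, so a UDF the user has marked non-deterministic (e.g. via `asNondeterministic()`) would never be folded on the driver side.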
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

