juliuszsompolski commented on code in PR #42069:
URL: https://github.com/apache/spark/pull/42069#discussion_r1278299618
##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2547,4 +2547,18 @@ package object config {
.version("3.5.0")
.booleanConf
.createWithDefault(false)
+
+  private[spark] val CONNECT_SCALA_UDF_STUB_CLASSES =
+    ConfigBuilder("spark.connect.scalaUdf.stubClasses")
+      .internal()
+      .doc("""
+        |Comma-separated list of binary names of classes/packages that should be stubbed during
+        |the Scala UDF serde and execution if not found on the server classpath.
+        |An empty list effectively disables stubbing for all missing classes.
+        |By default, the server stubs classes from the Scala client package.
+        |""".stripMargin)
Review Comment:
So by default we will stub classes when some Spark Connect client code is
pulled into the UDF, but not when serialization pulls in some other class
that is unrelated to the client and not actually needed by the UDF, merely
referenced by the enclosing class in a way that drags it into the closure?
In that case, would the user also get a ClassNotFoundException?
And would we then expect the user to ship that class via addArtifact, even
though it might be unclear to them why that class is relevant to the UDF at
all?
What are the disadvantages of just stubbing everything?
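
For concreteness, here is a rough client-side sketch of the scenario I have
in mind (the class name, jar path, and endpoint below are hypothetical
placeholders, not from this PR):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Connect to a Spark Connect server (endpoint is a placeholder).
val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()

// Suppose the closure of the UDF below transitively references
// com.example.util.Helper (hypothetical): the UDF never calls it at runtime,
// it is merely referenced by the enclosing class. If that class is missing
// on the server and not covered by spark.connect.scalaUdf.stubClasses, UDF
// deserialization would fail with a ClassNotFoundException unless the user
// ships the class explicitly:
spark.addArtifact("/path/to/helper-classes.jar") // placeholder path

val plusOne = udf((x: Long) => x + 1)
spark.range(5).select(plusOne(col("id"))).show()
```

The alternative would be for the server operator to extend
`spark.connect.scalaUdf.stubClasses` to cover `com.example.util`, but that
assumes they can predict which classes client closures will drag in.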