sven-weber-db commented on code in PR #55768:
URL: https://github.com/apache/spark/pull/55768#discussion_r3259370773
##########
core/src/main/scala/org/apache/spark/SparkEnv.scala:
##########
@@ -120,6 +123,39 @@ class SparkEnv (
pythonExec: String, workerModule: String, daemonModule: String, envVars: Map[String, String])
private val pythonWorkers = mutable.HashMap[PythonWorkersKey, PythonWorkerFactory]()
+ /**
+ * :: Experimental ::
+ * Dispatcher factory to generate UDF worker dispatchers
+ * using the new UDF framework proposed in SPARK-55278
+ */
+ private val udfDispatcherManager: UDFDispatcherManager =
Review Comment:
Good point. On the driver, this is only needed when running a single-node
cluster. I changed the `val` to a `lazy val`, so we only acquire these
resources when they are actually needed. This approach also follows the
current implementation of `pythonWorkers`. Do you think this is better?
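A minimal sketch of the lazy-initialization behavior described above. It uses hypothetical stand-ins (`EnvStub`, `DispatcherManagerStub`) rather than the real `SparkEnv` and `UDFDispatcherManager` classes. With a `lazy val`, the initializer runs on first access only, and later accesses return the cached instance:

```scala
// Hypothetical stand-in for UDFDispatcherManager (the real class is not shown here).
final class DispatcherManagerStub

// Hypothetical stand-in for SparkEnv.
final class EnvStub {
  // Counts how many times the manager has been constructed.
  var creations: Int = 0

  // Built on first access only; later accesses return the cached instance.
  lazy val udfDispatcherManager: DispatcherManagerStub = {
    creations += 1
    new DispatcherManagerStub
  }
}

object LazyInitDemo {
  def main(args: Array[String]): Unit = {
    val env = new EnvStub
    println(env.creations) // 0: nothing constructed yet
    val first = env.udfDispatcherManager
    val second = env.udfDispatcherManager
    println(env.creations) // 1: constructed exactly once
    println(first eq second) // true: same cached instance
  }
}
```

An environment that never touches the field, such as a driver that does not run UDF workers, never pays for constructing it.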
> In general the pattern in SparkEnv is that we initialize variables.
Could you elaborate on this statement? The `udfDispatcherManager` is
initialized in the code above. Should we initialize it directly instead of
moving the initialization logic into a separate function?
My reasoning for introducing `createUDFDispatcherManager()` was that it
makes it easier to swap in a different `UDFDispatcherManager` implementation,
e.g., one selected by a Spark conf value.
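A rough sketch of the factory idea, under simplifying assumptions: a plain `Map[String, String]` stands in for `SparkConf`, and the trait `DispatcherManager`, its two implementations, and the conf key `spark.udf.dispatcherManager.impl` are all hypothetical, not existing Spark APIs. The factory name mirrors `createUDFDispatcherManager()`, but this signature is illustrative only:

```scala
// Hypothetical interface standing in for UDFDispatcherManager.
trait DispatcherManager {
  def name: String
}

final class DefaultDispatcherManager extends DispatcherManager {
  val name = "default"
}

final class AlternativeDispatcherManager extends DispatcherManager {
  val name = "alternative"
}

object DispatcherManagers {
  // Hypothetical conf key; not an existing Spark configuration.
  val ImplKey = "spark.udf.dispatcherManager.impl"

  // Factory: callers depend only on the trait, so the concrete class can be
  // chosen from configuration without changing the call site.
  def createUDFDispatcherManager(conf: Map[String, String]): DispatcherManager =
    conf.getOrElse(ImplKey, "default") match {
      case "default"     => new DefaultDispatcherManager
      case "alternative" => new AlternativeDispatcherManager
      case other =>
        throw new IllegalArgumentException(s"Unknown dispatcher manager: $other")
    }
}

object FactoryDemo {
  def main(args: Array[String]): Unit = {
    val chosen = DispatcherManagers.createUDFDispatcherManager(
      Map(DispatcherManagers.ImplKey -> "alternative"))
    println(chosen.name) // alternative
  }
}
```

Keeping the choice behind a factory means call sites depend only on the trait; the trade-off is one extra indirection compared with initializing the field inline.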
--
This is an automated message from the Apache Git Service.