allisonwang-db commented on code in PR #44681:
URL: https://github.com/apache/spark/pull/44681#discussion_r1449709548
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala:
##########
@@ -34,23 +32,29 @@ import org.apache.spark.util.Utils
 * A manager for user-defined data sources. It is used to register and look up
 * data sources by their short names or fully qualified names.
*/
-class DataSourceManager extends Logging {
+class DataSourceManager(
+ initDataSourceBuilders: => Option[
+ Map[String, UserDefinedPythonDataSource]] = None
+ ) extends Logging {
+ import DataSourceManager._
  // Lazy to avoid being invoked during Session initialization.
  // Otherwise, it goes into an infinite loop: session -> Python runner -> SQLConf -> session.
- private lazy val dataSourceBuilders = {
- val builders = new ConcurrentHashMap[String, UserDefinedPythonDataSource]()
- builders.putAll(DataSourceManager.initialDataSourceBuilders.asJava)
- builders
+  private lazy val staticDataSourceBuilders = initDataSourceBuilders.getOrElse {
+ initialDataSourceBuilders
Review Comment:
Ah, because of these two configs:
```
val simplifiedTraceback: Boolean = SQLConf.get.pysparkSimplifiedTraceback
val workerMemoryMb = SQLConf.get.pythonPlannerExecMemory
```
I think instead of accessing SQLConf here, we should pass these values as
parameters to the `runInPython` method to avoid this initialization issue.
Maybe we can add a TODO for a follow-up PR?
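
For illustration, a minimal sketch of that suggestion, assuming a `runInPython`
that receives both values as parameters (the signature, the `Option[Long]` type
for the memory conf, and the wrapper object are placeholders, not code from
this PR):

```scala
import org.apache.spark.sql.internal.SQLConf

object RunInPythonSketch {
  // Hypothetical signature: both confs arrive as plain parameters instead of
  // being read via SQLConf.get inside the method, so nothing in here can
  // re-enter session initialization. Option[Long] for the memory conf is an
  // assumption.
  def runInPython(
      simplifiedTraceback: Boolean,
      workerMemoryMb: Option[Long]): Unit = {
    // ... launch the Python planner worker with the supplied settings ...
  }

  // Call site, invoked only after the session is fully constructed, where
  // reading SQLConf.get is safe:
  def example(): Unit = {
    runInPython(
      simplifiedTraceback = SQLConf.get.pysparkSimplifiedTraceback,
      workerMemoryMb = SQLConf.get.pythonPlannerExecMemory)
  }
}
```

This keeps the conf reads at call sites that run after initialization, breaking
the session -> Python runner -> SQLConf -> session cycle noted above.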
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]