HyukjinKwon commented on code in PR #44681:
URL: https://github.com/apache/spark/pull/44681#discussion_r1448379257


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala:
##########
@@ -34,23 +32,29 @@ import org.apache.spark.util.Utils
  * A manager for user-defined data sources. It is used to register and lookup data sources by
  * their short names or fully qualified names.
  */
-class DataSourceManager extends Logging {
+class DataSourceManager(
+    initDataSourceBuilders: => Option[
+      Map[String, UserDefinedPythonDataSource]] = None
+   ) extends Logging {
+  import DataSourceManager._
   // Lazy to avoid being invoked during Session initialization.
   // Otherwise, it goes infinite loop, session -> Python runner -> SQLConf -> session.
-  private lazy val dataSourceBuilders = {
-    val builders = new ConcurrentHashMap[String, UserDefinedPythonDataSource]()
-    builders.putAll(DataSourceManager.initialDataSourceBuilders.asJava)
-    builders
+  private lazy val staticDataSourceBuilders = initDataSourceBuilders.getOrElse {
+    initialDataSourceBuilders

Review Comment:
   Yeah, I agree .. but the problem is that `UserDefinedPythonDataSourceLookupRunner.runInPython` requires `SQLConf.get`, which in turn requires `SparkSession` initialization.
   
   So the initialization of the static data sources must happen at the earliest when a session is created. For now, I put the static initialization logic into the first call of `DataSourceManager` in any session.
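
   To make the shape of that workaround concrete, here is a minimal, self-contained sketch of the pattern: a by-name constructor parameter with a `None` default, combined with `lazy val`, defers the expensive static lookup until first use while still letting callers inject pre-built builders. All names below (`StaticRegistry`, `DataSourceManagerSketch`, the string values) are illustrative stand-ins, not Spark's actual API.

   ```scala
   // Stand-in for the expensive static lookup that must NOT run during
   // session construction (it would re-enter: session -> runner -> conf -> session).
   object StaticRegistry {
     lazy val builders: Map[String, String] =
       Map("my_source" -> "com.example.MySourceBuilder")
   }

   class DataSourceManagerSketch(
       // By-name parameter: the argument expression is not evaluated at
       // construction time, only when `staticBuilders` is first accessed.
       initBuilders: => Option[Map[String, String]] = None) {

     // Lazy so neither the injected value nor the static registry is
     // touched until the first lookup.
     private lazy val staticBuilders: Map[String, String] =
       initBuilders.getOrElse(StaticRegistry.builders)

     def lookup(name: String): Option[String] = staticBuilders.get(name)
   }
   ```

   Constructing a `DataSourceManagerSketch` is cheap; the registry is only consulted on the first `lookup`, which mirrors deferring the static initialization to the first `DataSourceManager` call in a session.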



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

