Dear Spark Community,

Why does the Python Data Source API (pyspark.sql.datasource.DataSource) not use the "spark.sql.execution.pyspark.python" config, while UDFs do?
With the data source:
1) the executor always looks for "python3", ignoring the "spark.sql.execution.pyspark.python" config
2) so the provided dependencies are not loaded

I am using this Docker image on both master and executors: spark:4.0.0-scala2.13-java21-python3-ubuntu

    spark.addArtifact("pyspark_pex_env.pex", file=True)  # ijson included
    spark.conf.set("spark.sql.execution.pyspark.python", "pyspark_pex_env.pex")
    spark.dataSource.register(MyDataSource)

Error:

    ModuleNotFoundError: No module named 'ijson'
    2025-07-24T09:26:21.941789290Z SQLSTATE: 38000 JVM stacktrace:
    2025-07-24T09:26:21.941800171Z org.apache.spark.sql.AnalysisException
    2025-07-24T09:26:21.941802296Z at org.apache.spark.sql.errors.QueryCompilationErrors$.pythonDataSourceError(QueryCompilationErrors.scala:2206)
    2025-07-24T09:26:21.941804593Z at org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSourceRunner.receiveFromPython(UserDefinedPythonDataSource.scala:279)
    2025-07-24T09:26:21.941806864Z at org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSourceRunner.receiveFromPython(UserDefinedPythonDataSource.scala:244)
    2025-07-24T09:26:21.941808801Z at org.apache.spark.sql.execution.python.PythonPlannerRunner.runInPython(PythonPlannerRunner.scala:118)
    2025-07-24T09:26:21.941824039Z at org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSource.createDataSourceInPython(UserDefinedPythonDataSource.scala:61)
    2025-07-24T09:26:21.941826618Z at org.apache.spark.sql.execution.datasources.v2.python.PythonDataSourceV2.getOrCreateDataSourceInPython(PythonDataSourceV2.scala:50)
    2025-07-24T09:26:21.941828912Z at org.apache.spark.sql.execution.datasources.v2.python.PythonDataSourceV2.inferSchema(PythonDataSourceV2.scala:56)
    2025-07-24T09:26:21.941831393Z at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:96)
    2025-07-24T09:26:21.941833963Z at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:147)
    2025-07-24T09:26:21.941835876Z at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.$anonfun$applyOrElse$1(ResolveDataSource.scala:60)
    2025-07-24T09:26:21.941837708Z at scala.Option.flatMap(Option.scala:283)
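
To compare the two code paths directly, a small diagnostic along these lines might help. This is only a sketch using the Python standard library: describe_python_env is a hypothetical helper name, not part of PySpark, and it only reports which interpreter is running and whether a module such as ijson is importable from it.

```python
import importlib.util
import sys


def describe_python_env(module_name: str = "ijson") -> dict:
    """Report the running interpreter and whether module_name can be located.

    Hypothetical diagnostic helper; nothing here is Spark-specific.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec can raise for missing parent packages or odd module states
        spec = None
    return {
        "executable": sys.executable,            # path of the Python binary in use
        "version": sys.version.split()[0],       # e.g. "3.x.y"
        "module": module_name,
        "module_found": spec is not None,        # True if the import system can find it
        "module_origin": getattr(spec, "origin", None),
    }
```

Calling this once inside a UDF body and once inside the data source (for example from its schema() method, raising or logging the result) should show whether the two paths really start different interpreters, assuming the helper itself is shipped to both.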