This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 40e65d33482  [SPARK-44981][PYTHON][CONNECT][FOLLOW-UP] Explicitly pass runtime configurations only
40e65d33482 is described below

commit 40e65d334822ca88492ecbc4197386322be08fa3
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Tue Aug 29 22:45:00 2023 +0900

[SPARK-44981][PYTHON][CONNECT][FOLLOW-UP] Explicitly pass runtime configurations only

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/42694 that only allows passing runtime configurations.

### Why are the changes needed?

Excluding static SQL configurations cannot exclude core configurations. For example, if you pass `spark.jars` with `local-cluster` mode:

```bash
./bin/pyspark --remote "local-cluster[1,2,1024]"
```

it shows unnecessary warnings as below:

```
23/08/29 16:58:08 ERROR ErrorUtils: Spark Connect RPC error during: config. UserId: hyukjin.kwon. SessionId: 5c331d52-bf65-4f1c-9416-899e00d4a7d9.
org.apache.spark.sql.AnalysisException: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.jars". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
        at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:3233)
        at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:166)
        at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
        at org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
        at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
        at org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
        at org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
        at org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
        at org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/session.py:186: UserWarning: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.jars". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
  warnings.warn(str(e))
23/08/29 16:58:08 ERROR ErrorUtils: Spark Connect RPC error during: config. UserId: hyukjin.kwon. SessionId: 5c331d52-bf65-4f1c-9416-899e00d4a7d9.
org.apache.spark.sql.AnalysisException: [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.jars". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'.
        at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:3233)
        at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:166)
        at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
        at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
        at org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
        at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
        at org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
        at org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
        at org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
        at org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No, the original change has not been released yet.

### How was this patch tested?

Manually tested as described above.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42718 from HyukjinKwon/SPARK-44981-followup.

Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
(cherry picked from commit 281f174304a5b1d9a146502dfdfd000d15924327)
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/connect/session.py | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py
index 628eae20511..1307c8bdd84 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -884,12 +884,13 @@ class SparkSession:
                 SparkContext.getOrCreate(create_conf(loadDefaults=True, _jvm=SparkContext._jvm))
             )
 
-            # Lastly remove all static configurations that are not allowed to set in the regular
-            # Spark Connect session.
-            jvm = SparkContext._jvm
-            utl = jvm.org.apache.spark.sql.api.python.PythonSQLUtils  # type: ignore[union-attr]
-            for conf_set in utl.listStaticSQLConfigs():
-                opts.pop(conf_set._1(), None)
+            # Lastly only keep runtime configurations because other configurations are
+            # disallowed to set in the regular Spark Connect session.
+            utl = SparkContext._jvm.PythonSQLUtils  # type: ignore[union-attr]
+            runtime_conf_keys = [c._1() for c in utl.listRuntimeSQLConfigs()]
+            new_opts = {k: opts[k] for k in opts if k in runtime_conf_keys}
+            opts.clear()
+            opts.update(new_opts)
 
         finally:
             if origin_remote is not None:

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
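The in-place filtering pattern the patch switches to can be illustrated outside of Spark. The sketch below is a minimal stand-alone version of what the new hunk does: build a dict restricted to an allow-list of keys, then `clear()` and `update()` the original dict so any existing references to it remain valid. The `keep_only_allowed` helper and the conf keys are hypothetical stand-ins; in the real patch the allow-list comes from `utl.listRuntimeSQLConfigs()` on the JVM side.

```python
def keep_only_allowed(opts: dict, allowed_keys: list) -> None:
    """Drop every entry of `opts` whose key is not in `allowed_keys`, mutating
    `opts` in place rather than rebinding it to a new dict."""
    new_opts = {k: v for k, v in opts.items() if k in allowed_keys}
    opts.clear()        # empty the original dict...
    opts.update(new_opts)  # ...and refill it with only the allowed entries


opts = {
    "spark.sql.shuffle.partitions": "4",  # stands in for a runtime SQL conf (kept)
    "spark.jars": "my.jar",               # stands in for a core conf (dropped)
}
ref = opts  # an existing reference to the same dict
keep_only_allowed(opts, ["spark.sql.shuffle.partitions"])
print(opts)        # {'spark.sql.shuffle.partitions': '4'}
print(ref is opts)  # True: in-place mutation preserves the identity
```

Mutating in place (rather than `opts = new_opts`) matters here because other code may already hold a reference to the same dict; reassignment would leave that reference pointing at the unfiltered options.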