[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r774544570 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: Made a PR at https://github.com/apache/spark/pull/35001 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769248275 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: If we can't make it in Spark 3.3, I think maybe it's just safer to revert https://github.com/apache/spark/pull/34757 https://github.com/apache/spark/pull/34732 and https://github.com/apache/spark/pull/34559 for now because each patch here will introduce either: 1. Unexpected configuration propagation of static SQL configuration, or 2. Too much warnings Separately, I still feel https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 is inefficient. We don't know which configurations don't take effect, or why it keeps complaining (see the example above) for which configuration. We should probably at least print out the keys or lower the level of log. cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769248275 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: If we can't make it in Spark 3.3, I think maybe it's just safer to revert https://github.com/apache/spark/pull/34757 https://github.com/apache/spark/pull/34732 and https://github.com/apache/spark/pull/34559 for now because each patch here will introduce either: 1. Unexpected configuration propagation of static SQL configuration, or 2. Too much warnings Separately, I still feel https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 is inefficient. We don't know which configurations don't take affect, or why it keeps complaining (see the example above) for which configuration. We should probably at least print out the keys or lower the level of log. cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769250270 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: @xinrong-databricks actually this is more Python side codes. Are you interested in creating a followup? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769245766 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: @AngersZh, this actually shows a lot of new warnings (see also https://github.com/apache/spark/pull/34893). Another reproducer: ```bash ./bin/spark-shell --conf spark.executor.memory=8g --conf spark.driver.memory=8g ``` ```python >>> from pyspark.sql.functions import udf >>> udf(lambda x: x)("a") 21/12/15 14:03:15 WARN SparkSession: Using an existing SparkSession; the static sql configurations will not take effect. Column<'(a)'> ``` There are more places to fix like this: ```python ml/util.py:self._sparkSession = SparkSession.builder.getOrCreate() sql/column.py:spark = SparkSession.builder.getOrCreate() sql/context.py:sparkSession = SparkSession.builder.getOrCreate() sql/readwriter.py:spark = SparkSession.builder.getOrCreate() sql/readwriter.py:spark = SparkSession.builder.getOrCreate() sql/session.py:return SparkSession.builder.getOrCreate() sql/session.py:return SparkSession.builder.getOrCreate() sql/streaming.py:spark = SparkSession.builder.getOrCreate() sql/streaming.py:spark = SparkSession.builder.getOrCreate() sql/udf.py:spark = SparkSession.builder.getOrCreate() ``` ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: Would you mind fixing these please? If we can't make it until Spark 3.3, I think maybe it's just safer to revert https://github.com/apache/spark/pull/34757 https://github.com/apache/spark/pull/34732 and https://github.com/apache/spark/pull/34559 for now because each patch here will introduce either: 1. Unexpected configuration propagation of static SQL configuration, or 2. Too much warnings Separately, I still feel https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 is inefficient. We don't know which configurations don't take affect, or why it keeps complaining (see the example above) for which configuration. We should probably at least print out the keys or lower the level of log. cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: Would you mind fixing these please? If we can't make it in Spark 3.3, I think maybe it's just safer to revert https://github.com/apache/spark/pull/34757 https://github.com/apache/spark/pull/34732 and https://github.com/apache/spark/pull/34559 for now because each patch here will introduce either: 1. Unexpected configuration propagation of static SQL configuration, or 2. Too much warnings Separately, I still feel https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 is inefficient. We don't know which configurations don't take affect, or why it keeps complaining (see the example above) for which configuration. We should probably at least print out the keys or lower the level of log. cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769248275 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: Would you mind fixing these please? If we can't make it by Spark 3.2, I think maybe it's just safer to revert https://github.com/apache/spark/pull/34757 https://github.com/apache/spark/pull/34732 and https://github.com/apache/spark/pull/34559 for now because each patch here will introduce either: 1. Unexpected configuration propagation of static SQL configuration, or 2. Too much warnings Separately, I still feel https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 is inefficient. We don't know which configurations don't take affect, or why it keeps complaining (see the example above) for which configuration. We should probably at least print out the keys or lower the level of log. cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r769245766 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +306,15 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings( Review comment: @AngersZh, this actually shows a lot of new warnings (see also https://github.com/apache/spark/pull/34893). Another reproducer: ```bash ./bin/spark-shell --conf spark.executor.memory=8g --conf spark.driver.memory=8g ``` ```python >>> from pyspark.sql.functions import udf >>> udf(lambda x: x)("a") 21/12/15 14:03:15 WARN SparkSession: Using an existing SparkSession; the static sql configurations will not take effect. Column<'(a)'> ``` There are too many places like this: ```python ml/util.py:self._sparkSession = SparkSession.builder.getOrCreate() sql/column.py:spark = SparkSession.builder.getOrCreate() sql/context.py:sparkSession = SparkSession.builder.getOrCreate() sql/readwriter.py:spark = SparkSession.builder.getOrCreate() sql/readwriter.py:spark = SparkSession.builder.getOrCreate() sql/session.py:return SparkSession.builder.getOrCreate() sql/session.py:return SparkSession.builder.getOrCreate() sql/streaming.py:spark = SparkSession.builder.getOrCreate() sql/streaming.py:spark = SparkSession.builder.getOrCreate() sql/udf.py:spark = SparkSession.builder.getOrCreate() ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760670200 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1074,6 +1058,28 @@ object SparkSession extends Logging { throw new IllegalStateException("No active or default Spark session found"))) } + /** + * Apply modifiable settings to an existed SparkSession. This method are used + * both in scala and Pyspark, so put this under SparkSession object. Review comment: ```suggestion * both in Scala and Python, so put this under [[SparkSession]] object. ``` ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1074,6 +1058,28 @@ object SparkSession extends Logging { throw new IllegalStateException("No active or default Spark session found"))) } + /** + * Apply modifiable settings to an existed SparkSession. This method are used Review comment: ```suggestion * Apply modifiable settings to an existing [[SparkSession]]. This method are used ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760669979 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,11 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) else: jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options) Review comment: Shall we add a short comment here that this is the case when we can set static configurations -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760669444 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,11 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) else: jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options) Review comment: What about this case? shouldn't we still filter static confs out? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760660975 ## File path: python/pyspark/sql/session.py ## @@ -277,8 +277,10 @@ def getOrCreate(self) -> "SparkSession": # Do not update `SparkConf` for existing `SparkContext`, as it's shared # by all sessions. session = SparkSession(sc, options=self._options) -for key, value in self._options.items(): - session._jsparkSession.sessionState().conf().setConfString(key, value) +else: Review comment: oops i misread -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760660509 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,13 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +if options is not None: + self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) Review comment: or `Dict[str, str]` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760660448 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,13 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +if options is not None: + self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) Review comment: should fix the type hints above ` Optional[Dict[str, Any]] =` -> `Dict[str, Any] =` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760660328 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,13 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +if options is not None: + self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) Review comment: ```suggestion self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options) ``` Can `options` be `None` now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760659914 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1074,6 +1058,24 @@ object SparkSession extends Logging { throw new IllegalStateException("No active or default Spark session found"))) } + def applyModifiableSettings( Review comment: Can we also add some docs that this is being invoked from PySpark, and that's why it exists in SparkSession object? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760659711 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1074,6 +1058,24 @@ object SparkSession extends Logging { throw new IllegalStateException("No active or default Spark session found"))) } + def applyModifiableSettings( Review comment: `private[sql]` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r759995944 ## File path: python/pyspark/sql/session.py ## @@ -287,6 +289,29 @@ def getOrCreate(self) -> "SparkSession": _instantiatedSession: ClassVar[Optional["SparkSession"]] = None _activeSession: ClassVar[Optional["SparkSession"]] = None +def applyModifiableSetting(self, session: JavaObject, options: Dict[str, Any]) -> None: Review comment: Oh I see. Yeah, I think we can move that method -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r759992352 ## File path: python/pyspark/sql/session.py ## @@ -277,8 +277,10 @@ def getOrCreate(self) -> "SparkSession": # Do not update `SparkConf` for existing `SparkContext`, as it's shared # by all sessions. session = SparkSession(sc, options=self._options) -for key, value in self._options.items(): - session._jsparkSession.sessionState().conf().setConfString(key, value) +else: Review comment: Hm, in Scala side, `applyModifiableSettings` is always invoked when the default Spark session is not found, and `SparkContext` is already running. Is this matched with Scala side? Seems not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r759991137 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +329,13 @@ def __init__( and not self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped() ): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() +if options is not None: +self.applyModifiableSetting(jsparkSession, options) else: jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options) +else: +if options is not None: +self.applyModifiableSetting(jsparkSession, options) Review comment: Hm, in this case, there's an existing `jsparkSession`. Do we still need to check the configurations? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r759989446 ## File path: python/pyspark/sql/session.py ## @@ -287,6 +289,29 @@ def getOrCreate(self) -> "SparkSession": _instantiatedSession: ClassVar[Optional["SparkSession"]] = None _activeSession: ClassVar[Optional["SparkSession"]] = None +def applyModifiableSetting(self, session: JavaObject, options: Dict[str, Any]) -> None: Review comment: Can we call JVM's `SparkSession.applyModifiableSetting` after making it `private[sql]`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org