[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-23 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r774544570



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   Made a PR at https://github.com/apache/spark/pull/35001  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769248275



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   If we can't make it in Spark 3.3, I think maybe it's just safer to 
revert https://github.com/apache/spark/pull/34757 
https://github.com/apache/spark/pull/34732 and 
https://github.com/apache/spark/pull/34559 for now because each patch here will 
introduce either:
   1. Unexpected configuration propagation of static SQL configuration, or
   2. Too much warnings
   
   Separately, I still feel 
https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 
is inefficient. We don't know which configurations don't take effect, or why it 
keeps complaining (see the example above) for which configuration. We should 
probably at least print out the keys or lower the level of log.
   
   cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769248275



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   If we can't make it in Spark 3.3, I think maybe it's just safer to 
revert https://github.com/apache/spark/pull/34757 
https://github.com/apache/spark/pull/34732 and 
https://github.com/apache/spark/pull/34559 for now because each patch here will 
introduce either:
   1. Unexpected configuration propagation of static SQL configuration, or
   2. Too much warnings
   
   Separately, I still feel 
https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 
is inefficient. We don't know which configurations don't take affect, or why it 
keeps complaining (see the example above) for which configuration. We should 
probably at least print out the keys or lower the level of log.
   
   cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769250270



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   @xinrong-databricks actually this is more Python side codes. Are you 
interested in creating a followup?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769245766



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   @AngersZh, this actually shows a lot of new warnings (see also 
https://github.com/apache/spark/pull/34893). Another reproducer:
   
   ```bash
   ./bin/spark-shell --conf spark.executor.memory=8g --conf 
spark.driver.memory=8g
   ```
   
   ```python
   >>> from pyspark.sql.functions import udf
   >>> udf(lambda x: x)("a")
   21/12/15 14:03:15 WARN SparkSession: Using an existing SparkSession; the 
static sql configurations will not take effect.
   Column<'(a)'>
   ```
   
   There are more places to fix like this:
   
   ```python
   ml/util.py:self._sparkSession = 
SparkSession.builder.getOrCreate()
   sql/column.py:spark = SparkSession.builder.getOrCreate()
   sql/context.py:sparkSession = SparkSession.builder.getOrCreate()
   sql/readwriter.py:spark = SparkSession.builder.getOrCreate()
   sql/readwriter.py:spark = SparkSession.builder.getOrCreate()
   sql/session.py:return SparkSession.builder.getOrCreate()
   sql/session.py:return SparkSession.builder.getOrCreate()
   sql/streaming.py:spark = SparkSession.builder.getOrCreate()
   sql/streaming.py:spark = SparkSession.builder.getOrCreate()
   sql/udf.py:spark = SparkSession.builder.getOrCreate()
   ```

##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   Would you mind fixing these please?
   
   If we can't make it until Spark 3.3, I think maybe it's just safer to revert 
https://github.com/apache/spark/pull/34757 
https://github.com/apache/spark/pull/34732 and 
https://github.com/apache/spark/pull/34559 for now because each patch here will 
introduce either:
   1. Unexpected configuration propagation of static SQL configuration, or
   2. Too much warnings
   
   Separately, I still feel 
https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 
is inefficient. We don't know which configurations don't take affect, or why it 
keeps complaining (see the example above) for which configuration. We should 
probably at least print out the keys or lower the level of log.
   
   cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI

##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   Would you mind fixing these please?
   
   If we can't make it in Spark 3.3, I think maybe it's just safer to revert 
https://github.com/apache/spark/pull/34757 
https://github.com/apache/spark/pull/34732 and 
https://github.com/apache/spark/pull/34559 for now because each patch here will 
introduce either:
   1. Unexpected configuration propagation of static SQL configuration, or
   2. Too much warnings
   
   Separately, I still feel 
https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 
is inefficient. We don't know which configurations don't take affect, or why it 
keeps complaining (see the example above) for which configuration. We should 
probably at least print out the keys or lower the level of log.
   
   cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769248275



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   Would you mind fixing these please?
   
   If we can't make it by Spark 3.2, I think maybe it's just safer to revert 
https://github.com/apache/spark/pull/34757 
https://github.com/apache/spark/pull/34732 and 
https://github.com/apache/spark/pull/34559 for now because each patch here will 
introduce either:
   1. Unexpected configuration propagation of static SQL configuration, or
   2. Too much warnings
   
   Separately, I still feel 
https://github.com/apache/spark/commit/8424f552293677717da7411ed43e68e73aa7f0d6 
is inefficient. We don't know which configurations don't take affect, or why it 
keeps complaining (see the example above) for which configuration. We should 
probably at least print out the keys or lower the level of log.
   
   cc @AngersZh @yaooqinn @maropu @dongjoon-hyun FYI




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-14 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r769245766



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +306,15 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+getattr(getattr(self._jvm, "SparkSession$"), 
"MODULE$").applyModifiableSettings(

Review comment:
   @AngersZh, this actually shows a lot of new warnings (see also 
https://github.com/apache/spark/pull/34893). Another reproducer:
   
   ```bash
   ./bin/spark-shell --conf spark.executor.memory=8g --conf 
spark.driver.memory=8g
   ```
   
   ```python
   >>> from pyspark.sql.functions import udf
   >>> udf(lambda x: x)("a")
   21/12/15 14:03:15 WARN SparkSession: Using an existing SparkSession; the 
static sql configurations will not take effect.
   Column<'(a)'>
   ```
   
   There are too many places like this:
   
   ```python
   ml/util.py:self._sparkSession = 
SparkSession.builder.getOrCreate()
   sql/column.py:spark = SparkSession.builder.getOrCreate()
   sql/context.py:sparkSession = SparkSession.builder.getOrCreate()
   sql/readwriter.py:spark = SparkSession.builder.getOrCreate()
   sql/readwriter.py:spark = SparkSession.builder.getOrCreate()
   sql/session.py:return SparkSession.builder.getOrCreate()
   sql/session.py:return SparkSession.builder.getOrCreate()
   sql/streaming.py:spark = SparkSession.builder.getOrCreate()
   sql/streaming.py:spark = SparkSession.builder.getOrCreate()
   sql/udf.py:spark = SparkSession.builder.getOrCreate()
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760670200



##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1074,6 +1058,28 @@ object SparkSession extends Logging {
   throw new IllegalStateException("No active or default Spark session 
found")))
   }
 
+  /**
+   * Apply modifiable settings to an existed SparkSession. This method are used
+   * both in scala and Pyspark, so put this under SparkSession object.

Review comment:
   ```suggestion
  * both in Scala and Python, so put this under [[SparkSession]] object.
   ```

##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1074,6 +1058,28 @@ object SparkSession extends Logging {
   throw new IllegalStateException("No active or default Spark session 
found")))
   }
 
+  /**
+   * Apply modifiable settings to an existed SparkSession. This method are used

Review comment:
   ```suggestion
  * Apply modifiable settings to an existing [[SparkSession]]. This method 
are used
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760669979



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +304,11 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+self._jvm.SparkSession.applyModifiableSettings(jsparkSession, 
options)
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)

Review comment:
   Shall we add a short comment here that this is the case when we can set 
static configurations




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760669444



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +304,11 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+self._jvm.SparkSession.applyModifiableSettings(jsparkSession, 
options)
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)

Review comment:
   What about this case? shouldn't we still filter static confs out?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760660975



##
File path: python/pyspark/sql/session.py
##
@@ -277,8 +277,10 @@ def getOrCreate(self) -> "SparkSession":
 # Do not update `SparkConf` for existing `SparkContext`, 
as it's shared
 # by all sessions.
 session = SparkSession(sc, options=self._options)
-for key, value in self._options.items():
-
session._jsparkSession.sessionState().conf().setConfString(key, value)
+else:

Review comment:
   oops i misread




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760660509



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +304,13 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+if options is not None:
+
self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options)

Review comment:
   or `Dict[str, str]`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760660448



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +304,13 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+if options is not None:
+
self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options)

Review comment:
   should fix the type hints above ` Optional[Dict[str, Any]] =` -> 
`Dict[str, Any] =`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760660328



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +304,13 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+if options is not None:
+
self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options)

Review comment:
   ```suggestion
   
self._jvm.SparkSession.applyModifiableSettings(jsparkSession, options)
   ```
   
   Can `options` be `None` now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760659914



##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1074,6 +1058,24 @@ object SparkSession extends Logging {
   throw new IllegalStateException("No active or default Spark session 
found")))
   }
 
+  def applyModifiableSettings(

Review comment:
   Can we also add some docs that this is being invoked from PySpark, and 
that's why it exists in SparkSession object?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r760659711



##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1074,6 +1058,24 @@ object SparkSession extends Logging {
   throw new IllegalStateException("No active or default Spark session 
found")))
   }
 
+  def applyModifiableSettings(

Review comment:
   `private[sql]`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r759995944



##
File path: python/pyspark/sql/session.py
##
@@ -287,6 +289,29 @@ def getOrCreate(self) -> "SparkSession":
 _instantiatedSession: ClassVar[Optional["SparkSession"]] = None
 _activeSession: ClassVar[Optional["SparkSession"]] = None
 
+def applyModifiableSetting(self, session: JavaObject, options: Dict[str, 
Any]) -> None:

Review comment:
   Oh I see. Yeah, I think we can move that method  
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r759992352



##
File path: python/pyspark/sql/session.py
##
@@ -277,8 +277,10 @@ def getOrCreate(self) -> "SparkSession":
 # Do not update `SparkConf` for existing `SparkContext`, 
as it's shared
 # by all sessions.
 session = SparkSession(sc, options=self._options)
-for key, value in self._options.items():
-
session._jsparkSession.sessionState().conf().setConfString(key, value)
+else:

Review comment:
   Hm, in Scala side, `applyModifiableSettings` is always invoked when the 
default Spark session is not found, and `SparkContext` is already running. Is 
this matched with Scala side? Seems not.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r759991137



##
File path: python/pyspark/sql/session.py
##
@@ -304,8 +329,13 @@ def __init__(
 and not 
self._jvm.SparkSession.getDefaultSession().get().sparkContext().isStopped()
 ):
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
+if options is not None:
+self.applyModifiableSetting(jsparkSession, options)
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
+else:
+if options is not None:
+self.applyModifiableSetting(jsparkSession, options)

Review comment:
   Hm, in this case, there's an existing `jsparkSession`.  Do we still need 
to check the configurations?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox


HyukjinKwon commented on a change in pull request #34757:
URL: https://github.com/apache/spark/pull/34757#discussion_r759989446



##
File path: python/pyspark/sql/session.py
##
@@ -287,6 +289,29 @@ def getOrCreate(self) -> "SparkSession":
 _instantiatedSession: ClassVar[Optional["SparkSession"]] = None
 _activeSession: ClassVar[Optional["SparkSession"]] = None
 
+def applyModifiableSetting(self, session: JavaObject, options: Dict[str, 
Any]) -> None:

Review comment:
   Can we call JVM's `SparkSession.applyModifiableSetting` after making it 
`private[sql]`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org