[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-28 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r758008380



##
File path: python/pyspark/sql/session.py
##
@@ -301,6 +306,9 @@ def __init__(self, sparkContext: SparkContext, 
jsparkSession: Optional[JavaObjec
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc())

Review comment:
   Yup, let's create a followup PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-28 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r758005218



##
File path: python/pyspark/sql/session.py
##
@@ -301,6 +306,9 @@ def __init__(self, sparkContext: SparkContext, 
jsparkSession: Optional[JavaObjec
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc())

Review comment:
   Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-28 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r758001741



##
File path: python/pyspark/sql/session.py
##
@@ -301,6 +306,9 @@ def __init__(self, sparkContext: SparkContext, 
jsparkSession: Optional[JavaObjec
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc())

Review comment:
   LGTM with a couple of nits: @AngersZh,
   
   - Can we actually leverage existing constructor on `SparkSession` to pass 
the initial options instead of setting it manually? Here unlike Scala, it 
initiates `sharedState` always. I think it's best to keep the code path matched.
   - Another nit is that: It's always preferred to use less Py4J connections 
which exposes potential flakiness. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-28 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r758001741



##
File path: python/pyspark/sql/session.py
##
@@ -301,6 +306,9 @@ def __init__(self, sparkContext: SparkContext, 
jsparkSession: Optional[JavaObjec
 jsparkSession = 
self._jvm.SparkSession.getDefaultSession().get()
 else:
 jsparkSession = self._jvm.SparkSession(self._jsc.sc())

Review comment:
   LGTM with a couple of nits: @AngersZh,
   
   - Can we actually leverage existing constructor on `SparkSession` to pass 
the initial options instead of manually? Here unlike Scala, it initiates 
`sharedState` always. I think it's best to keep the code path matched.
   - Another nit is that: It's always preferred to use less Py4J connections 
which exposes potential flakiness. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-13 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r748779639



##
File path: python/pyspark/sql/session.py
##
@@ -287,7 +287,12 @@ def getOrCreate(self) -> "SparkSession":
 _instantiatedSession: ClassVar[Optional["SparkSession"]] = None
 _activeSession: ClassVar[Optional["SparkSession"]] = None
 
-def __init__(self, sparkContext: SparkContext, jsparkSession: 
Optional[JavaObject] = None):
+def __init__(
+self,
+sparkContext: SparkContext,
+jsparkSession: Optional[JavaObject] = None,
+options: Optional[Dict[str, Any]] = None,
+):

Review comment:
   Yeah I don't think this is breaking. Let me double check closely by tmr 
EOD but from a cursory look it seems fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-13 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r748779639



##
File path: python/pyspark/sql/session.py
##
@@ -287,7 +287,12 @@ def getOrCreate(self) -> "SparkSession":
 _instantiatedSession: ClassVar[Optional["SparkSession"]] = None
 _activeSession: ClassVar[Optional["SparkSession"]] = None
 
-def __init__(self, sparkContext: SparkContext, jsparkSession: 
Optional[JavaObject] = None):
+def __init__(
+self,
+sparkContext: SparkContext,
+jsparkSession: Optional[JavaObject] = None,
+options: Optional[Dict[str, Any]] = None,
+):

Review comment:
   Yeah I don't think this is breaking. Let me double check closely by tmr 
EOD but from a coursory look it seems fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #34559: [SPARK-37291][PYTHON][SQL] PySpark init SparkSession should copy conf to sharedState

2021-11-13 Thread GitBox


HyukjinKwon commented on a change in pull request #34559:
URL: https://github.com/apache/spark/pull/34559#discussion_r748779639



##
File path: python/pyspark/sql/session.py
##
@@ -287,7 +287,12 @@ def getOrCreate(self) -> "SparkSession":
 _instantiatedSession: ClassVar[Optional["SparkSession"]] = None
 _activeSession: ClassVar[Optional["SparkSession"]] = None
 
-def __init__(self, sparkContext: SparkContext, jsparkSession: 
Optional[JavaObject] = None):
+def __init__(
+self,
+sparkContext: SparkContext,
+jsparkSession: Optional[JavaObject] = None,
+options: Optional[Dict[str, Any]] = None,
+):

Review comment:
   Yeah I don't think this is breaking. Let me double check closely by tmr 
EODbut from a coursory look it seems fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org