This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new bbaeb35d9bd [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Check SPARK_CONNECT_MODE_ENABLED when creating a session
bbaeb35d9bd is described below

commit bbaeb35d9bdba045a25b19f67618beb5b25f54d4
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Wed Aug 16 19:48:30 2023 +0200

    [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Check SPARK_CONNECT_MODE_ENABLED when creating a session
    
    ### What changes were proposed in this pull request?
    
    This PR is a follow-up of https://github.com/apache/spark/pull/41013 that 
adds a check for `SPARK_CONNECT_MODE_ENABLED` when we create a Spark session 
via `pyspark.sql.SparkSession.getOrCreate()`, so that subsequent calls 
consistently return the same kind of session.
    
    ### Why are the changes needed?
    
    Currently, calling `SparkSession.builder.getOrCreate` after creating a 
Spark Connect session returns a non-Connect session unless `SPARK_REMOTE` is 
set globally (it is set automatically when you launch 
`./bin/pyspark --remote local`). The problem is therefore only reproducible 
when you use Spark Connect as a library. See how it's tested below.
    
    ### Does this PR introduce _any_ user-facing change?
    
    The change has not been released yet, so there are no behaviour changes for 
end users.
    
    ### How was this patch tested?
    
    Manually tested as below:
    
    ```bash
    cd python
    python
    ```
    
    ```python
    from pyspark.sql import SparkSession
    SparkSession.builder.remote("local").getOrCreate()
    SparkSession.builder.getOrCreate()
    ```
    
    Before this change:
    
    ```
    <pyspark.sql.connect.session.SparkSession object at 0x7fa9807fd790>
    <pyspark.sql.session.SparkSession object at 0x7fd97890e730>
    ```
    
    After this change:
    
    ```
    <pyspark.sql.connect.session.SparkSession object at 0x7fa9807fd790>
    <pyspark.sql.connect.session.SparkSession object at 0x7fa9807fd790>
    ```
    
    Closes #42464 from HyukjinKwon/SPARK-43509.
    
    Authored-by: Hyukjin Kwon <gurwls...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
    (cherry picked from commit 11cbdc291c96926820419b97559f25955d7791d6)
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/session.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
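
The condition touched by the diff below can be exercised on its own. The following is a minimal sketch, not PySpark code: `use_connect_before` and `use_connect_after` are hypothetical helpers that copy the old and new `if` conditions, and the environment is passed in as a plain mapping so the real process environment is never modified. It assumes that an earlier remote session leaves `SPARK_CONNECT_MODE_ENABLED` set in the environment, which the commit message implies but this sketch does not verify.

```python
import os
from typing import Mapping, Optional


def use_connect_before(
    opts: Mapping[str, str], environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Hypothetical copy of the condition before this commit."""
    env = os.environ if environ is None else environ
    return "SPARK_REMOTE" in env or "spark.remote" in opts


def use_connect_after(
    opts: Mapping[str, str], environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Hypothetical copy of the condition after this commit."""
    env = os.environ if environ is None else environ
    return (
        "SPARK_CONNECT_MODE_ENABLED" in env
        or "SPARK_REMOTE" in env
        or "spark.remote" in opts
    )


# Second builder call in the manual test: no `spark.remote` option and no
# SPARK_REMOTE, but (by assumption) SPARK_CONNECT_MODE_ENABLED is already set.
second_call_env = {"SPARK_CONNECT_MODE_ENABLED": "1"}
second_call_opts: Mapping[str, str] = {}

old_result = use_connect_before(second_call_opts, second_call_env)
new_result = use_connect_after(second_call_opts, second_call_env)
print("before:", old_result, "after:", new_result)
```

Under that assumption the old condition is false for the second call, so the builder falls through to the classic session path, while the new condition is true and the Connect branch is taken.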

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 9141051fdf8..ce197319977 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -455,7 +455,11 @@ class SparkSession(SparkConversionMixin):
             opts = dict(self._options)
 
             with self._lock:
-                if "SPARK_REMOTE" in os.environ or "spark.remote" in opts:
+                if (
+                    "SPARK_CONNECT_MODE_ENABLED" in os.environ
+                    or "SPARK_REMOTE" in os.environ
+                    or "spark.remote" in opts
+                ):
                     with SparkContext._lock:
                         from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
