This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 2e382c8  [SPARK-38073][PYTHON] Update atexit function to avoid issues with late binding
2e382c8 is described below

commit 2e382c8bff2d0c3733b9b525168254971ca1175e
Author: zero323 <mszymkiew...@gmail.com>
AuthorDate: Fri Feb 4 20:21:02 2022 -0800

    [SPARK-38073][PYTHON] Update atexit function to avoid issues with late binding
    
    ### What changes were proposed in this pull request?
    
    This PR updates the function registered with `atexit` in the PySpark shell so that it captures the `SparkContext` explicitly instead of looking up `sc` in the surrounding scope when it runs.
    
    **Note**
    
    A simpler approach
    
    ```python
    atexit.register(sc.stop)
    ```
    
    is possible, but it won't work properly for contexts with monkey-patched `stop` methods (for example, [pyspark-asyncactions](https://github.com/zero323/pyspark-asyncactions)).
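
One way to read the caveat above, sketched under assumptions: `FakeContext` is a hypothetical stand-in for `SparkContext`, and the monkey patch is assumed to be applied after the callback is created. A bound method such as `sc.stop` is resolved when it is taken, while the nested lambda looks up `stop` only when it is called.

```python
class FakeContext:
    """Hypothetical stand-in for SparkContext (not the real class)."""

    def stop(self):
        return "original stop"


sc = FakeContext()

# Analogue of atexit.register(sc.stop): the bound method is resolved now.
bound_early = sc.stop

# Analogue of the committed form: `sc` is captured, `stop` is looked up on call.
deferred = (lambda sc: lambda: sc.stop())(sc)

# A later monkey patch, as an extension library might apply (assumed timing).
FakeContext.stop = lambda self: "patched stop"

early_result = bound_early()   # still runs the original function
deferred_result = deferred()   # picks up the patched method
```

In this sketch the early-bound callable keeps pointing at the original function, while the deferred one sees the patch.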
    
    I also considered using `_active_spark_context`
    
    ```python
    atexit.register(lambda: (
        SparkContext._active_spark_context.stop()
        if SparkContext._active_spark_context
        else None
    ))
    ```
    
    but `SparkContext` is also out of scope at exit, so that doesn't work without introducing a standard function within the scope.
    
    ### Why are the changes needed?
    
    When using `ipython` as a driver with Python 3.8, `sc` goes out of scope before the `atexit` function is called. This leads to a `NameError` on exit. This is a mild annoyance and likely a bug in ipython (there are quite a few such bugs with similar behavior), but it is easy to address on our side without causing regressions for users of earlier Python versions.
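
The late-binding problem described above can be reproduced without Spark or ipython. This is a minimal sketch, not the shell's actual code: `FakeContext` is a hypothetical stand-in for `SparkContext`, and `del sc` merely simulates the name disappearing before the exit hook runs.

```python
class FakeContext:
    """Hypothetical stand-in for SparkContext; records whether stop() ran."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


sc = FakeContext()
ctx = sc  # second reference, kept only so the result can be inspected

# Old form: `sc` is looked up by name when the callback is invoked.
late_bound = lambda: sc.stop()

# New form: the outer lambda binds `sc` as an argument right away.
captured = (lambda sc: lambda: sc.stop())(sc)

del sc  # simulate the name going out of scope before exit

try:
    late_bound()
    late_failed = False
except NameError:
    late_failed = True

captured()  # still works, because it holds its own reference
```

Here the late-bound callback fails with `NameError`, while the capturing callback still stops the context.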
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Manual testing to confirm that:
    
    - `NameError` is no longer thrown on exit with ipython and Python 3.8 or later.
    - `stop` is indeed invoked on exit with both plain interpreter and ipython shells.
    
    Closes #35396 from zero323/SPARK-38073.
    
    Authored-by: zero323 <mszymkiew...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
    (cherry picked from commit 3e0d4899dcb3be226a120cbeec8df78ff7fb00ba)
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 python/pyspark/shell.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/shell.py b/python/pyspark/shell.py
index 25aadb1..0c6a608 100644
--- a/python/pyspark/shell.py
+++ b/python/pyspark/shell.py
@@ -45,7 +45,7 @@ except Exception:
 
 sc = spark.sparkContext
 sql = spark.sql
-atexit.register(lambda: sc.stop())
+atexit.register((lambda sc: lambda: sc.stop())(sc))
 
 # for compatibility
 sqlContext = spark._wrapped

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
