ravwojdyla opened a new pull request, #47069:
URL: https://github.com/apache/spark/pull/47069

   This is a follow-up to https://github.com/apache/spark/pull/15961. I'm 
currently on:
   
   ```
   pyspark                   3.5.0              pyhd8ed1ab_0    conda-forge
   py4j                      0.10.9.7           pyhd8ed1ab_0    conda-forge
   ```
   
   When the Spark JVM process gets OOM-killed, `SparkContext.stop` fails with a 
`ConnectionRefusedError`, which leaves the `SparkSession`/`SparkContext` in a 
"dirty" state.
   
   ```
   Traceback (most recent call last):
     ...
     File "<TRUNC>/lib/python3.11/site-packages/pyspark/sql/session.py", line 
1796, in stop
       self._sc.stop()
     File "<TRUNC>/lib/python3.11/site-packages/pyspark/context.py", line 654, 
in stop
       self._jsc.stop()
     File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 
1321, in __call__
       answer = self.gateway_client.send_command(command)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 
1036, in send_command
       connection = self._get_connection()
                    ^^^^^^^^^^^^^^^^^^^^^^
     File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 
284, in _get_connection
       connection = self._create_new_connection()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 
291, in _create_new_connection
       connection.connect_to_java_server()
     File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 
438, in connect_to_java_server
       self.socket.connect((self.java_address, self.java_port))
   ConnectionRefusedError: [Errno 111] Connection refused
   ```
   
   ### What changes were proposed in this pull request?
   
   In `SparkContext.stop`, catch `ConnectionRefusedError` in addition to 
`Py4JError` when calling `self._jsc.stop()`.
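
   The proposed handling can be sketched as follows. This is a minimal, 
self-contained simulation, not the actual PySpark diff: `DeadJvmProxy` and 
`stop_context` are hypothetical names standing in for the Py4J 
`JavaSparkContext` proxy and the real `SparkContext.stop` in 
`pyspark/context.py`.

   ```python
   # Hedged sketch of the proposed fix; DeadJvmProxy and stop_context are
   # illustrative names, not PySpark's actual implementation.
   class DeadJvmProxy:
       """Simulates self._jsc after the JVM has been OOM-killed: any
       Py4J call fails because the socket cannot connect."""
       def stop(self):
           raise ConnectionRefusedError(111, "Connection refused")

   def stop_context(jsc):
       """Mirrors the proposed SparkContext.stop behavior: swallow the
       connection error so Python-side cleanup can still proceed."""
       try:
           jsc.stop()
       except ConnectionRefusedError:
           # The JVM process is already dead; treat the context as stopped.
           pass
       return True

   # With this handling, stopping a context whose JVM died no longer raises.
   stop_context(DeadJvmProxy())
   ```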
   
   ### Why are the changes needed?
   
   Otherwise it's not possible to close or clean up the 
`SparkSession`/`SparkContext` after the JVM process has died.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Before this PR, the user would get the stack trace shown above; after 
this change, `SparkContext.stop()` no longer fails when the Spark JVM process 
gets OOM-killed (or is killed in some other way).
   
   ### How was this patch tested?
   
   1. Start a `SparkSession`.
   2. `kill -9` the JVM process.
   3. Call `SparkSession.stop()` with and without this patch.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

