ravwojdyla opened a new pull request, #47069: URL: https://github.com/apache/spark/pull/47069
This is a follow-up to https://github.com/apache/spark/pull/15961. I'm currently on:
```
pyspark    3.5.0     pyhd8ed1ab_0    conda-forge
py4j       0.10.9.7  pyhd8ed1ab_0    conda-forge
```
When the Spark JVM process gets OOM-killed, `SparkContext.stop` fails with `ConnectionRefusedError`, which leaves the `SparkSession`/`SparkContext` in a "dirty" state.
```
Traceback (most recent call last):
  ...
  File "<TRUNC>/lib/python3.11/site-packages/pyspark/sql/session.py", line 1796, in stop
    self._sc.stop()
  File "<TRUNC>/lib/python3.11/site-packages/pyspark/context.py", line 654, in stop
    self._jsc.stop()
  File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1321, in __call__
    answer = self.gateway_client.send_command(command)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/java_gateway.py", line 1036, in send_command
    connection = self._get_connection()
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 284, in _get_connection
    connection = self._create_new_connection()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 291, in _create_new_connection
    connection.connect_to_java_server()
  File "<TRUNC>/lib/python3.11/site-packages/py4j/clientserver.py", line 438, in connect_to_java_server
    self.socket.connect((self.java_address, self.java_port))
ConnectionRefusedError: [Errno 111] Connection refused
```

### What changes were proposed in this pull request?
In `SparkContext.stop`, catch both `Py4JError` and `ConnectionRefusedError` raised by `self._jsc.stop()`.

### Why are the changes needed?
Otherwise it is not possible to close/clean up the `SparkSession`/`SparkContext` once the JVM process is gone.

### Does this PR introduce _any_ user-facing change?
Yes. Before this PR, the user would get the stack trace shown above; after this change, `SparkContext.stop()` no longer fails when the Spark JVM process gets OOM-killed (or killed in some other way).

### How was this patch tested?
1. Start a `SparkSession`.
2. `kill -9` the JVM process.
3. Call `SparkSession.stop()` with and without this patch.

### Was this patch authored or co-authored using generative AI tooling?
No.
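The behavior the patch aims for can be sketched with a small mock; this is illustrative only, not the actual PySpark diff. `FakeDeadJsc` and `safe_stop` are hypothetical names, and the real change also catches py4j's `Py4JError` alongside `ConnectionRefusedError`:

```python
# Hypothetical sketch of the proposed stop() behavior: if the JVM backing
# the py4j handle is already dead, the error is swallowed so Python-side
# cleanup can proceed instead of propagating ConnectionRefusedError.

class FakeDeadJsc:
    """Stands in for a py4j JavaObject whose JVM was OOM-killed."""
    def stop(self):
        # py4j fails to open a socket to the dead gateway.
        raise ConnectionRefusedError(111, "Connection refused")

def safe_stop(jsc):
    """Mirror of the proposed SparkContext.stop logic (the real patch
    also catches py4j.protocol.Py4JError)."""
    try:
        jsc.stop()
        return True   # stop RPC reached the JVM
    except ConnectionRefusedError:
        return False  # JVM already gone; continue cleanup anyway

# No exception escapes even though the "JVM" is dead:
print(safe_stop(FakeDeadJsc()))  # → False
```

Returning a flag here is just for demonstration; the point is that `stop()` becomes safe to call after the JVM has been killed, leaving the session in a cleanly stopped state.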
