grundprinzip opened a new pull request, #43664:
URL: https://github.com/apache/spark/pull/43664
### What changes were proposed in this pull request?
Without this patch, when the server would restart because of an abnormal
condition, the client would not realize that this be the case. For example,
when a driver OOM occurs and the driver is restarted, the client would not
realize that the server is restarted and a new session is assigned.
This patch fixes this behavior and asserts that the server side session ID
does not change during the connection. If it does change it throws an exception
like this:
```
>>> spark.range(10).collect()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py",
line 1710, in collect
table, schema = self._to_table()
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py",
line 1722, in _to_table
table, schema = self._session.client.to_table(query,
self._plan.observations)
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 839, in to_table
table, schema, _, _, _ = self._execute_and_fetch(req, observations)
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1295, in _execute_and_fetch
for response in self._execute_and_fetch_as_iterator(req, observations):
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1273, in _execute_and_fetch_as_iterator
self._handle_error(error)
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1521, in _handle_error
raise error
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1266, in _execute_and_fetch_as_iterator
yield from handle_response(b)
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1193, in handle_response
self._verify_response_integrity(b)
File
"/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py",
line 1622, in _verify_response_integrity
raise SparkConnectException(
pyspark.errors.exceptions.connect.SparkConnectException: Received incorrect
server side session identifier for request. Please restart Spark Session.
(9493c83d-cfa4-488f-9522-838ef3df90bf != c5302e8f-170d-477e-908d-299927b68fd8)
```
### Why are the changes needed?
Stability
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Existing tests cover the basic invariant.
- Added new tests.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]