grundprinzip opened a new pull request, #43664:
URL: https://github.com/apache/spark/pull/43664

   ### What changes were proposed in this pull request?
   Without this patch, when the server restarts because of an abnormal 
condition, the client does not realize that this is the case. For example, 
when a driver OOM occurs and the driver is restarted, the client does not 
notice that the server was restarted and a new session was assigned.
   
   This patch fixes this behavior by asserting that the server-side session ID 
does not change for the lifetime of the connection. If it does change, the 
client throws an exception like this:
   
   ```
   >>> spark.range(10).collect()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py", line 1710, in collect
       table, schema = self._to_table()
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/dataframe.py", line 1722, in _to_table
       table, schema = self._session.client.to_table(query, self._plan.observations)
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 839, in to_table
       table, schema, _, _, _ = self._execute_and_fetch(req, observations)
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1295, in _execute_and_fetch
       for response in self._execute_and_fetch_as_iterator(req, observations):
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1273, in _execute_and_fetch_as_iterator
       self._handle_error(error)
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1521, in _handle_error
       raise error
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1266, in _execute_and_fetch_as_iterator
       yield from handle_response(b)
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1193, in handle_response
       self._verify_response_integrity(b)
     File "/Users/martin.grund/Development/spark/python/pyspark/sql/connect/client/core.py", line 1622, in _verify_response_integrity
       raise SparkConnectException(
   pyspark.errors.exceptions.connect.SparkConnectException: Received incorrect server side session identifier for request. Please restart Spark Session. (9493c83d-cfa4-488f-9522-838ef3df90bf != c5302e8f-170d-477e-908d-299927b68fd8)
   ```
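   The check itself can be sketched roughly like this. This is a minimal, hypothetical illustration of the pin-then-compare logic the patch describes, not the actual pyspark implementation; the class and method names here are invented for the example.

   ```python
   # Hypothetical sketch: the client pins the first server-side session ID it
   # sees and rejects any later response carrying a different one.
   from typing import Optional


   class SparkConnectException(Exception):
       pass


   class SessionVerifyingClient:
       def __init__(self) -> None:
           self._server_session_id: Optional[str] = None

       def _verify_response_integrity(self, response_session_id: str) -> None:
           if self._server_session_id is None:
               # First response on this connection: remember its session ID.
               self._server_session_id = response_session_id
           elif self._server_session_id != response_session_id:
               # The server was restarted and assigned a new session;
               # fail loudly instead of silently continuing.
               raise SparkConnectException(
                   "Received incorrect server side session identifier for "
                   "request. Please restart Spark Session. "
                   f"({response_session_id} != {self._server_session_id})"
               )
   ```

   With this invariant in place, a driver restart that silently swaps the session surfaces as an explicit error on the next response instead of undefined behavior.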
   
   ### Why are the changes needed?
   Stability
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   - Existing tests cover the basic invariant.
   - Added new tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
