BryanCutler opened a new pull request #24834: [WIP][SPARK-27992][PYTHON] 
Synchronize with Python connection thread to propagate errors
URL: https://github.com/apache/spark/pull/24834
 
 
   ## What changes were proposed in this pull request?
   
   Currently, with `toLocalIterator()` and `toPandas()` with Arrow enabled, if 
the Spark job running in the background serving thread fails, the error is 
caught and sent to Python through the PySpark serializer. 
   This is not ideal because it only catches a SparkException, it 
won't handle an error that occurs in the serializer, and each method has to 
have its own special handling to propagate the error.
   
   This PR instead returns the Python Server object along with the serving port 
and authentication info, so that the Python caller can synchronize 
with the serving thread. During the call to synchronize, the serving thread 
Future is completed either successfully or with an exception. In the latter 
case, the exception is propagated to Python through the Py4J call.
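
The synchronization idea can be illustrated with a small, self-contained Python analogy. This is only a sketch under stated assumptions: the real change lives on the JVM side and reaches Python through Py4J, while the code below uses `concurrent.futures` to stand in for the serving thread Future. The names `ServingServer`, `sync`, `ok_job`, and `failing_job` are hypothetical and do not come from the patch.

```python
from concurrent.futures import ThreadPoolExecutor


class ServingServer:
    """Hypothetical stand-in for the server object handed back to the caller."""

    def __init__(self, produce):
        self._executor = ThreadPoolExecutor(max_workers=1)
        # The serving work starts immediately in a background thread.
        self._future = self._executor.submit(produce)

    def sync(self):
        """Block until the serving thread finishes and re-raise any error it hit."""
        try:
            # Future.result() returns the value or raises the stored exception.
            return self._future.result()
        finally:
            self._executor.shutdown(wait=False)


def ok_job():
    # Assumed placeholder for a job that completes normally.
    return 3


def failing_job():
    # Assumed placeholder for a job that fails inside the serving thread.
    raise RuntimeError("job failed in serving thread")
```

Calling `ServingServer(failing_job).sync()` raises the `RuntimeError` in the caller's thread, whatever kind of exception it is and wherever in the background work it occurred, so no per-method error handling is needed in this sketch.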
   
   ## How was this patch tested?
   
   Existing tests

