peter-toth opened a new pull request #31818:
URL: https://github.com/apache/spark/pull/31818


   ### What changes were proposed in this pull request?
   
   One of our customers frequently encounters `"serve-DataFrame" java.net.SocketTimeoutException: Accept timed out` errors in PySpark because `Dataset.collectToPython()` in Spark 2.4 does the following:
   1. Collects the results
   2. Opens up a socket server that then listens for the connection from the Python side
   3. Runs the event listeners as part of `withAction` on the calling thread, since SPARK-25680 is not available in Spark 2.4
   4. Returns the address of the socket server to Python
   5. The Python side connects to the socket server and fetches the data
   
   Because the customer has a custom, long-running event listener, the time between steps 2 and 5 is frequently longer than the default accept timeout of the socket server. Increasing that timeout is not a good solution, as we don't know how long the listeners can take to run.
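To illustrate the timing problem, here is a minimal Python sketch (not Spark's code) of steps 1 to 5 on a local socket. All names (`open_server`, `fetch`, `collect_to_python`, `slow_listener`, `ACCEPT_TIMEOUT_S`) are hypothetical, the 0.2 s timeout and 0.5 s listener duration are arbitrary, and Python's `socket.timeout` plays the role of `java.net.SocketTimeoutException`. The `server_first` flag switches between the Spark 2.4 order and the reordered one.

```python
import socket
import threading
import time

# Hypothetical accept timeout, kept short so the demo runs quickly. It stands
# in for the timeout of the JVM-side server socket; it is not Spark's value.
ACCEPT_TIMEOUT_S = 0.2


def open_server(payload: bytes, errors: list) -> int:
    """Step 2: open a one-shot local server for `payload` and return its port.

    A background thread waits for a single client. If none connects within
    ACCEPT_TIMEOUT_S, accept() times out, the error is recorded in `errors`
    and the server socket is closed.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    server.settimeout(ACCEPT_TIMEOUT_S)
    port = server.getsockname()[1]

    def serve() -> None:
        try:
            conn, _ = server.accept()
        except socket.timeout as exc:  # the "Accept timed out" case
            errors.append(exc)
            return
        finally:
            server.close()  # one-shot: stop listening after accept or timeout
        with conn:
            conn.sendall(payload)

    threading.Thread(target=serve, name="serve-DataFrame", daemon=True).start()
    return port


def fetch(port: int) -> bytes:
    """Step 5: connect to the server and read until it closes the connection."""
    with socket.create_connection(("127.0.0.1", port), timeout=2) as conn:
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)


def collect_to_python(rows: bytes, listener, server_first: bool):
    """Steps 1-4; `rows` stands in for the already collected results."""
    errors: list = []
    if server_first:  # Spark 2.4 order: the accept timer starts before the listeners run
        port = open_server(rows, errors)
        listener()
    else:  # reordered: run the listeners, then open the server
        listener()
        port = open_server(rows, errors)
    return port, errors  # step 4: hand the port to the Python side


def slow_listener() -> None:
    time.sleep(0.5)  # hypothetical listener that runs longer than the timeout


if __name__ == "__main__":
    port, errors = collect_to_python(b"rows", slow_listener, server_first=True)
    try:
        fetch(port)
    except OSError as exc:  # the server has already given up and closed
        print("fetch failed:", type(exc).__name__, errors)

    port, errors = collect_to_python(b"rows", slow_listener, server_first=False)
    print(fetch(port))  # prints b'rows'
```

With `server_first=True` the background thread records a timeout long before the client connects, mirroring the `Accept timed out` error; with the listener run first, the accept timer only starts once the listeners are done, so the listener duration no longer counts against the timeout.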
   
   ### Why are the changes needed?
   
   This PR simply moves the socket server creation (step 2) after running the listeners (step 3). I think this approach has a minor side effect: errors during socket server creation are no longer reported as `onFailure` events. However, errors that happen while the Python side opens the connection, or during the data transfer from the JVM to Python, are not reported as events today either, so IMO this is not a big change.
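The side effect can be modeled with a simplified, hypothetical `with_action` helper that reports the outcome of its action to a listener, roughly the way `withAction` calls `onSuccess`/`onFailure`. Everything below (`RecordingListener`, `failing_open_server`, `collect_old`, `collect_new`) is illustrative rather than Spark's API, and server creation is assumed to fail so that the difference in reported events is visible.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RecordingListener:
    """Hypothetical stand-in for an event listener; it records each callback."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_success(self, name: str) -> None:
        self.events.append(f"onSuccess({name})")

    def on_failure(self, name: str, exc: Exception) -> None:
        self.events.append(f"onFailure({name}): {exc}")


def with_action(name: str, listener: RecordingListener, action: Callable[[], T]) -> T:
    """Simplified model of `withAction`: run `action`, then notify the listener."""
    try:
        result = action()
    except Exception as exc:
        listener.on_failure(name, exc)
        raise
    listener.on_success(name)
    return result


def failing_open_server(rows: List[str]) -> int:
    # Hypothetical: socket server creation fails (e.g. no port can be bound).
    raise OSError("could not open server socket")


def collect_old(listener: RecordingListener) -> int:
    # Spark 2.4 order: the server is created inside the action.
    return with_action("collectToPython", listener, lambda: failing_open_server(["row"]))


def collect_new(listener: RecordingListener) -> int:
    # Reordered: only the collection runs inside the action.
    rows = with_action("collectToPython", listener, lambda: ["row"])
    return failing_open_server(rows)


for collect in (collect_old, collect_new):
    listener = RecordingListener()
    try:
        collect(listener)
    except OSError:
        pass  # the error still reaches the caller in both cases
    print(collect.__name__, listener.events)
# collect_old ['onFailure(collectToPython): could not open server socket']
# collect_new ['onSuccess(collectToPython)']
```

In both orders the caller still sees the exception; only the listener's view changes, which is the trade-off described above.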
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manually.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
