cdkrot opened a new pull request, #42538:
URL: https://github.com/apache/spark/pull/42538

   ### What changes were proposed in this pull request?
   Implement a heartbeat for Spark Connect. This works by tracking the number 
of open queries and sending "keep-alive" requests from a separate thread while 
at least one such query is running.
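
   The mechanism described above could be sketched roughly as follows. This is 
only an illustration under stated assumptions, not the code in this PR: the 
`Heartbeat` class, the `withQuery` helper, the `sendKeepAlive` callback and the 
`intervalMs` parameter are all hypothetical names, and the real keep-alive 
request is stood in for by an arbitrary function.

   ```scala
   import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
   import java.util.concurrent.atomic.AtomicInteger
   import scala.util.control.NonFatal

   // Hypothetical sketch: none of these names are taken from the Spark Connect code base.
   final class Heartbeat(intervalMs: Long)(sendKeepAlive: () => Unit) extends AutoCloseable {
     // Number of queries currently in flight.
     private val openQueries = new AtomicInteger(0)

     // Single daemon thread, so pending heartbeats never keep the JVM alive.
     private val scheduler: ScheduledExecutorService =
       Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
         override def newThread(r: Runnable): Thread = {
           val t = new Thread(r, "heartbeat")
           t.setDaemon(true)
           t
         }
       })

     scheduler.scheduleAtFixedRate(
       new Runnable {
         override def run(): Unit = {
           // Swallow non-fatal errors: an uncaught exception would cancel the periodic task.
           try {
             if (openQueries.get() > 0) sendKeepAlive()
           } catch {
             case NonFatal(_) => ()
           }
         }
       },
       intervalMs, intervalMs, TimeUnit.MILLISECONDS)

     /** Runs `body` while counting it as an open query. */
     def withQuery[T](body: => T): T = {
       openQueries.incrementAndGet()
       try body finally openQueries.decrementAndGet()
     }

     /** Stops the heartbeat thread. */
     override def close(): Unit = scheduler.shutdownNow()
   }
   ```

   In this sketch a caller would wrap each long-running call in 
`heartbeat.withQuery { ... }`, so keep-alives are only sent while some query is 
open and the connection stays quiet otherwise.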
   
   ### Why are the changes needed?
   
   We have noticed that clients executing long-running requests (specifically 
ExecutePlanRequests taking 1 hour or more) are often disconnected by 
intermediate proxy layers, such as those used by common cloud providers. 
More standard remedies, such as gRPC's own heartbeat, do not appear to help.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This change proposes enabling the heartbeat by default.
   
   ### How was this patch tested?
   
   Unit tests and end-to-end tests. It was also separately verified that a 
query running for 1 hour 10 minutes completed successfully, i.e.
   
   ```
   test("Alice") {
       val n: Long = 80e13.toLong
   
       val res = spark.range(n).count()
   
       assert(res == n)
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.