cdkrot opened a new pull request, #42538:
URL: https://github.com/apache/spark/pull/42538
### What changes were proposed in this pull request?
Implement a heartbeat for Spark Connect. The client tracks the number of open
queries and sends "keep-alive" requests from a separate thread while at least
one such query is running.
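
To illustrate the idea, here is a minimal sketch of a counter-gated heartbeat. It is not the code in this PR: `HeartbeatSketch`, `withQuery`, `intervalMs` and the `sendKeepAlive` callback are hypothetical names, and the actual keep-alive request is left as a placeholder callback.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.util.control.NonFatal

// Hypothetical helper for illustration only; not the class added by this PR.
// sendKeepAlive stands in for whatever lightweight request the client sends.
final class HeartbeatSketch(intervalMs: Long, sendKeepAlive: () => Unit)
    extends AutoCloseable {

  // Number of queries currently in flight.
  private val openQueries = new AtomicInteger(0)

  // Single daemon thread, so the heartbeat never blocks JVM shutdown.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "heartbeat-sketch")
        t.setDaemon(true)
        t
      }
    })

  scheduler.scheduleWithFixedDelay(
    new Runnable {
      override def run(): Unit =
        // Only ping while at least one query is open.
        if (openQueries.get() > 0) {
          // An exception escaping run() would cancel future executions,
          // so non-fatal failures of a single ping are swallowed here.
          try sendKeepAlive()
          catch { case NonFatal(_) => () }
        }
    },
    intervalMs,
    intervalMs,
    TimeUnit.MILLISECONDS)

  // Runs body while counting it as an open query.
  def withQuery[T](body: => T): T = {
    openQueries.incrementAndGet()
    try {
      body
    } finally {
      openQueries.decrementAndGet()
    }
  }

  override def close(): Unit = {
    scheduler.shutdownNow()
  }
}
```

A caller would wrap each long-running request in `withQuery { ... }`. Pings are sent only while the counter is positive, so an idle session generates no extra traffic.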
### Why are the changes needed?
Clients that execute long-running requests (specifically, ExecutePlanRequests
taking 1 hour or more) are often disconnected by intermediate proxy layers,
such as those used by common cloud providers. More standard remedies, such as
gRPC's heartbeat, apparently do not help.
### Does this PR introduce _any_ user-facing change?
Yes. This change proposes enabling the heartbeat by default.
### How was this patch tested?
Unit test and end-to-end coverage. It was also verified separately that a
query running for 1 hour 10 minutes completed successfully, i.e.
```
test("Alice") {
  val n: Long = 80e13.toLong
  val res = spark.range(n).count()
  assert(res == n)
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]