Github user foxish commented on the issue:
https://github.com/apache/spark/pull/20032
Within the allocator's control loop, all requests for executor pods against the k8s API are asynchronous, so each iteration doesn't take very long. If a user sets a very low value for the delay, it won't necessarily generate more requests, because pod creation is still [rate limited](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L111) by the time it takes for the executors launched in the previous round to become ready (typically around 1s, if not higher). In the worst case we end up polling our internal data structures very frequently, which adds to the driver's CPU load; but those data structures are updated asynchronously by watches, so this neither increases load on the K8s API server nor fetches any state over the network. I can add a caveat to the documentation that setting the delay to a very low number of milliseconds may increase load on the driver.
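To illustrate the rate-limiting point, here is a rough, hypothetical sketch of the idea (not the actual `KubernetesClusterSchedulerBackend` code; names like `requestExecutorPod` and `onExecutorReady` are made up for the example): the loop only requests a new batch once the previously requested pods have become ready, so shrinking the delay mostly means checking in-memory state more often.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable

// Hypothetical stand-ins for state the real backend keeps; in Spark these
// sets are kept up to date asynchronously by Kubernetes watches.
object AllocatorSketch {
  private val pendingExecutors = mutable.Set.empty[String] // requested, not yet ready
  private val runningExecutors = mutable.Set.empty[String] // registered with the driver
  private var executorCounter = 0

  private val targetExecutors = 10  // hypothetical target
  private val batchSize = 5         // cf. spark.kubernetes.allocation.batch.size
  private val batchDelayMs = 1000L  // cf. spark.kubernetes.allocation.batch.delay

  // Hypothetical helper: fire-and-forget pod creation against the k8s API.
  private def requestExecutorPod(id: String): Unit =
    println(s"requesting pod for executor $id")

  // Hypothetical callback a pod watch would invoke once a pod becomes ready.
  def onExecutorReady(id: String): Unit = {
    pendingExecutors -= id
    runningExecutors += id
  }

  private val allocator = Executors.newSingleThreadScheduledExecutor()

  def start(): Unit = allocator.scheduleWithFixedDelay(new Runnable {
    override def run(): Unit = {
      // The previous batch acts as the rate limiter: no new pods are requested
      // until the earlier ones are ready, so a smaller delay only makes this
      // cheap, in-memory check run more often.
      if (pendingExecutors.isEmpty) {
        val needed = math.max(targetExecutors - runningExecutors.size, 0)
        (1 to math.min(batchSize, needed)).foreach { _ =>
          executorCounter += 1
          val id = executorCounter.toString
          pendingExecutors += id
          requestExecutorPod(id)
        }
      }
    }
  }, 0L, batchDelayMs, TimeUnit.MILLISECONDS)
}
```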
If the intent behind reducing the batch interval is to let people spin up N executors as quickly as possible when the resources are available, I'd recommend they use the `batch.size` parameter instead and set it equal to num_executors. That achieves the same effect and schedules all executors in a single loop iteration.
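For example (hypothetical numbers; this assumes the standard `spark.kubernetes.allocation.batch.size` property that `batch.size` refers to), asking for all 50 executors in one allocation round would look roughly like:

```scala
import org.apache.spark.SparkConf

// Make the allocation batch size equal to the executor count so the
// allocator requests every executor pod in a single round.
val conf = new SparkConf()
  .set("spark.executor.instances", "50")
  .set("spark.kubernetes.allocation.batch.size", "50")
```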