Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/20032
  
    Within the allocator's control loop, the requests made to the k8s API for
    executor pods are all asynchronous, so each iteration doesn't take very
    long. If a user sets a very low value for the delay, it won't necessarily
    result in more requests, because pod creation is still
    [rate limited](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L111)
    by the time it takes for the executors launched in the previous round to
    become ready (typically around 1s, if not higher). So, in the worst case,
    we may end up polling the state of our internal data structures more often,
    which adds to the driver's CPU load; but since those data structures are
    updated asynchronously by watches, this doesn't increase the load on the
    K8s API server or fetch any state over the network. I can add a caveat to
    the documentation that setting the delay to a very low number of
    milliseconds may increase load on the driver.
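
    Roughly, the behavior is like the following simplified, self-contained
    sketch; the names, numbers, and timings here are only illustrative, not the
    actual fields of `KubernetesClusterSchedulerBackend`:

    ```scala
    import java.util.concurrent.{Executors, TimeUnit}
    import java.util.concurrent.atomic.AtomicInteger

    // Simplified model of the allocator loop (NOT the real backend code):
    // each tick only requests pods asynchronously, and a new batch is only
    // requested once the previous batch has registered, so a very small
    // batch delay mostly means polling in-memory state more often.
    object AllocatorLoopSketch extends App {
      val totalExecutors = 12                    // what the application asked for
      val batchSize      = 5                     // spark.kubernetes.allocation.batch.size
      val batchDelayMs   = 1000L                 // spark.kubernetes.allocation.batch.delay
      val requested      = new AtomicInteger(0)  // pods asked for from the API server
      val registered     = new AtomicInteger(0)  // updated asynchronously by watches in reality

      val allocator = Executors.newSingleThreadScheduledExecutor()
      allocator.scheduleWithFixedDelay(new Runnable {
        override def run(): Unit = {
          if (registered.get() < requested.get()) {
            // Previous batch not ready yet: only in-memory state is inspected
            // on this tick, no call goes out to the K8s API server.
            println("Waiting for previously requested executors to register")
          } else if (requested.get() < totalExecutors) {
            val batch = math.min(batchSize, totalExecutors - requested.get())
            requested.addAndGet(batch)           // stands in for async pod-create requests
            println(s"Requested $batch more executor pods")
            // Simulate the watch marking the batch ready after ~1.5s, i.e.
            // slower than the batch delay, which is what rate-limits creation.
            allocator.schedule(new Runnable {
              override def run(): Unit = { registered.addAndGet(batch); () }
            }, 1500L, TimeUnit.MILLISECONDS)
          } else {
            allocator.shutdown()                 // all executors requested and registered
          }
        }
      }, 0L, batchDelayMs, TimeUnit.MILLISECONDS)
    }
    ```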
    
    If the intent behind reducing the batch interval is to let people spin up
    N executors as quickly as possible when the resources are present, I'd
    recommend that they use the `batch.size` parameter instead and set it
    equal to num_executors. That achieves the same effect and schedules all
    executors in a single loop.
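
    For example, a minimal sketch of that configuration (the executor count of
    8 is just illustrative; the full property name is
    `spark.kubernetes.allocation.batch.size`):

    ```scala
    import org.apache.spark.SparkConf

    object SingleBatchConf {
      // Request all executors in one allocation round by setting the batch
      // size equal to the executor count (8 is only an illustrative number).
      val conf: SparkConf = new SparkConf()
        .set("spark.executor.instances", "8")
        .set("spark.kubernetes.allocation.batch.size", "8")
    }
    ```

    The same properties can equally be passed as `--conf` flags to
    `spark-submit`.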


