Hello, I have a cluster with 1 control node (4 CPU, 4 GB RAM, 8 GB disk) and 1 worker node (24 vCPU, 48 GB RAM, 500 GB disk).
Whenever I stop this cluster, it takes around 30 minutes to shut down, and another 30 minutes to start back up. In comparison, my other clusters, which have multiple worker nodes (4 CPU, 8 GB RAM, 50 GB disk), start and stop in under 15 minutes.

I can see that the instances themselves shut down, but the cluster status remains in “Stopping” for a long time. Likewise, when starting the cluster, the instances come up quickly, but the cluster status stays in “Starting” even after the instances are already running.

I tried deleting and recreating this cluster with the same specifications, but the behavior remains the same. I also tried placing both the control node and the worker node on the same host machine, but there was no improvement.

Could someone please suggest how I can investigate further and identify the root cause of this delay?

--
With Regards,
Nixon Varghese
