GitHub user rishabhjain1997 created a discussion: CKS cluster setup hangs in
guest waiting for /mnt/k8sdisk binaries (CloudStack 4.22 + RHEL 9.7 KVM +
Ubuntu CKS template)
# Body
## Summary
CKS cluster provisioning consistently fails in our environment. The control VM
boots, cloud-init runs to completion with `DataSourceCloudStack`, but the in-VM
setup script then spends ~25 minutes (100 attempts, 15 seconds apart) polling
for `/mnt/k8sdisk/` to be populated by the Kubernetes binaries ISO, gives up,
and `kubelet.service` never gets installed. CKS then reports `Failed to setup
Kubernetes cluster : <name> is not in usable state as the system is unable to
access control node VMs of the cluster`. That error is a downstream symptom:
the real failure is that the binaries ISO contents never appear inside the
guest.
We've reproduced this with both a custom-patched Ubuntu cloud-image and the
**canonical ShapeBlue/Apache CKS template** (`cks-ubuntu-2204-kvm.qcow2.bz2`,
md5 `627c49a5523fc2cfddebd0a1a396512f`), on two separate KVM hosts. The failure
is identical in both cases, which makes a template-specific or host-specific
cause unlikely.
## Environment
| Component | Version |
|---|---|
| CloudStack | 4.22.0.0 (mgmt + agent) |
| KVM host OS | RHEL 9.7 (kernel 5.14.0-611.5.1.el9_7.x86_64) |
| qemu-kvm | 9.1.0-29.el9_7.6 |
| libvirt | 10.10.0-15.8.el9_7 |
| Zone type | Advanced (isolated tenant networks behind VR) |
| Primary storage | local |
| Secondary storage | NFS |
| CKS Kubernetes version | v1.33.1 (binaries ISO `setup-v1.33.1-fresh.iso`) |
| Template tested | `cks-ubuntu-2204-kvm.qcow2.bz2` from `download.cloudstack.org/testing/custom_templates/ubuntu/22.04/` |
## Symptoms
### Inside the guest (control VM, via VNC console)
```
[1454] cloud-init[874]: Waiting for Binaries directory /mnt/k8sdisk/ to be
available, sleeping for 15 seconds, attempt: 97
[1469] cloud-init[874]: Waiting for Binaries directory /mnt/k8sdisk/ to be
available, sleeping for 15 seconds, attempt: 98
[1484] cloud-init[874]: Waiting for Binaries directory /mnt/k8sdisk/ to be
available, sleeping for 15 seconds, attempt: 99
[1499] cloud-init[874]: Waiting for Binaries directory /mnt/k8sdisk/ to be
available, sleeping for 15 seconds, attempt: 100
[1514] cloud-init[874]: Warning: Offline install timed out!
[1514] cloud-init[874]: Failed to enable unit: Unit file kubelet.service does
not exist.
...
Cloud-init v. 24.4-0ubuntu1~22.04.1 finished at Mon, 11 May 2026 20:56:13
+0000. Datasource DataSourceCloudStack. Up 1514.89 seconds
```
So: cloud-init's `DataSourceCloudStack` works (user-data + SSH keys + hostname
all applied), but the binaries ISO contents never show up at `/mnt/k8sdisk/`
and the setup loop times out.
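The timing is consistent with a bounded polling loop: 100 attempts at 15 seconds each is 1500 seconds, or 25 minutes, which matches attempt 100 being logged at `[1499]` and the timeout at `[1514]`. Below is a minimal Python sketch of that behaviour, for illustration only. It is not the actual CKS setup script; the function names and the "directory exists and is non-empty" readiness check are assumptions. Only the message format, attempt count, and interval come from the log above.

```python
import os
import time


def wait_for_binaries(path, attempts=100, interval=15, sleep=time.sleep):
    """Poll until `path` is a non-empty directory.

    Returns the attempt number on which the directory was found populated,
    or None if every attempt was used up (the "Offline install timed out" case).
    The readiness check is an assumption; the real script may test something else.
    """
    for attempt in range(1, attempts + 1):
        if os.path.isdir(path) and os.listdir(path):
            return attempt
        # Message format mirrors the guest console output quoted above.
        print(f"Waiting for Binaries directory {path} to be available, "
              f"sleeping for {interval} seconds, attempt: {attempt}")
        sleep(interval)
    return None


def worst_case_seconds(attempts=100, interval=15):
    """Total time spent sleeping if the directory never appears."""
    return attempts * interval
```

With the defaults, `worst_case_seconds()` is 1500, i.e. the ~25 minutes observed before `kubelet.service` is reported missing.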
### Management server log
```
ERROR [c.c.k.c.a.KubernetesClusterStartWorker]
Failed to setup Kubernetes cluster : test-cluster-32 is not in usable state
as the system is unable to access control node VMs of the cluster
```
### KVM agent log — patch.sh
We separately see `patch.sh` timing out every 5 minutes via qemu-guest-agent:
```
WARN [kvm.resource.LibvirtComputingResource]
Process [...] for command [.../patch.sh -n i-2-171-VM -c template=domP
name=test-cluster-32-control-...
type=cksnode eth0ip=192.168.100.186 ...] timed out. Output is [].
ERROR [kvm.resource.LibvirtComputingResource] Passing cmdline failed:timeout
```
And `virsh qemu-agent-command i-2-171-VM '{"execute":"guest-ping"}'` reports:
```
error: Guest agent is not responding: QEMU guest agent is not connected
```
The QEMU monitor confirms that the host-side socket is listening but has no guest-side connection:
```
charchannel0:
filename=disconnected:unix:/var/lib/libvirt/qemu/i-2-171-VM.org.qemu.guest_agent.0,server=on
```
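To track this state across several VMs, the monitor output can be parsed rather than read by eye. The sketch below assumes the text comes from the monitor's `info chardev` command and relies on the `disconnected:` prefix seen in the output above. The function name is hypothetical, and the result is only meaningful for socket-backed chardevs such as the qga channel.

```python
import re


def chardev_states(info_chardev_output):
    """Map each chardev label to True (peer connected) or False.

    A chardev is treated as disconnected when its filename starts with
    'disconnected:', as in the charchannel0 entry quoted above. The label
    and its 'filename=' field may be on one line or split across two.
    """
    states = {}
    pattern = r"(\w+):\s*filename=(\S+)"
    for label, filename in re.findall(pattern, info_chardev_output):
        states[label] = not filename.startswith("disconnected:")
    return states
```

Fed the two lines quoted above, this returns `{"charchannel0": False}`.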
The qga issue may be a contributing factor, since qga is how `patch.sh` injects
the cloud-init cmdline. However, cloud-init clearly still ran via the
metadata-server path, so user-data delivery itself isn't the blocker. The
cluster failure happens in the ISO-mount step, which runs *after* cloud-init
has already fetched and applied the user-data.
## What we've tried
1. Custom-patched Ubuntu 22.04 cloud-image (installed `qemu-guest-agent`,
`containerd`, deps via `virt-customize --install`; added a manual
`multi-user.target.wants/qemu-guest-agent.service` symlink; experimented with
clearing `BindsTo=` via a systemd drop-in). All variants hit the same failure.
2. Downloaded the **canonical** CKS template fresh from
`download.cloudstack.org` and registered as a new template. Same failure.
3. Disabled two of the three KVM hosts to pin VMs to the original working host.
Same failure.
4. Verified `/var/www/html/setup-v1.33.1-fresh.iso` exists on secondary
storage, registered as ISO template id 227, NFS-mounted at
`nfs://10.96.32.32/mnt/raid0/secondary`.
We've validated that:
- The chardev for qga on the host is open (`server=on`) and just lacks a
guest-side connection.
- No SELinux AVC denials on the host.
- VMs on both KVM hosts use the `pc-i440fx-rhel7.6.0` machine type (which qemu
reports as deprecated).
- The binaries ISO is present and was attached, then detached around the time
CKS gave up (per the `KubernetesClusterStartWorker` log message `Detached
Kubernetes binaries from VM`).
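One way to confirm the attach state while the guest is still polling is to inspect the domain XML for CD-ROM devices and their backing source. The following is a minimal sketch, assuming the XML comes from `virsh dumpxml <vm>` and follows the standard libvirt `<disk device='cdrom'>` layout. The function name is hypothetical, and we have not run this against the affected VMs.

```python
import xml.etree.ElementTree as ET


def cdrom_sources(domain_xml):
    """Return (target_dev, bus, source) for every cdrom disk in a domain XML.

    `source` is None when the drive exists but has no media attached,
    which is the case to look for if the ISO never reached the guest.
    """
    root = ET.fromstring(domain_xml)
    drives = []
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "cdrom":
            continue
        target = disk.find("target")
        source = disk.find("source")
        src = None
        if source is not None:
            # File-backed ISOs use 'file'; block and network sources differ.
            src = source.get("file") or source.get("dev") or source.get("name")
        dev = target.get("dev") if target is not None else None
        bus = target.get("bus") if target is not None else None
        drives.append((dev, bus, src))
    return drives
```

The input can be captured with `subprocess.run(["virsh", "dumpxml", "i-2-171-VM"], capture_output=True, text=True).stdout`. An empty list, or a drive whose source is `None` during the polling window, would point at the attach step rather than the guest.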
## Questions for the community
1. **For CloudStack 4.22 + RHEL 9.7 KVM hosts**, is there a known issue with
the Kubernetes binaries ISO not appearing at `/mnt/k8sdisk/` inside Ubuntu CKS
guests? Is anyone successfully running CKS on this combination?
2. The setup loop polls `/mnt/k8sdisk/` for 25 minutes. Where does the canonical
CKS template expect that mount to be sourced from (CD-ROM device path inside
guest)? Should we see a `/dev/sr0` block device, and if not, what would cause
it to be missing despite CKS having attached the ISO at the libvirt layer?
3. Is the deprecated `pc-i440fx-rhel7.6.0` machine type known to break
virtio-serial-based qemu-guest-agent on recent Ubuntu cloud kernels? If so,
what's the recommended override (`hypervisor.kvm.machine.type` global setting)?
4. The qga channel chardev shows `disconnected` even though both the patched
Ubuntu cloud-image AND the canonical CKS template have qga installed. Is there
a known wiring issue for the virtio-serial channel name
`org.qemu.guest_agent.0` on RHEL 9.7 / qemu 9.1?
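For question 4, one guest-side check that might narrow things down is whether the guest kernel exposes the virtio-serial port at all. The sketch below assumes the usual Linux layout, where each port appears under `/sys/class/virtio-ports/<vport>/name`. The function names are hypothetical, and we have not yet run this inside the affected guests.

```python
import os

QGA_PORT_NAME = "org.qemu.guest_agent.0"


def virtio_port_names(sysfs_root="/sys/class/virtio-ports"):
    """Map each virtio-serial port entry (e.g. 'vport1p1') to its name."""
    names = {}
    if not os.path.isdir(sysfs_root):
        return names
    for entry in sorted(os.listdir(sysfs_root)):
        name_file = os.path.join(sysfs_root, entry, "name")
        try:
            with open(name_file) as fh:
                names[entry] = fh.read().strip()
        except OSError:
            # Entry without a readable 'name' attribute; skip it.
            continue
    return names


def has_guest_agent_port(sysfs_root="/sys/class/virtio-ports"):
    """True if a port named org.qemu.guest_agent.0 is visible to the guest."""
    return QGA_PORT_NAME in virtio_port_names(sysfs_root).values()
```

If `has_guest_agent_port()` returns `False` inside the guest, the channel is missing at the device level and the agent service has nothing to bind to. If it returns `True`, the problem is more likely in the service itself.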
Happy to provide more logs / mgmt-server traces / `virsh dumpxml` of the
affected VMs on request. Cross-linked from discussion #13056 (original report
from @anirudh-kaushal).
Thanks!
GitHub link: https://github.com/apache/cloudstack/discussions/13147