GitHub user tuanhoangth1603 created a discussion: Failed to create RBD storage pool after KVM agent upgrade from 4.20 to 4.22: "org.libvirt.LibvirtException: failed to create the RBD IoCTX"
### problem
After upgrading the KVM agent on a compute node from CloudStack 4.20 to 4.22, the agent fails to recreate or connect to the existing RBD storage pool. The agent log shows a LibvirtException during pool initialization that asks whether the RBD pool exists, even though the pool does exist on the Ceph cluster. This prevents the host from fully reconnecting and handling VM operations (e.g., volume attach/detach).

The issue appears to be tied to changes in libvirt (8.0+) or the Ceph client libraries after the upgrade, which cause IoCTX creation to fail because of a mismatch in temporary secret or cached state. Notably, a full reboot of the compute node resolves the issue immediately, allowing clean recreation of the pool and secret. However, a reboot means downtime for the VMs running on that node, which is unacceptable in production.

### versions
Environment
CloudStack version: Management server upgraded to 4.22.0 (from 4.20.0)
Agent version: KVM agent upgraded from 4.20.0 to 4.22.0 on compute nodes
Hypervisor: KVM
Primary Storage: Ceph RBD (pool name: cloudstack-zone1; Ceph version: 14)
OS on compute nodes: Ubuntu 20.04

### The steps to reproduce the bug
1. Upgrade the management server to 4.22.
2. Upgrade the agent to 4.22.
3. Check agent.log, which shows the error: Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to create the RBD IoCTX. Does the pool 'cloudstack-zone1' exist?

I also ran these commands on the Ceph cluster, but the error persists:

```
# ceph config set mon auth_expose_insecure_global_id_reclaim false
# ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
# ceph config set mon auth_allow_insecure_global_id_reclaim false
```

**Expected Behavior**
The agent should successfully redefine the RBD storage pool using the existing Ceph configuration (monitors, secrets) without failure, allowing seamless host reconnection after the upgrade.

**Actual Behavior**
Agent logs show repeated failures to create the RBD IoCTX, followed by cleanup of the libvirt secret.
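For anyone trying to confirm the same symptom, here is a minimal sketch that counts how often the IoCTX failure appears in an agent log, grouped by pool name. The regular expression is built from the error message quoted above. The function name `count_ioctx_failures` is hypothetical and not part of CloudStack, and the log location is something you supply yourself.

```python
import re

# Pattern built from the error text quoted in this report; captures the pool name.
IOCTX_ERROR = re.compile(
    r"failed to create the RBD IoCTX\. Does the pool '([^']+)' exist\?"
)


def count_ioctx_failures(lines):
    """Return a dict mapping pool name to the number of matching log lines.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    counts = {}
    for line in lines:
        match = IOCTX_ERROR.search(line)
        if match:
            pool = match.group(1)
            counts[pool] = counts.get(pool, 0) + 1
    return counts
```

To use it, pass an open file handle for the agent log, for example `count_ioctx_failures(open(path, errors="replace"))`, where `path` points at your agent.log.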
Host status remains "Disconnected" or "Alert" in the UI until manual intervention. A full reboot of the compute node resolves the issue immediately, but that is a poor solution because of the downtime it causes.

GitHub link: https://github.com/apache/cloudstack/discussions/12154
