We recently upgraded a 3-node HCI cluster to Proxmox VE 8.0 and Ceph Quincy.
Everything went as expected (thanks to pve7to8 and the great instructions on
the wiki).
During a thorough check of the logs after the upgrade we found the following
message:
Sep 08 12:38:31 pve3 ceph-osd[3462]: 2023-09-08T12:38:31.579+0200 7fd2815b73c0
-1 osd.0 8469 mon_cmd_maybe_osd_create fail: 'osd.0 has already bound to class
'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>'
to remove old class first': (16) Device or resource busy
We have two device classes (nvme and hdd), with three OSDs in the nvme class
(one per node) and no ssd class at all.
The nvme crush rule is exactly the same as the original replicated_rule.
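For reference, a quick way to double-check that is to diff the two rule dumps
(a sketch, output omitted):

# Dump both rules and compare them field by field
diff <(ceph osd crush rule dump replicated_rule) \
     <(ceph osd crush rule dump nvme)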
root@pve3:~# smartctl -i /dev/nvme1n1 | grep -E '^(Model|Firmware|NVM)'
Model Number: SAMSUNG MZPLJ1T6HBJR-00007
Firmware Version: EPK9AB5Q
NVMe Version: 1.3
root@pve3:~# ceph-volume lvm list /dev/nvme1n1 | grep -E '==|devices|crush'
====== osd.0 =======
crush device class nvme
devices /dev/nvme1n1
root@pve3:~# ceph osd crush class ls
[
"nvme",
"hdd"
]
root@pve3:~# ceph osd crush class ls-osd nvme
0
1
2
root@pve3:~# ceph osd crush rule ls
replicated_rule
nvme
hdd
The mon_cmd_maybe_osd_create failure has been logged for the local NVMe OSD
after every single node reboot since we installed the cluster back in 2021
(PVE 7.0, Ceph Pacific).
Until now we simply had not noticed it, and we have not experienced any
negative impact.
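In case anyone wants to reproduce the check on their own cluster, something
along these lines lists the occurrences across reboots (a sketch; assumes a
persistent journal and that osd.0 is the local NVMe OSD):

# Search all recorded boots for the message (needs persistent journald storage)
journalctl -u ceph-osd@0 --grep mon_cmd_maybe_osd_create --no-pager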
Can someone tell us why we are seeing this message at all, given that we do
not have an ssd class?
And can/should we do something about it, as suggested in the forum post below?
Ceph trying to reset class
https://forum.proxmox.com/threads/ceph-trying-to-reset-class.101841/
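For clarity, this is what we understand the suggested remedy to be, based
purely on the hint in the error message and that thread (untested on our
side):

# Drop the existing class binding for osd.0, then pin it to nvme again
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class nvme osd.0

# Our assumption: the class change is attempted automatically at OSD start;
# the relevant setting can be inspected with
ceph config get osd.0 osd_class_update_on_start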
Thanks
Stefan