[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-11 Thread xu chenhui
Igor Fedotov wrote:
> Hi chenhui,
>
> there is still a work in progress to support multiple labels to avoid
> the issue (https://github.com/ceph/ceph/pull/55374). But this is of
> little help for your current case.
>
> If your disk is fine (meaning it's able to read/write block at offset 0)
>

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-05 Thread Igor Fedotov
On 05/04/2024 17:28, xu chenhui wrote:
> Hi, Igor
> Thank you for providing the repair procedure. I will try it when I am back to my workstation. Can you provide any possible reasons for this problem?

Unfortunately no. I recall a few cases like that, but I doubt anyone knows the root cause.

> ceph

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-05 Thread xu chenhui
Hi, Igor

Thank you for providing the repair procedure. I will try it when I am back to my workstation. Can you provide any possible reasons for this problem?

ceph version: v16.2.5
error info:
systemd[1]: Started Ceph osd.307 for 02eac9e0-d147-11ee-95de-f0b2b90ee048.
bash[39068]: Running

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-05 Thread Igor Fedotov
Hi chenhui, there is still a work in progress to support multiple labels to avoid the issue (https://github.com/ceph/ceph/pull/55374). But this is of little help for your current case. If your disk is fine (meaning it's able to read/write block at offset 0) you might want to try to recover

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-04 Thread xu chenhui
Hi, Has there been any progress on this issue? Is there a quick recovery method? I have the same problem as you: the first 4K block of the OSD metadata is invalid. Recreating the OSD would be very costly. Thanks.

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-30 Thread Alison Peisker
Hi,

It looks like Igor is right; it does appear to be corruption.

ls /var/lib/ceph/252fcf9a-b169-11ed-87be-3cecef623f33/osd.665/
ceph_fsid config fsid keyring ready require_osd_release type unit.configured unit.created unit.image unit.meta unit.poststop unit.run unit.stop whoami

head -c 4096

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-29 Thread Igor Fedotov
Hi All, from the log output (the line with the "Malformed input" string) it looks like corruption of the device label (the very first 4K data block on the main OSD device, which contains some basic OSD metadata, e.g. the OSD UUID). There is also some chance that the wrong device has been attached. Alison, to
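The device label described in this message (the first 4K block of the main OSD device) can be inspected with a short script. What follows is only a minimal sketch, not a Ceph tool: the function names are made up for illustration, and the check assumes that an intact BlueStore label begins with the ASCII text "bluestore block device" followed by a newline. It only reads the block, never writes, and cannot tell whether the rest of the label is valid.

```python
LABEL_SIZE = 4096  # the "very first 4K data block" of the main OSD device

# Assumption: an intact BlueStore label starts with this ASCII prefix.
LABEL_PREFIX = b"bluestore block device\n"


def read_label_block(path, size=LABEL_SIZE):
    """Return the first `size` bytes of a device or image file (read-only)."""
    with open(path, "rb") as f:
        return f.read(size)


def classify_label(block):
    """Give a rough, human-readable verdict on a label block."""
    if len(block) < LABEL_SIZE:
        return "short read"
    if block.startswith(LABEL_PREFIX):
        return "looks like a BlueStore label"
    if block.count(0) == len(block):
        return "all zeros"
    return "unrecognized content"
```

Calling `classify_label(read_label_block(path))` on the OSD's block device (as root) gives a quick first impression. A result other than "looks like a BlueStore label" is consistent with either the label corruption or the wrong-device case mentioned above, and does not distinguish between them.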

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-29 Thread Frank Schilder
____________
From: apeis...@fnal.gov
Sent: Friday, August 25, 2023 10:29 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: A couple OSDs not starting after host reboot

Hi,

Thank you for your reply. I don’t think the device names changed, but ceph seems to be confused about which

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-28 Thread apeisker
Hi,

Thank you for your reply. I don’t think the device names changed, but Ceph seems to be confused about which device the OSD is on. It is reporting two OSDs on the same device, although this is not true.

ceph device ls-by-host | grep sdu
ATA_HGST_HUH728080ALN600_VJH4GLUX sdu

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-25 Thread Eugen Block
Hi, one thing that comes to mind is that the device names may have changed from /dev/sdX to /dev/sdY. Something like that has been reported a couple of times in recent months.

Quoting Alison Peisker:

> Hi all,
> We rebooted all the nodes in our 17.2.5 cluster after performing kernel updates,