Hi Daniel,

It's perfectly normal for a PG to freeze when its primary OSD is not stable.

Sometimes a disk fails without immediately returning I/O errors (which
would crash the OSD), so the OSD stays up while its I/O hangs.

When the OSD is stopped, there's a 5-minute delay before it is marked out
and its PGs are remapped.
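To make the timeline concrete, here is a minimal Python sketch. It assumes
the 5-minute (300 s) figure above; in Ceph the delay before a down OSD is
marked out is the monitor option `mon_osd_down_out_interval`, so check the
actual value on your own cluster. The helper names and the OSD id 12 are
hypothetical, and the script only prints the `ceph` commands unless
`dry_run=False` is passed.

```python
import subprocess
from typing import List, Optional

# Assumption: the 5-minute delay mentioned above, i.e. 300 seconds.
# The real value comes from the mon option mon_osd_down_out_interval.
ASSUMED_DOWN_OUT_INTERVAL_S = 300


def seconds_until_out(seconds_down: float,
                      interval_s: float = ASSUMED_DOWN_OUT_INTERVAL_S) -> float:
    """Time left before a down OSD is marked out (0.0 if already due)."""
    return max(0.0, float(interval_s) - seconds_down)


def read_interval_command() -> List[str]:
    """CLI call that reports the configured down-to-out interval."""
    return ["ceph", "config", "get", "mon", "mon_osd_down_out_interval"]


def mark_osd_command(osd_id: int, state: str) -> List[str]:
    """Build `ceph osd down <id>` or `ceph osd out <id>`."""
    if state not in ("down", "out"):
        raise ValueError(f"unsupported state: {state!r}")
    return ["ceph", "osd", state, str(osd_id)]


def run(cmd: List[str], dry_run: bool = True) -> Optional[str]:
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return None
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(seconds_until_out(120))  # 180.0
    run(read_interval_command())
    run(mark_osd_command(12, "down"))  # hypothetical OSD id
```

The dry-run default is deliberate: marking an OSD down or out on a live
cluster moves data, so it is safer to inspect the command first.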

On Fri, 10 Nov 2023 at 11:43, Daniel Schreiber <
daniel.schrei...@hrz.tu-chemnitz.de> wrote:

> Dear cephers,
> we are sometimes observing stalling IO on our Ceph 17.2.6 cluster when
> the backing device for the primary OSD of a PG fails and seems to block
> read IO to objects from that PG. If I set the OSD with the broken device
> to down, the IO continues. Setting the OSD to out is not sufficient.
> The cluster is running on Debian 11, the pool is an erasure-coded CephFS
> data pool. The OSD has an HDD data device and an SSD DB device. The data
> device is the one which failed and was blocking IO.
> The OSD was reporting slow ops and shortly after that smartd reported
> unreadable sectors.
> Has anyone seen such behaviour? Are there some tweaks that I missed?
> Kind regards,
> Daniel
> --
> Daniel Schreiber
> Facharbeitsgruppe Systemsoftware
> Universitaetsrechenzentrum
> Technische Universität Chemnitz
> Straße der Nationen 62 (Raum B303)
> 09111 Chemnitz
> Germany
> Tel:     +49 371 531 35444
> Fax:     +49 371 531 835444
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io