See https://tracker.ceph.com/issues/38724: "this results in the
Production VMs becoming unresponsive as their disks are unavailable when we
have multiple OSDs down on multiple hosts. (we are doing 2 copy) I've seen
it where 3 OSDs are down at the same time on different hosts due to this
bug. That's when we are seemingly really un lucky with the BUG. (3 copy
would not have saved us from that)"

"That OSD failure seems to have caused a cascade. Several more OSDs have
crashed. 12% of objects were degraded, and I had to create new 'ssd' class
OSDs to get enough failure domains. I cancelled the cp to prioritize
recovery. Is there any workaround to repair the OSDs and get them to
restart properly? They just crash again every time I restart them.

Can this bug please be set to a higher priority? This has caused an outage
for myself and Edward above, and threatens data loss. That warrants at
least Major."


And we had our most important virtual machines [FreePBX phone, Postfix
mail, Dovecot IMAP, order entry data, accounting, etc.] go offline. We
have a great backup system and were able to restore everything except the
last 40 minutes of data.




Also check this thread:
https://www.mail-archive.com/ceph-users@ceph.io/msg00488.html

-1, as Ceph versions greater than 12.2.11 are unstable.
