&& systemctl start ceph-osd@$i.service; done
307 ls -al /dev/dm-*
308 ceph osd tree | grep down
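The history above (restarting OSD units in a loop, then checking `ceph osd tree | grep down`) can be sketched end to end. This is a hedged sketch, not the poster's exact loop: `parse_down_osds` and the sample tree output are illustrative, and the restart loop assumes systemd-managed OSDs with units named `ceph-osd@<id>.service`.

```shell
# Hypothetical helper: extract the numeric ids of down OSDs from
# `ceph osd tree` output (column layout assumed from Nautilus-era output).
parse_down_osds() {
  awk '/ down / {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^osd\.[0-9]+$/) { sub(/osd\./, "", $i); print $i }
  }'
}

# Stand-in for live `ceph osd tree` output:
sample=' 3   hdd 1.81940   osd.3    down  1.00000 1.00000
 7   hdd 1.81940   osd.7      up  1.00000 1.00000
12   hdd 1.81940   osd.12   down  1.00000 1.00000'

echo "$sample" | parse_down_osds   # prints 3, then 12

# Against a live cluster this would drive the restart loop:
# for i in $(ceph osd tree | parse_down_osds); do
#   systemctl restart ceph-osd@$i.service
# done
```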
> -Original Message-
> From: c...@elchaka.de
> Sent: Wednesday, April 8, 2020 5:43 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Multiple OSDs down, and
Hello,
I am just answering to let you know that there are people around and seeing your
messages.
Unfortunately I cannot help much, but I did have a strange issue like yours,
where a few OSDs went down, or stayed up but I couldn't reach the node via SSH.
The case: we had done updates on our
So, the recovery stalled a few more OSDs in; but looking at the disks with OSDs
marked down, I noticed that, despite systemctl reporting that the OSD processes
were all *up*, several of them had not written to their logs since the logs rotated.
Suspecting that these OSDs were stalled, I started
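A quick way to spot that pattern (units active per systemd, but logs untouched) is to compare each unit's log mtime against the clock. The helper below is a sketch under assumptions: the log path `/var/log/ceph/ceph-osd.<id>.log` and the one-hour idle threshold are mine, not from the thread.

```shell
# Returns success (0) when the last write is older than the allowed idle time.
is_stale() {  # usage: is_stale <last_write_epoch> <now_epoch> <max_idle_seconds>
  [ $(( $2 - $1 )) -gt "$3" ]
}

# Live usage might look like (assumed log paths; systemd-managed OSDs):
# now=$(date +%s)
# for u in $(systemctl list-units 'ceph-osd@*' --state=active --plain --no-legend | awk '{print $1}'); do
#   i=${u#ceph-osd@}; i=${i%.service}
#   last=$(stat -c %Y "/var/log/ceph/ceph-osd.$i.log")
#   is_stale "$last" "$now" 3600 && echo "osd.$i: process up but log idle > 1h"
# done
```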
Update a day later:
The cluster is recovering, *very slowly*: we're now at 113 OSDs down (improved
from 140 when everything broke), but it took a day before anything changed here,
and it looks like we're recovering at a rate of about 1-2 OSDs per hour...
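Tracking that recovery rate just needs the down count sampled over time. A sketch, assuming the one-line `ceph osd stat` format (`N osds: M up ..., K in ...`); the parser and the sample line are illustrative, not output from this cluster.

```shell
# Hypothetical parser: print (total - up) from a `ceph osd stat` line.
count_down() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "osds:") total = $(i-1)
      if ($i == "up" || $i == "up,") up = $(i-1)
    }
    print total - up
  }'
}

# Stand-in for live output:
echo '140 osds: 27 up (since 99m), 140 in (since 5d); epoch: e12345' | count_down  # prints 113

# Sampled hourly, successive counts give the ~1-2 OSDs/hour figure:
# while sleep 3600; do echo "$(date -u +%FT%TZ) $(ceph osd stat | count_down)"; done
```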
So I'm not
(I note that some of the down OSDs still report issues with secret
dissemination:
2020-04-01 14:32:11.265 7f9d9a7be700 0 auth: could not find secret_id=5010
2020-04-01 14:32:11.265 7f9d9a7be700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5010
2020-04-01