[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-07 Thread Sang, Oliver
&& systemctl start ceph-osd@$i.service; done
307 ls -al /dev/dm-*
308 ceph osd tree | grep down
> -Original Message-
> From: c...@elchaka.de
> Sent: Wednesday, April 8, 2020 5:43 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Multiple OSDs down, and

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-07 Thread ceph
Hello, I am just answering to let you know that there are people around and seeing your messages. Unfortunately I cannot help much, but I did have a strange issue like yours, where a few OSDs went down, or stayed up but I couldn't reach the node via SSH. The case: we have done updates on our

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-02 Thread aoanla
So, the recovery stalled a few more OSDs in, but looking at the disks with OSDs marked down, I noticed that, despite systemctl reporting that the OSD processes were all *up*, several of them had not written to their logs since they rotated. Suspecting that these OSDs were stalled, I've started
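The check described above (OSD processes reported as up by systemctl, but with logs untouched since rotation) could be sketched roughly as follows. This is only an illustration, not the poster's actual procedure: the function name `stale_osd_logs`, the default log directory `/var/log/ceph`, and the 60-minute threshold are all assumptions, and it relies on `find` supporting `-maxdepth` and `-mmin` (GNU and BSD versions do).

```shell
#!/bin/bash
# Hypothetical helper: list OSD log files that have not been written to
# for more than a given number of minutes. A stale log next to a service
# that systemctl reports as active suggests the OSD may be stalled.
stale_osd_logs() {
    local log_dir="${1:-/var/log/ceph}"   # assumed default log location
    local max_age_min="${2:-60}"          # assumed staleness threshold
    find "$log_dir" -maxdepth 1 -type f -name 'ceph-osd.*.log' \
        -mmin +"$max_age_min" | sort
}
```

Calling `stale_osd_logs /var/log/ceph 120` would print the logs untouched for over two hours; the OSD ids in those file names can then be compared against `ceph osd tree | grep down`.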

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-02 Thread aoanla
Update a day later: the cluster appears to be recovering *very slowly*. We're now at 113 OSDs down (improved from 140 OSDs down when everything broke), but it took a day before anything changed here, and it looks like we're recovering at a rate of about 1-2 OSDs per hour... So I'm not

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-01 Thread aoanla
(I note that some of the down OSDs still report issues with secret dissemination:
2020-04-01 14:32:11.265 7f9d9a7be700 0 auth: could not find secret_id=5010
2020-04-01 14:32:11.265 7f9d9a7be700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5010
2020-04-01