[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread David Orman
Hi Peter, We recently fixed this bug: https://tracker.ceph.com/issues/47738 here: https://github.com/ceph/ceph/commit/b4316d257e928b3789b818054927c2e98bb3c0d6 and the fix should hopefully land in the next release(s). David On Thu, Jun 17, 2021 at 12:13 PM Peter Childs wrote: > > Found the issue in

[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread Andrew Walker-Brown
Changing pg_num and pgp_num manually can be a useful tool. Just remember that they need to be a power of 2, and don't increase or decrease by more than a couple of steps at a time, e.g. 64 to 128 or 256, but not straight to 1024. I had a situation where a couple of OSDs got quite full. I added more capacity but the
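
For reference, a minimal sketch of stepping pg_num/pgp_num up gradually with the standard ceph CLI; the pool name "mypool" and the target sizes are placeholders, not values from this thread:

  # check current values first (hypothetical pool name "mypool")
  ceph osd pool get mypool pg_num
  ceph osd pool get mypool pgp_num

  # step up one power of two at a time, letting the cluster settle between steps
  ceph osd pool set mypool pg_num 128
  ceph osd pool set mypool pgp_num 128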

[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread Peter Childs
Found the issue in the end. I'd managed to kill the autoscaling feature by playing with pgp_num and pg_num, and it was getting confusing. I fixed it in the end by reducing pg_num on some of my test pools, and the manager woke up and started working again. It was not clear as to what I'd done to
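
As a hedged illustration of checking whether the autoscaler is still doing its job and re-enabling it per pool; "testpool" is a placeholder name, not one mentioned in the thread:

  # show what the autoscaler thinks each pool should have
  ceph osd pool autoscale-status

  # re-enable autoscaling on a pool if it was switched off (placeholder pool name)
  ceph osd pool set testpool pg_autoscale_mode on

  # or reduce pg_num manually and let the mgr catch up
  ceph osd pool set testpool pg_num 64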

[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread Eugen Block
Hi, don't give up on Ceph. ;-) Did you try any of the steps from the troubleshooting section [1] to gather some events and logs? Could you share them, and maybe also some more details about that cluster? Did you enable any non-default mgr modules? There have been a couple of reports related
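
A few generic Ceph CLI calls that cover the kind of information being asked for here (cluster state, enabled mgr modules, recorded crashes); these are standard commands, not steps quoted from the troubleshooting guide:

  # overall cluster health and status
  ceph status

  # which mgr modules are enabled, including any non-default ones
  ceph mgr module ls

  # any recorded daemon crashes, e.g. from the dying managers
  ceph crash ls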