[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
Sorry - the info about the insufficient space seems to have referred to why the devices are not available. So that's just as it should be. All MDS are still in error state and were last refreshed 2d ago, even right after a mgr failover. So it seems there's something else going on. One thing that

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
Thanks for your detailed explanations! That helped a lot. All MDS are still in status error. "ceph orch device ls" showed that some hosts seem to not have enough space on devices. I wonder why I didn't see that in monitoring. Anyway, I'll fix that and then try to proceed. When the backport

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li
On 4/11/23 15:59, Thomas Widhalm wrote: On 11.04.23 09:16, Xiubo Li wrote: On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Thomas Widhalm
On 11.04.23 09:16, Xiubo Li wrote: On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this fix is not in the v17.2.6 yet in upstream

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li
On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this fix is not in the v17.2.6 yet in upstream code. Thanks - Xiubo But now I'm

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
It seems like it maybe didn't actually do the redeploy, as it should log something saying it's actually doing it on top of the line saying it scheduled it. To confirm, the upgrade is paused ("ceph orch upgrade status" reports is_paused as true)? If so, maybe try doing a mgr failover ("ceph mgr
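
A minimal sketch of the check described above, assuming "ceph orch upgrade status" prints JSON that contains an is_paused field (the field name comes from the message; the exact output layout is an assumption). The helper name upgrade_is_paused is hypothetical and only reads text on stdin, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the text on stdin reports
# "is_paused": true, and fails otherwise.
# Assumption: the status output is JSON with an is_paused field.
upgrade_is_paused() {
    grep -Eq '"is_paused"[[:space:]]*:[[:space:]]*true'
}
```

On a cluster this could be used as: ceph orch upgrade status | upgrade_is_paused && ceph mgr fail. Here "ceph mgr fail" is one way to trigger a mgr failover; whether it is the exact command meant in the message is an assumption, since the snippet is cut off.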

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Thomas Widhalm
I did what you told me. I also see in the log that the command went through:
2023-04-10T19:58:46.522477+ mgr.ceph04.qaexpv [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-10T20:01:03.360559+ mgr.ceph04.qaexpv [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
Will also note that the normal upgrade process scales down the mds service to have only 1 mds per fs before upgrading it, so maybe something you'd want to do as well if the upgrade didn't do it already. It does so by setting the max_mds to 1 for the fs. On Mon, Apr 10, 2023 at 3:51 PM Adam King
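
The scale-down step mentioned above could be done by hand roughly as follows. This is a sketch, not the orchestrator's own procedure: the file system name "cephfs" is a placeholder, and the run wrapper and scale_down_mds function are hypothetical helpers. By default the commands are only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 in the environment to run against a real cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Hypothetical helper: reduce a file system to a single active MDS.
# Argument: the file system name (placeholder "cephfs" in the example).
scale_down_mds() {
    fs_name="$1"
    run ceph fs get "$fs_name"            # shows settings, including the current max_mds
    run ceph fs set "$fs_name" max_mds 1  # the setting the upgrade changes, per the message
    run ceph fs status "$fs_name"         # check that the extra ranks have stopped
}
```

Example: scale_down_mds cephfs. It is worth noting the original max_mds value from the first command so it can be restored once the MDS daemons are on the new version.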

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Adam King
You could try pausing the upgrade and manually "upgrading" the mds daemons by redeploying them on the new image. Something like "ceph orch daemon redeploy <mds daemon name> --image <17.2.6 image>" (daemon names should match those in "ceph orch ps" output). If you do that for all of them and then get them into an
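
The manual redeploy suggested above might look like the sketch below. The image reference quay.io/ceph/ceph:v17.2.6 is an assumption; substitute whatever image your cluster is upgrading to. The run wrapper and redeploy_mds function are hypothetical helpers. Commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 in the environment to run against a real cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Assumption: target image for 17.2.6; override with IMAGE=... if yours differs.
IMAGE="${IMAGE:-quay.io/ceph/ceph:v17.2.6}"

# Hypothetical helper: pause the upgrade, then redeploy each given daemon.
# Arguments: MDS daemon names exactly as listed by "ceph orch ps".
redeploy_mds() {
    run ceph orch upgrade pause
    for daemon in "$@"; do
        run ceph orch daemon redeploy "$daemon" --image "$IMAGE"
    done
}
```

Example, using the daemon names that appear in the log lines quoted in this thread: redeploy_mds mds.mds01.ceph06.rrxmks mds.mds01.ceph05.pqxmvt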