[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
If any rook-ceph users see a situation where the MDS is stuck in replay, look at the logs of the MDS pod. If it runs and then terminates repeatedly, check for a "liveness probe terminated" error message by running "kubectl describe pod -n (namespace) (mds pod name)". If there is one, try increasing the liveness probe threshold.
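For reference, a minimal version of that check (the pod name below is a placeholder; Rook MDS pods are typically named rook-ceph-mds-<fsname>-<letter>-...):

    # Inspect the MDS pod's events for liveness probe failures
    kubectl describe pod -n rook-ceph rook-ceph-mds-myfs-a-6c8d5f7b9d-x2k4p

    # Tail the MDS container logs while it is in replay
    kubectl logs -n rook-ceph -f rook-ceph-mds-myfs-a-6c8d5f7b9d-x2k4p

In the describe output, the Events section will show entries like "Liveness probe failed" followed by the container being killed if the probe is what is terminating the pod.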

[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
This issue has been closed. If any rook-ceph users see this: when MDS replay takes a long time, look at the logs in the MDS pod. If it is progressing and then abruptly terminates, try describing the MDS pod, and if the liveness probe terminated the container, try increasing the liveness probe's failure threshold.
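As a sketch of that fix (assuming a Rook version whose CephFilesystem CRD exposes metadataServer.livenessProbe, and using "myfs" as a placeholder filesystem name):

    # Raise the liveness probe's failure threshold and timeout for the MDS daemons
    kubectl -n rook-ceph patch cephfilesystem myfs --type merge -p \
      '{"spec":{"metadataServer":{"livenessProbe":{"probe":{"failureThreshold":10,"timeoutSeconds":10}}}}}'

    # Or, more bluntly, disable the probe while a long replay catches up
    kubectl -n rook-ceph patch cephfilesystem myfs --type merge -p \
      '{"spec":{"metadataServer":{"livenessProbe":{"disabled":true}}}}'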

[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
I think the MDS is not responding to the liveness probe (confirmed by kubectl describe on the MDS pod). I don't think it's memory, as I don't set a memory limit, and I have the CPU set to 500m per MDS. What direction should I go from here?
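If CPU starvation during replay is the suspicion, one direction would be raising the MDS CPU allocation above 500m. A sketch, again with "myfs" as a placeholder filesystem name and using the standard Kubernetes resource fields under metadataServer.resources:

    # Give each MDS more CPU than the current 500m request
    kubectl -n rook-ceph patch cephfilesystem myfs --type merge -p \
      '{"spec":{"metadataServer":{"resources":{"requests":{"cpu":"2"},"limits":{"cpu":"4"}}}}}'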

[ceph-users] mds terminated

2023-07-18 Thread dxodnd
Hello. I am using Rook Ceph and have 20 MDSs in use: 10 in ranks 0-9 and 10 in standby. I have one Ceph filesystem, and 2 MDSs are trimming. Under the one filesystem, there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3 in ACTIVE. For some reason, since 36 hours ago, RESOLVE is stuck in
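One way to watch the per-rank states from the cluster side (assuming the Rook toolbox deployment rook-ceph-tools is installed):

    # Show per-rank MDS states (resolve, replay, active, ...)
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs status

    # Overall health, including MDS-related warnings
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail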