If any rook-ceph users find an MDS stuck in replay, look at the logs of the
MDS pod.
If the pod runs and then terminates repeatedly, check for a "liveness probe
terminated" error message by running "kubectl describe pod -n (namespace) (MDS
pod name)".
If there is the
This issue has been closed.
If any rook-ceph users see this: when MDS replay takes a long time, look at the
logs of the MDS pod.
If it is going well and then abruptly terminates, try describing the MDS pod,
and if the liveness probe terminated it, try increasing the threshold of the
liveness probe.
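
The checks described above could be sketched as a few shell functions. This is
only a rough sketch: the function names are made up for illustration, the
namespace, pod name, and filesystem name are whatever you pass in, and the
patch assumes your Rook version accepts a "probe" override under
spec.metadataServer.livenessProbe on the CephFilesystem resource, so check the
CRD reference for your release before relying on it.

```shell
# Sketch only. The namespace, pod name, and filesystem name passed to these
# functions are placeholders for your own values.

# Show the probe-related lines from the description of an MDS pod.
mds_liveness_events() {
  kubectl describe pod -n "$1" "$2" | grep -i 'liveness'
}

# Show the log of the previous (terminated) MDS container instance.
mds_previous_logs() {
  kubectl logs -n "$1" "$2" --previous
}

# Raise failureThreshold on the MDS liveness probe.
# Assumption: your Rook version honors spec.metadataServer.livenessProbe.probe
# on the CephFilesystem resource.
raise_mds_liveness_threshold() {
  kubectl patch cephfilesystem -n "$1" "$2" --type merge \
    -p "{\"spec\":{\"metadataServer\":{\"livenessProbe\":{\"probe\":{\"failureThreshold\":$3}}}}}"
}
```

For example, "mds_liveness_events rook-ceph (MDS pod name)" followed by
"raise_mds_liveness_threshold rook-ceph (filesystem name) 10". The kubelet
restarts a container after failureThreshold consecutive probe failures, so the
time an unresponsive MDS is tolerated is roughly failureThreshold times
periodSeconds.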
I think rook-ceph is not responding to the liveness probe (confirmed by
describing the MDS pod in k8s). I don't think it's memory, as I don't limit it,
and I have the CPU set to 500m per MDS. What direction should I go from here?
___
ceph-users
Hello.
I am using Rook Ceph and have 20 MDSs in use: 10 hold ranks 0-9 and 10 are on
standby.
I have one Ceph filesystem, and 2 MDSs are trimming.
Under that one filesystem, there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3
in ACTIVE.
For some reason, since 36 hours ago, RESOLVE is stuck in