[ceph-users] CEPH complete cluster failure: unknown PGS

2023-09-28 Thread v1tnam
I have an 8-node cluster with old hardware. A week ago four nodes went down and the Ceph cluster went nuts. All PGs became unknown and the monitors took too long to get in sync, so I reduced the number of mons to one and the mgrs to one as well. Now the recovery starts with 100% unknown PGs and then PGs start
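For anyone in the same state, a minimal sketch of the usual commands for inspecting quorum and stuck PGs (nothing here is taken from the poster's cluster):

    ceph -s                       # overall cluster health and recovery progress
    ceph quorum_status            # which mons are actually in quorum
    ceph pg dump_stuck inactive   # PGs stuck inactive, which includes "unknown"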

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Mark Nelson
There is some pretty strange compaction behavior happening in these logs. For instance, in osd0, we see an O-1 CF L1 compaction that's taking ~204 seconds: 2023-09-21T20:03:59.378+ 7f16a286c700  4 rocksdb: (Original Log Time 2023/09/21-20:03:59.381808) EVENT_LOG_v1 {"time_micros":
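To spot slow compactions like this yourself, a rough sketch (the log path assumes a default non-containerized deployment; adjust for cephadm or Rook):

    # pull compaction events out of an OSD log
    grep -E 'compaction|EVENT_LOG_v1' /var/log/ceph/ceph-osd.0.log
    # compaction events are logged at the default debug_rocksdb level (4/5)
    ceph config get osd.0 debug_rocksdb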

[ceph-users] Re: Snap_schedule does not always work.

2023-09-28 Thread Kushagr Gupta
Hi Milind, Team, Thank you for the response, @Milind. >> Snap-schedule no longer accepts a --subvol argument Thank you for the information. Currently, we are using the following commands to create the snap-schedules: Syntax: "ceph fs snap-schedule add /// " "ceph fs snap-schedule retention add
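For comparison, a hypothetical concrete invocation of those two commands against a subvolume path (the path and schedule are placeholder values, not the poster's):

    ceph fs snap-schedule add /volumes/mygroup/mysubvol 1h              # snapshot every hour
    ceph fs snap-schedule retention add /volumes/mygroup/mysubvol h 24  # keep 24 hourly snapshots
    ceph fs snap-schedule status /volumes/mygroup/mysubvol              # verify the schedule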

[ceph-users] Re: cephfs health warn

2023-09-28 Thread Ben
Hi Venky and cephers, Thanks for the reply. No config changes had been made before the issues occurred. It is suspected to be a client bug. Please see the following message about the accumulation of log segments to be trimmed. For the moment the problematic client nodes cannot be rebooted. Evicting the client will
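A rough sketch of the usual steps for inspecting trim warnings and evicting a client (the MDS name and client id are placeholders):

    ceph health detail                          # details on MDS_TRIM / "behind on trimming"
    ceph tell mds.<name> session ls             # list client sessions to identify the culprit
    ceph tell mds.<name> client evict id=<id>   # evict it; the client must remount afterwards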

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Igor Fedotov
Hi Sudhin, It looks like manual DB compactions are (periodically?) issued via the admin socket for your OSDs, which (my working hypothesis) triggers DB access stalls. Here are the log lines indicating such calls: debug 2023-09-22T11:24:55.234+ 7fc4efa20700  1 osd.1 1192508 triggering manual
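For context, this is the kind of call that produces those "triggering manual compaction" lines (osd.1 matches the log above):

    # via the admin socket on the OSD host
    ceph daemon osd.1 compact
    # or remotely from any node with admin access
    ceph tell osd.1 compact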

[ceph-users] Re: Specify priority for active MGR and MDS

2023-09-28 Thread Janne Johansson
On Wed, 27 Sep 2023 at 15:32, Nicolas FONTAINE wrote: > Hi everyone, > Is there a way to specify which MGR and which MDS should be the active one? At least for the mgr, you can just fail over until it lands on the one you want it to be running on. -- May the most significant bit of your
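A minimal sketch of that fail-over loop (the mgr name is whatever "ceph mgr stat" reports as active):

    ceph mgr stat                # shows the currently active mgr
    ceph mgr fail <active-mgr>   # fail it so a standby takes over
    # repeat until the mgr you want ends up active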