[ceph-users] Re: Octopus client for Nautilus OSD/MON
Thank you very much!

On Thu, Jun 2, 2022 at 11:23 PM Konstantin Shalygin wrote:
> The "next" release is always compatible with "previous one" clusters
>
> k
> Sent from my iPhone
>
> > On 2 Jun 2022, at 16:28, Jiatong Shen wrote:
> >
> > Hello,
> >
> > Where can I find a librbd compatibility matrix? For example, is the
> > Octopus client compatible with a Nautilus server? Thank you.
> >
> > --
> > Best Regards,
> > Jiatong Shen

--
Best Regards,
Jiatong Shen
[ceph-users] Re: Slow delete speed through the s3 API
Is it just your deletes which are slow, or writes and reads as well?

On Thu, Jun 2, 2022, 4:09 PM J-P Methot wrote:
> I'm following up on this as we upgraded to Pacific 16.2.9 and deletes
> are still incredibly slow. The pool rgw is using is a fairly small
> erasure-coded pool set at 8 + 3. Is there anyone who's having the same
> issue?
>
> On 5/16/22 15:23, J-P Methot wrote:
> > Hi,
> >
> > First of all, a quick Google search shows me that questions about the
> > S3 API's slow object-deletion speed have been asked before and are well
> > documented. My issue is slightly different, because I am getting
> > abysmal speeds of 11 objects/second on an all-SSD cluster running
> > Octopus with about a hundred OSDs. This is much lower than the
> > Red Hat-reported limit of 1000 objects/second.
> >
> > I've seen elsewhere that it was a RocksDB limitation and that it would
> > be fixed in Pacific, but the Pacific release notes do not show me
> > anything that suggests that. Furthermore, I have limited control over
> > the S3 client deleting the files, as it's a third-party open-source
> > automatic backup program.
> >
> > Could updating to Pacific fix this issue? Is there any configuration
> > change I could make to speed up object deletion?
>
> --
> Jean-Philippe Méthot
> Senior OpenStack system administrator
> Administrateur système OpenStack sénior
> PlanetHoster inc.
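For readers digging into the same symptom: deleted S3 objects are reclaimed asynchronously by RGW garbage collection, so a GC backlog can make deletes look even slower than they are. An untested sketch of where to look; the option names below are real RGW settings, but the values are only illustrative, not a verified fix for this thread:

    radosgw-admin gc list --include-all | head           # a long list means deletes queued behind GC
    ceph config set client.rgw rgw_gc_max_concurrent_io 20     # illustrative value
    ceph config set client.rgw rgw_gc_processor_max_time 3600  # illustrative value
    radosgw-admin gc process --include-all               # run a GC pass now instead of waiting

Note that the delete requests themselves are bounded by bucket-index/RocksDB performance, which is what the Pacific-era sharding work mentioned in the thread addresses; GC tuning only helps space reclamation keep up.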
[ceph-users] Re: Help needed picking the right amount of PGs for (Cephfs) metadata pool
On Thu, Jun 2, 2022 at 11:40 AM Stefan Kooman wrote:
>
> Hi,
>
> We have a CephFS filesystem holding 70 TiB of data in ~ 300 M files and
> ~ 900 M sub-directories. We currently have 180 OSDs in this cluster.
>
> POOL            ID PGS STORED  (DATA)  (OMAP)  OBJECTS USED    (DATA)  (OMAP)  %USED MAX AVAIL
> cephfs_metadata  6 512 984 GiB 243 MiB 984 GiB 903.98M 2.9 TiB 728 MiB 2.9 TiB 3.06  30 TiB
>
> The PGs in this pool (replicated, size=3, min_size=2), pool 6, are giving
> us a hard time (again). When PGs get remapped to other OSDs it introduces
> (tons of) slow ops and MDS slow requests. Remapping more than 10 PGs at
> a time will result in OSDs marked as dead (iothread timeout). Scrubbing
> (with default settings) triggers slow ops too. Half of the cluster is
> running on SSDs (SAMSUNG MZ7LM3T8HMLP-5 / INTEL SSDSC2KB03) with cache
> mode in write-through, the other half is NVMe (SAMSUNG MZQLB3T8HALS-7).
> No separate WAL/DB devices. SSDs run on Intel (14 cores / 128 GB RAM),
> NVMe on AMD EPYC gen 1 / 2 (16 cores / 128 GB RAM).
> OSD_MEMORY_TARGET=11G. The load on the pool (and cluster in general) is
> modest. Plenty of CPU power available (mostly idling really). In the
> order of ~ 6 K MDS requests, ~ 1.5 K metadata ops (ballpark figures).
>
> We currently have 512 PGs allocated to this pool. The autoscaler suggests
> reducing this amount to "32" PGs. This would result in only a fraction
> of the OSDs holding *all* of the metadata. I can tell you, based on
> experience, that is not good advice (the longer story here [1]). At the
> very least you want to spread out all OMAP data over as many (fast)
> disks as possible. So in this case it should advise 256.

Curious, how many PGs do you have in total in all the pools of your Ceph
cluster? What are the other pools (e.g., data pools) and each of their PG
counts? What version of Ceph are you using?

> As the PGs merely act as a "placeholder" for the (OMAP) data residing in
> the RocksDB database, I wonder if it would help improve performance if
> we split the PGs to, let's say, 2048 PGs. The amount of OMAP per PG
> would go down dramatically. Currently the amount of OMAP bytes per PG is
> ~ 1 GiB and the number of keys is ~ 2.3 M. Are these numbers crazy high,
> causing the issues we see?
>
> I guess upgrading to Pacific and sharding RocksDB would help a lot as
> well. But is there anything we can do to improve the current situation,
> apart from throwing more OSDs at the problem?
>
> Thanks,
>
> Gr. Stefan
>
> [1]:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SDFJECHHVGVP3RTL3U5SG4NNYZOV5ALT/

Regards,
Ramana
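A minimal sketch of acting on the concern above, assuming the pool name from the post; the commands are standard, but the pg_num value is illustrative only:

    ceph osd pool autoscale-status                           # review what the autoscaler recommends
    ceph osd pool set cephfs_metadata pg_autoscale_mode off  # stop it shrinking the pool to 32 PGs
    ceph osd pool set cephfs_metadata pg_num 1024            # spread OMAP over more OSDs (Nautilus+ splits incrementally)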
[ceph-users] Re: Octopus client for Nautilus OSD/MON
The "next" release is always compatible with "previous one" clusters k Sent from my iPhone > On 2 Jun 2022, at 16:28, Jiatong Shen wrote: > > Hello, > >where can I find librbd compatility matrix? For example, Is octopus > client compatible with nautilus server? Thank you. > > -- > > Best Regards, > > Jiatong Shen > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Unable to deploy new manager in octopus
Hi,

On my test cluster, I migrated from Nautilus to Octopus and then converted
most of the daemons to cephadm. I had a lot of problems with podman 1.6.4 on
CentOS 7 through an HTTPS proxy, because my servers are on a private network.
Now I'm unable to deploy new managers, and the cluster is in a bizarre
situation:

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph -s
  cluster:
    id:     f5a025f9-fbe8-4506-8769-453902eb28d6
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim
            failed to probe daemons or devices
            42 stray daemon(s) not managed by cephadm
            2 stray host(s) with 39 daemon(s) not managed by cephadm
            1 daemons have recently crashed

  services:
    mon: 5 daemons, quorum cepht003,cepht002,cepht001,cepht004,cephtstor01 (age 19m)
    mgr: cepht004.wyibzh(active, since 29m), standbys: cepht003.aa
    mds: fsdup:1 fsec:1 {fsdup:0=fsdup.cepht001.opiyzk=up:active,fsec:0=fsec.cepht003.giatub=up:active} 7 up:standby
    osd: 40 osds: 40 up (since 92m), 40 in (since 3d)
    rgw: 2 daemons active (cepht001, cepht004)

  task status:

  data:
    pools:   18 pools, 577 pgs
    objects: 6.32k objects, 24 GiB
    usage:   80 GiB used, 102 TiB / 102 TiB avail
    pgs:     577 active+clean

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph orch ps
NAME                         HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                 IMAGE ID      CONTAINER ID
mds.fdec.cepht004.vbuphb     cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  5fad10ffc981
mds.fdec.cephtstor01.gtxsnr  cephtstor01  running (24m)  46s ago    24m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  24e837f6ac8a
mds.fdup.cepht001.nydfzs     cepht001     running (2h)   47s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  b1880e343ece
mds.fdup.cepht003.thsnbk     cepht003     running (34m)  45s ago    34m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ddd4e395e7b3
mds.fsdup.cepht001.opiyzk    cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ad081f718863
mds.fsdup.cepht004.cfnxxw    cepht004     running (62m)  47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  c6feed82af8f
mds.fsec.cepht002.uebrlc     cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  836f448c5708
mds.fsec.cepht003.giatub     cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  f235957145cb
mgr.cepht003.aa              cepht003     stopped        45s ago    20h  15.2.6   quay.io/ceph/ceph:v15.2.6  f16a759354cc  770d7cf078ad
mgr.cepht004.wyibzh          cepht004     unknown        47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6baa0f625271
mon.cepht001                 cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  e7f24769153c
mon.cepht002                 cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbb5be113201
mon.cepht003                 cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6c2d6707b3fe
mon.cepht004                 cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  7986b598fd17
mon.cephtstor01              cephtstor01  running (93m)  46s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbd9255aab10
osd.10                       cephtstor01  running (93m)  46s ago    2h   15.2.16  quay.io/ceph/ceph:v15      8d5775c85c6a  01b07c4a75f7

When I try to create a new mgr, I get:

[ceph: root@cepht002 /]# ceph orch daemon add mgr cepht002
Error EINVAL: cephadm exited with an error code: 1, stderr:Deploy daemon mgr.cepht002.kqhnbt ...
Verifying port 8443 ...
ERROR: TCP Port(s) '8443' required for mgr already in use

But nothing runs on that port:

[root@cepht002 f5a025f9-fbe8-4506-8769-453902eb28d6]# ss -lntu
Netid  State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
udp    UNCONN  0       0       127.0.0.1:323         *:*
tcp    LISTEN  0       128     192.168.64.152:6789   *:*
tcp    LISTEN  0       128     192.168.64.152:6800   *:*
tcp    LISTEN  0       128     192.168.64.152:6801   *:*
tcp    LISTEN  0       128     *:22                  *:*
tcp    LISTEN  0       100     127.0.0.1:25          *:*
tcp    LISTEN  0       128     127.0.0.1:6010        *:*
tcp    LISTEN  0       128     *:10050               *:*
tcp    LISTEN  0       128     192.168.64.152:3300   *:*

I get the same error with the command "ceph orch apply mgr ...", and the same
on each node in the cluster. I found no answer on Google... Any ideas?

Patrick
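One hedged suggestion for readers hitting the same error, not a confirmed diagnosis for this cluster: cephadm reserves port 8443 for the mgr on behalf of the dashboard module, and its port probe can trip even when nothing appears bound on the host. Moving the dashboard off 8443 before deploying sidesteps the probe; 8445 below is an arbitrary choice:

    ceph config get mgr mgr/dashboard/ssl_server_port    # 8443 is the default
    ceph config set mgr mgr/dashboard/ssl_server_port 8445
    ceph orch daemon add mgr cepht002                    # retry the deploy, revert the port afterwards if desired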
[ceph-users] OSD_FULL raised when osd was not full (octopus 15.2.16)
Hi,

Yesterday we hit OSD_FULL / POOL_FULL conditions for two brief moments. As
all OSDs are present in all pools, all IO was stalled, which impacted a few
MDS clients (they got evicted). Although the impact was limited, I *really*
would like to understand how that could happen, as it should not have
happened as far as I can tell. And it freaked me out.

Logs:

2022-06-01T14:04:00.043+0200 7fbbc683a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2022-06-01T14:04:06.159+0200 7fbbc683a700  0 log_channel(cluster) log [INF] : Health check cleared: OSD_FULL (was: 1 full osd(s))
2022-06-01T14:04:11.319+0200 7fbbc683a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2022-06-01T14:04:33.027+0200 7fbbc683a700  0 log_channel(cluster) log [INF] : Health check cleared: OSD_FULL (was: 1 full osd(s))

The weird thing was, the fullest OSD at that time was 82.759% full (something
we monitor very closely). The OSD full ratio was 0.9, the backfillfull ratio
0.9, and the nearfull ratio 0.85. OSD_NEARFULL was never logged, so somehow
it "jumped" to this state, a few times.

Observation: it seems that the OSD ID(s) of the full OSD(s) are not logged
anywhere. OSD_NEARFULL OSDs *do* get logged. I did not have time to type a
"ceph health detail" fast enough. I haven't found the code responsible for
logging the nearfull OSD IDs, but I guess it's missing for full OSDs. I can
create a tracker issue for that.

At the time this flag was raised there were a lot of PGs remapped (~ 1200).
There were ~ 21 BACKFILL_TOOFULL and ~ 8 BACKFILLFULL OSDs. The norebalance
flag was set. No degraded data, only misplaced. We were performing a
"reverse balance" with the upmap-remap.py script (to have the ceph balancer
slowly move PGs later on). A couple of minutes before, we had set out 10
OSDs of one host (hence the remaps). We have performed this operation many
times before in the past month without issues.

Was this a glitch? Or is there a valid reason for Ceph to raise OSD_FULL on
an OSD with (potentially) many BACKFILL(NEAR)FULL PGs? How can I find out
which OSD was "full"? I.e., what keywords to grep for in the OSD logs, if
it's logged at all of course.

Thanks,

Stefan
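To reproduce the checks described above on one's own cluster; these are standard commands, nothing specific to this setup:

    ceph osd dump | grep ratio    # configured full / backfillfull / nearfull thresholds
    ceph osd df tree              # per-OSD utilization (%USE column) to spot the fullest OSD
    grep -E 'OSD_FULL|full osd' /var/log/ceph/ceph.log   # transient health events land in the mon cluster log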
[ceph-users] Octopus client for Nautilus OSD/MON
Hello,

Where can I find a librbd compatibility matrix? For example, is the Octopus
client compatible with a Nautilus server? Thank you.

--
Best Regards,
Jiatong Shen
[ceph-users] Re: Moving rbd-images across pools?
Hey Angelo,

what you're asking for is "Live Migration".
https://docs.ceph.com/en/latest/rbd/rbd-live-migration/ says:

  The live-migration copy process can safely run in the background while the
  new target image is in use. There is currently a requirement to temporarily
  stop using the source image before preparing a migration when not using the
  import-only mode of operation. This helps to ensure that the client using
  the image is updated to point to the new target image.

Best regards,
Jan-Philipp
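For reference, the migration itself is a three-step rbd workflow; the pool/image names below are placeholders, and the linked docs remain the authoritative sequence:

    rbd migration prepare sourcepool/image targetpool/image   # stop clients on the source first (non import-only mode)
    rbd migration execute targetpool/image                    # background data copy; the target is already usable
    rbd migration commit targetpool/image                     # finalize and drop the source image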
[ceph-users] Multi-active MDS cache pressure
Hi,

I'm currently debugging a recurring issue with multi-active MDS. The cluster
is still on Nautilus and can't be upgraded at this time. There have been many
discussions about "cache pressure" and I was able to find the right settings
a couple of times, but before I change too much in this setup I'd like to ask
for your opinion. I'll add some information at the end.

We have 16 active MDS daemons spread over 2 servers for one CephFS (8 daemons
per server) with mds_cache_memory_limit = 64GB; the MDS servers are mostly
idle except for some short peaks. Each of the MDS daemons uses around 2 GB
according to 'ceph daemon mds.<name> cache status', so we're nowhere near the
64GB limit. There are currently 25 servers that mount the CephFS as clients.
Watching the ceph health I can see that the reported clients with cache
pressure change, so they are not actually stuck but just don't respond as
quickly as the MDS would like them to (I assume). For some of the mentioned
clients I see high values for .recall_caps.value in the 'daemon session ls'
output (at the bottom).

The docs basically state this:

  When the MDS needs to shrink its cache (to stay within mds_cache_size), it
  sends messages to clients to shrink their caches too. The client is
  unresponsive to MDS requests to release cached inodes. Either the client is
  unresponsive or has a bug.

To me it doesn't seem like the MDS servers are near the cache size limit, so
it has to be the clients, right? In a different setup it helped to decrease
client_oc_size from 200MB to 100MB, but then there's also client_cache_size
with a 16K default. I'm not sure what the best approach would be here. I'd
appreciate any comments on how to size the various cache/caps/threshold
configurations.

Thanks!
Eugen

---snip---
# ceph daemon mds.<name> session ls
    "id": 2728101146,
    "entity": {
        "name": {
            "type": "client",
            "num": 2728101146
        },
        [...]
        "nonce": 1105499797
    },
    "state": "open",
    "num_leases": 0,
    "num_caps": 16158,
    "request_load_avg": 0,
    "uptime": 1118066.210318422,
    "requests_in_flight": 0,
    "completed_requests": [],
    "reconnecting": false,
    "recall_caps": {
        "value": 788916.8276369586,
        "halflife": 60
    },
    "release_caps": {
        "value": 8.814981576458962,
        "halflife": 60
    },
    "recall_caps_throttle": {
        "value": 27379.27162576508,
        "halflife": 1.5
    },
    "recall_caps_throttle2o": {
        "value": 5382.261925615086,
        "halflife": 0.5
    },
    "session_cache_liveness": {
        "value": 12.91841737465921,
        "halflife": 300
    },
    "cap_acquisition": {
        "value": 0,
        "halflife": 10
    },
    [...]
    "used_inos": [],
    "client_metadata": {
        "features": "0x3bff",
        "entity_id": "cephfs_client",

# ceph fs status
cephfs - 25 clients
======
+------+--------+----------------+-------------+-------+-------+
| Rank | State  |      MDS       |   Activity  |  dns  |  inos |
+------+--------+----------------+-------------+-------+-------+
|  0   | active | stmailmds01d-3 | Reqs: 89 /s |  375k |  371k |
|  1   | active | stmailmds01d-4 | Reqs: 64 /s |  386k |  383k |
|  2   | active | stmailmds01a-3 | Reqs:  9 /s |  403k |  399k |
|  3   | active | stmailmds01a-8 | Reqs: 23 /s |  393k |  390k |
|  4   | active | stmailmds01a-2 | Reqs: 36 /s |  391k |  387k |
|  5   | active | stmailmds01a-4 | Reqs: 57 /s |  394k |  390k |
|  6   | active | stmailmds01a-6 | Reqs: 50 /s |  395k |  391k |
|  7   | active | stmailmds01d-5 | Reqs: 37 /s |  384k |  380k |
|  8   | active | stmailmds01a-5 | Reqs: 39 /s |  397k |  394k |
|  9   | active | stmailmds01a   | Reqs: 23 /s |  400k |  396k |
|  10  | active | stmailmds01d-8 | Reqs: 74 /s |  402k |  399k |
|  11  | active | stmailmds01d-6 | Reqs: 37 /s |  399k |  395k |
|  12  | active | stmailmds01d   | Reqs: 36 /s |  394k |  390k |
|  13  | active | stmailmds01d-7 | Reqs: 80 /s |  397k |  393k |
|  14  | active | stmailmds01d-2 | Reqs: 56 /s |  414k |  410k |
|  15  | active | stmailmds01a-7 | Reqs: 25 /s |  390k |  387k |
+------+--------+----------------+-------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 25.4G | 16.1T |
|   cephfs_data   |   data   | 2078G | 16.1T |
+-----------------+----------+-------+-------+
+----------------+
|  Standby MDS   |
+----------------+
| stmailmds01b-5 |
| stmailmds01b-2 |
| stmailmds01b-3 |
| stmailmds01b   |
| stmailmds01b-7 |
| stmailmds01b-8 |
| stmailmds01b-6 |
| stmailmds01b-4 |
+----------------+
MDS version: ceph version 14.2.22-404-gf74e15c2e55 (f74e15c2e552b3359f5a51482dfd8b049e262743) nautilus (stable)
---snip---
[ceph-users] Re: MDS stuck in replay
At this stage we are not so worried about recovery, since we moved to our
new Pacific cluster. The problem arose during one of the nightly syncs of
the old cluster to the new cluster. However, we are quite keen to use this
as a learning opportunity to see what we can do to bring this filesystem
back to life.

On Wed, 2022-06-01 at 20:11 -0400, Ramana Venkatesh Raja wrote:
> Can you temporarily turn up the MDS debug log level (debug_mds) to
> check what's happening to this MDS during replay?
>
> ceph config set mds debug_mds 10

2022-06-02 09:32:36.814 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195662
2022-06-02 09:32:36.814 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 -- 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230794 mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 132+0+0 (crc 0 0 0) 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195662 rtt 0
2022-06-02 09:32:37.090 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:37.090 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.091 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:38.091 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.320 7faca6515700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.124:6805/1445500,v1:192.168.34.124:6807/1445500] -- mgrreport(unknown.store06 +0-0 packed 1414) v8 -- 0x56018651ae00 con 0x5601869cb400
2022-06-02 09:32:39.092 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:39.092 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.094 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:40.094 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.813 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195663
2022-06-02 09:32:40.813 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 -- 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230795 mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 132+0+0 (crc 0 0 0) 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195663 rtt 0
2022-06-02 09:32:41.095 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode

> Is the health of the MDS host okay? Is it low on memory?

Plenty:

[root@store06 ~]# free
              total        used        free      shared  buff/cache   available
Mem:      131939604    75007512     2646656        3380    54285436    52944852
Swap:      32930300        1800    32928500

The cluster is healthy.

> Can you share the output of the `ceph status`, `ceph fs status` and
> `ceph --version`?

[root@store06 ~]# ceph status
  cluster:
    id:     ebaa4a8f-5f17-4d57-b83b-a10f0226efaa
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum store09,store08,store07 (age 10d)
    mgr: store08(active, since 15h), standbys: store09, store07
    mds: one:2/2 {0=store06=up:replay,1=store05=up:resolve} 3 up:standby
    osd: 116 osds: 116 up (since 10d), 116 in (since 4M)

  data:
    pools:   3 pools, 5121 pgs
    objects: 275.90M objects, 202 TiB
    usage:   625 TiB used, 182 TiB / 807 TiB avail
    pgs:     5115 active+clean
             6    active+clean+scrubbing+deep

[root@store06 ~]# ceph fs status
one - 741 clients
===
+------+---------+---------+----------+-------+-------+
| Rank |  State  |   MDS   | Activity |  dns  |  inos |
+------+---------+---------+----------+-------+-------+
|  0   |  replay | store06 |          | 7012k | 6982k |
|  1   | resolve | store05 |          | 82.9k | 78.4k |
+------+---------+---------+----------+-------+-------+
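If replay never progresses even with higher debug levels, the rank 0 journal itself can be checked read-only with cephfs-journal-tool. This is the documented tool for the job, but on a filesystem this size an inspect can take a while, and any recovery actions beyond read-only checks should only follow the disaster-recovery docs. "one" is the filesystem name from the status output above:

    cephfs-journal-tool --rank=one:0 journal inspect   # overall journal integrity
    cephfs-journal-tool --rank=one:0 header get        # where replay starts and ends
    ceph config set mds debug_mds 20                   # if level 10 shows nothing useful during replay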