[ceph-users] Re: Octopus client for Nautilus OSD/MON
Thank you very much!

On Thu, Jun 2, 2022 at 11:23 PM Konstantin Shalygin wrote:
> The "next" release is always compatible with "previous one" clusters
>
> k
> Sent from my iPhone
>
> > On 2 Jun 2022, at 16:28, Jiatong Shen wrote:
> >
> > Hello,
> >
> > Where can I find a librbd compatibility matrix? For example, is the
> > Octopus client compatible with a Nautilus server? Thank you.
> >
> > --
> > Best Regards,
> > Jiatong Shen

--
Best Regards,
Jiatong Shen
[ceph-users] Re: Slow delete speed through the s3 API
Is it just your deletes which are slow, or writes and reads as well?

On Thu, Jun 2, 2022, 4:09 PM J-P Methot wrote:
> I'm following up on this as we upgraded to Pacific 16.2.9 and deletes
> are still incredibly slow. The pool rgw is using is a fairly small
> erasure-coded pool set at 8 + 3. Is there anyone who's having the same
> issue?
>
> On 5/16/22 15:23, J-P Methot wrote:
> > Hi,
> >
> > First of all, a quick Google search shows me that questions about the
> > S3 API's slow object-deletion speed have been asked before and are well
> > documented. My issue is slightly different, because I am getting
> > abysmal speeds of 11 objects/second on an all-SSD cluster running
> > Octopus with about a hundred OSDs. This is much lower than the
> > Red Hat-reported limit of 1000 objects/second.
> >
> > I've seen elsewhere that it was a RocksDB limitation and that it would
> > be fixed in Pacific, but the Pacific release notes do not show me
> > anything that suggests that. Furthermore, I have limited control over
> > the S3 client deleting the files, as it's a third-party open-source
> > automatic backup program.
> >
> > Could updating to Pacific fix this issue? Is there any configuration
> > change I could make to speed up object deletion?
>
> --
> Jean-Philippe Méthot
> Senior OpenStack system administrator
> Administrateur système OpenStack sénior
> PlanetHoster inc.
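For readers digging into the same symptom: deleted S3 objects are reclaimed asynchronously by RGW garbage collection, so a GC backlog can make deletes look even slower than they are. An untested sketch of where to look; the option names below are real RGW settings, but the values are only illustrative, not a verified fix for this thread:

    radosgw-admin gc list --include-all | head           # a long list means deletes queued behind GC
    ceph config set client.rgw rgw_gc_max_concurrent_io 20     # illustrative value
    ceph config set client.rgw rgw_gc_processor_max_time 3600  # illustrative value
    radosgw-admin gc process --include-all               # run a GC pass now instead of waiting

Note that the delete requests themselves are bounded by bucket-index/RocksDB performance, which is what the Pacific-era sharding work mentioned in the thread addresses; GC tuning only helps space reclamation keep up.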
[ceph-users] Re: Help needed picking the right amount of PGs for (Cephfs) metadata pool
On Thu, Jun 2, 2022 at 11:40 AM Stefan Kooman wrote:
>
> Hi,
>
> We have a CephFS filesystem holding 70 TiB of data in ~ 300 M files and
> ~ 900 M sub-directories. We currently have 180 OSDs in this cluster.
>
> POOL            ID PGS STORED  (DATA)  (OMAP)  OBJECTS USED    (DATA)  (OMAP)  %USED MAX AVAIL
> cephfs_metadata  6 512 984 GiB 243 MiB 984 GiB 903.98M 2.9 TiB 728 MiB 2.9 TiB 3.06  30 TiB
>
> The PGs in this pool (replicated, size=3, min_size=2), pool 6, are giving
> us a hard time (again). When PGs get remapped to other OSDs it introduces
> (tons of) slow ops and MDS slow requests. Remapping more than 10 PGs at
> a time will result in OSDs marked as dead (iothread timeout). Scrubbing
> (with default settings) triggers slow ops too. Half of the cluster is
> running on SSDs (SAMSUNG MZ7LM3T8HMLP-5 / INTEL SSDSC2KB03) with cache
> mode in write-through, the other half is NVMe (SAMSUNG MZQLB3T8HALS-7).
> No separate WAL/DB devices. SSDs run on Intel (14 cores / 128 GB RAM),
> NVMe on AMD EPYC gen 1 / 2 (16 cores / 128 GB RAM).
> OSD_MEMORY_TARGET=11G. The load on the pool (and cluster in general) is
> modest. Plenty of CPU power available (mostly idling really). In the
> order of ~ 6 K MDS requests, ~ 1.5 K metadata ops (ballpark figures).
>
> We currently have 512 PGs allocated to this pool. The autoscaler suggests
> reducing this amount to "32" PGs. This would result in only a fraction
> of the OSDs holding *all* of the metadata. I can tell you, based on
> experience, that is not good advice (the longer story here [1]). At the
> very least you want to spread out all OMAP data over as many (fast)
> disks as possible. So in this case it should advise 256.

Curious, how many PGs do you have in total in all the pools of your Ceph
cluster? What are the other pools (e.g., data pools) and each of their PG
counts? What version of Ceph are you using?

> As the PGs merely act as a "placeholder" for the (OMAP) data residing in
> the RocksDB database, I wonder if it would help improve performance if
> we split the PGs to, let's say, 2048 PGs. The amount of OMAP per PG
> would go down dramatically. Currently the amount of OMAP bytes per PG is
> ~ 1 GiB and the number of keys is ~ 2.3 M. Are these numbers crazy high,
> causing the issues we see?
>
> I guess upgrading to Pacific and sharding RocksDB would help a lot as
> well. But is there anything we can do to improve the current situation,
> apart from throwing more OSDs at the problem?
>
> Thanks,
>
> Gr. Stefan
>
> [1]:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SDFJECHHVGVP3RTL3U5SG4NNYZOV5ALT/

Regards,
Ramana
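A minimal sketch of acting on the concern above, assuming the pool name from the post; the commands are standard, but the pg_num value is illustrative only:

    ceph osd pool autoscale-status                           # review what the autoscaler recommends
    ceph osd pool set cephfs_metadata pg_autoscale_mode off  # stop it shrinking the pool to 32 PGs
    ceph osd pool set cephfs_metadata pg_num 1024            # spread OMAP over more OSDs (Nautilus+ splits incrementally)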
[ceph-users] Re: Octopus client for Nautilus OSD/MON
The "next" release is always compatible with "previous one" clusters k Sent from my iPhone > On 2 Jun 2022, at 16:28, Jiatong Shen wrote: > > Hello, > >where can I find librbd compatility matrix? For example, Is octopus > client compatible with nautilus server? Thank you. > > -- > > Best Regards, > > Jiatong Shen > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Unable to deploy new manager in octopus
Hi,

On my test cluster, I migrated from Nautilus to Octopus and then converted
most of the daemons to cephadm. I had a lot of problems with podman 1.6.4 on
CentOS 7 through an HTTPS proxy, because my servers are on a private network.
Now I'm unable to deploy new managers, and the cluster is in a bizarre
situation:

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph -s
  cluster:
    id:     f5a025f9-fbe8-4506-8769-453902eb28d6
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim
            failed to probe daemons or devices
            42 stray daemon(s) not managed by cephadm
            2 stray host(s) with 39 daemon(s) not managed by cephadm
            1 daemons have recently crashed

  services:
    mon: 5 daemons, quorum cepht003,cepht002,cepht001,cepht004,cephtstor01 (age 19m)
    mgr: cepht004.wyibzh(active, since 29m), standbys: cepht003.aa
    mds: fsdup:1 fsec:1 {fsdup:0=fsdup.cepht001.opiyzk=up:active,fsec:0=fsec.cepht003.giatub=up:active} 7 up:standby
    osd: 40 osds: 40 up (since 92m), 40 in (since 3d)
    rgw: 2 daemons active (cepht001, cepht004)

  task status:

  data:
    pools:   18 pools, 577 pgs
    objects: 6.32k objects, 24 GiB
    usage:   80 GiB used, 102 TiB / 102 TiB avail
    pgs:     577 active+clean

[root@cepht003 f5a025f9-fbe8-4506-8769-453902eb28d6]# ceph orch ps
NAME                         HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                 IMAGE ID      CONTAINER ID
mds.fdec.cepht004.vbuphb     cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  5fad10ffc981
mds.fdec.cephtstor01.gtxsnr  cephtstor01  running (24m)  46s ago    24m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  24e837f6ac8a
mds.fdup.cepht001.nydfzs     cepht001     running (2h)   47s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  b1880e343ece
mds.fdup.cepht003.thsnbk     cepht003     running (34m)  45s ago    34m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ddd4e395e7b3
mds.fsdup.cepht001.opiyzk    cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  ad081f718863
mds.fsdup.cepht004.cfnxxw    cepht004     running (62m)  47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  c6feed82af8f
mds.fsec.cepht002.uebrlc     cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  836f448c5708
mds.fsec.cepht003.giatub     cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  f235957145cb
mgr.cepht003.aa              cepht003     stopped        45s ago    20h  15.2.6   quay.io/ceph/ceph:v15.2.6  f16a759354cc  770d7cf078ad
mgr.cepht004.wyibzh          cepht004     unknown        47s ago    20h  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6baa0f625271
mon.cepht001                 cepht001     running (4h)   47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  e7f24769153c
mon.cepht002                 cepht002     running (20m)  47s ago    20m  15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbb5be113201
mon.cepht003                 cepht003     running (76m)  45s ago    5h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  6c2d6707b3fe
mon.cepht004                 cepht004     running (62m)  47s ago    4h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  7986b598fd17
mon.cephtstor01              cephtstor01  running (93m)  46s ago    2h   15.2.13  docker.io/ceph/ceph:v15    2cf504fded39  dbd9255aab10
osd.10                       cephtstor01  running (93m)  46s ago    2h   15.2.16  quay.io/ceph/ceph:v15      8d5775c85c6a  01b07c4a75f7

When I try to create a new mgr, I get:

[ceph: root@cepht002 /]# ceph orch daemon add mgr cepht002
Error EINVAL: cephadm exited with an error code: 1, stderr:Deploy daemon mgr.cepht002.kqhnbt ...
Verifying port 8443 ...
ERROR: TCP Port(s) '8443' required for mgr already in use

But nothing runs on that port:

[root@cepht002 f5a025f9-fbe8-4506-8769-453902eb28d6]# ss -lntu
Netid  State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
udp    UNCONN  0       0       127.0.0.1:323         *:*
tcp    LISTEN  0       128     192.168.64.152:6789   *:*
tcp    LISTEN  0       128     192.168.64.152:6800   *:*
tcp    LISTEN  0       128     192.168.64.152:6801   *:*
tcp    LISTEN  0       128     *:22                  *:*
tcp    LISTEN  0       100     127.0.0.1:25          *:*
tcp    LISTEN  0       128     127.0.0.1:6010        *:*
tcp    LISTEN  0       128     *:10050               *:*
tcp    LISTEN  0       128     192.168.64.152:3300   *:*

I get the same error with the command "ceph orch apply mgr ...", and the same
on each node in the cluster. I found no answer on Google... Any ideas?

Patrick
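One hedged suggestion for readers hitting the same error, not a confirmed diagnosis for this cluster: cephadm reserves port 8443 for the mgr on behalf of the dashboard module, and its port probe can trip even when nothing appears bound on the host. Moving the dashboard off 8443 before deploying sidesteps the probe; 8445 below is an arbitrary choice:

    ceph config get mgr mgr/dashboard/ssl_server_port    # 8443 is the default
    ceph config set mgr mgr/dashboard/ssl_server_port 8445
    ceph orch daemon add mgr cepht002                    # retry the deploy, revert the port afterwards if desired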
[ceph-users] OSD_FULL raised when osd was not full (octopus 15.2.16)
Hi,

Yesterday we hit OSD_FULL / POOL_FULL conditions for two brief moments. As
all OSDs are present in all pools, all IO was stalled, which impacted a few
MDS clients (they got evicted). Although the impact was limited, I *really*
would like to understand how that could happen, as it should not have
happened as far as I can tell. And it freaked me out.

Logs:

2022-06-01T14:04:00.043+0200 7fbbc683a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2022-06-01T14:04:06.159+0200 7fbbc683a700  0 log_channel(cluster) log [INF] : Health check cleared: OSD_FULL (was: 1 full osd(s))
2022-06-01T14:04:11.319+0200 7fbbc683a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2022-06-01T14:04:33.027+0200 7fbbc683a700  0 log_channel(cluster) log [INF] : Health check cleared: OSD_FULL (was: 1 full osd(s))

The weird thing was, the fullest OSD at that time was 82.759% full (something
we monitor very closely). The OSD full ratio was 0.9, the backfillfull ratio
0.9, and the nearfull ratio 0.85. OSD_NEARFULL was never logged, so somehow
it "jumped" to this state, a few times.

Observation: it seems that the OSD ID(s) of the full OSD(s) are not logged
anywhere. OSD_NEARFULL OSDs *do* get logged. I did not have time to type a
"ceph health detail" fast enough. I haven't found the code responsible for
logging the nearfull OSD IDs, but I guess it's missing for full OSDs. I can
create a tracker issue for that.

At the time this flag was raised there were a lot of PGs remapped (~ 1200).
There were ~ 21 BACKFILL_TOOFULL and ~ 8 BACKFILLFULL OSDs. The norebalance
flag was set. No degraded data, only misplaced. We were performing a
"reverse balance" with the upmap-remap.py script (to have the ceph balancer
slowly move PGs later on). A couple of minutes before, we had set out 10
OSDs of one host (hence the remaps). We have performed this operation many
times before in the past month without issues.

Was this a glitch? Or is there a valid reason for Ceph to raise OSD_FULL on
an OSD with (potentially) many BACKFILL(NEAR)FULL PGs? How can I find out
which OSD was "full"? I.e., what keywords to grep for in the OSD logs, if
it's logged at all of course.

Thanks,

Stefan
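To reproduce the checks described above on one's own cluster; these are standard commands, nothing specific to this setup:

    ceph osd dump | grep ratio    # configured full / backfillfull / nearfull thresholds
    ceph osd df tree              # per-OSD utilization (%USE column) to spot the fullest OSD
    grep -E 'OSD_FULL|full osd' /var/log/ceph/ceph.log   # transient health events land in the mon cluster log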
[ceph-users] Octopus client for Nautilus OSD/MON
Hello,

Where can I find a librbd compatibility matrix? For example, is the Octopus
client compatible with a Nautilus server? Thank you.

--
Best Regards,
Jiatong Shen
[ceph-users] Re: Moving rbd-images across pools?
Hey Angelo,

what you're asking for is "Live Migration".
https://docs.ceph.com/en/latest/rbd/rbd-live-migration/ says:

  The live-migration copy process can safely run in the background while the
  new target image is in use. There is currently a requirement to temporarily
  stop using the source image before preparing a migration when not using the
  import-only mode of operation. This helps to ensure that the client using
  the image is updated to point to the new target image.

Best regards,
Jan-Philipp
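For reference, the migration itself is a three-step rbd workflow; the pool/image names below are placeholders, and the linked docs remain the authoritative sequence:

    rbd migration prepare sourcepool/image targetpool/image   # stop clients on the source first (non import-only mode)
    rbd migration execute targetpool/image                    # background data copy; the target is already usable
    rbd migration commit targetpool/image                     # finalize and drop the source image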
[ceph-users] Multi-active MDS cache pressure
Hi,

I'm currently debugging a recurring issue with multi-active MDS. The cluster
is still on Nautilus and can't be upgraded at this time. There have been many
discussions about "cache pressure" and I was able to find the right settings
a couple of times, but before I change too much in this setup I'd like to ask
for your opinion. I'll add some information at the end.

We have 16 active MDS daemons spread over 2 servers for one CephFS (8 daemons
per server) with mds_cache_memory_limit = 64GB; the MDS servers are mostly
idle except for some short peaks. Each of the MDS daemons uses around 2 GB
according to 'ceph daemon mds.<name> cache status', so we're nowhere near the
64GB limit. There are currently 25 servers that mount the CephFS as clients.
Watching the ceph health I can see that the reported clients with cache
pressure change, so they are not actually stuck but just don't respond as
quickly as the MDS would like them to (I assume). For some of the mentioned
clients I see high values for .recall_caps.value in the 'daemon session ls'
output (at the bottom).

The docs basically state this:

  When the MDS needs to shrink its cache (to stay within mds_cache_size), it
  sends messages to clients to shrink their caches too. The client is
  unresponsive to MDS requests to release cached inodes. Either the client is
  unresponsive or has a bug.

To me it doesn't seem like the MDS servers are near the cache size limit, so
it has to be the clients, right? In a different setup it helped to decrease
client_oc_size from 200MB to 100MB, but then there's also client_cache_size
with a 16K default. I'm not sure what the best approach would be here. I'd
appreciate any comments on how to size the various cache/caps/threshold
configurations.

Thanks!
Eugen

---snip---
# ceph daemon mds.<name> session ls
    "id": 2728101146,
    "entity": {
        "name": {
            "type": "client",
            "num": 2728101146
        },
        [...]
        "nonce": 1105499797
    },
    "state": "open",
    "num_leases": 0,
    "num_caps": 16158,
    "request_load_avg": 0,
    "uptime": 1118066.210318422,
    "requests_in_flight": 0,
    "completed_requests": [],
    "reconnecting": false,
    "recall_caps": {
        "value": 788916.8276369586,
        "halflife": 60
    },
    "release_caps": {
        "value": 8.814981576458962,
        "halflife": 60
    },
    "recall_caps_throttle": {
        "value": 27379.27162576508,
        "halflife": 1.5
    },
    "recall_caps_throttle2o": {
        "value": 5382.261925615086,
        "halflife": 0.5
    },
    "session_cache_liveness": {
        "value": 12.91841737465921,
        "halflife": 300
    },
    "cap_acquisition": {
        "value": 0,
        "halflife": 10
    },
    [...]
    "used_inos": [],
    "client_metadata": {
        "features": "0x3bff",
        "entity_id": "cephfs_client",

# ceph fs status
cephfs - 25 clients
======
+------+--------+----------------+-------------+-------+-------+
| Rank | State  |      MDS       |   Activity  |  dns  |  inos |
+------+--------+----------------+-------------+-------+-------+
|  0   | active | stmailmds01d-3 | Reqs: 89 /s |  375k |  371k |
|  1   | active | stmailmds01d-4 | Reqs: 64 /s |  386k |  383k |
|  2   | active | stmailmds01a-3 | Reqs:  9 /s |  403k |  399k |
|  3   | active | stmailmds01a-8 | Reqs: 23 /s |  393k |  390k |
|  4   | active | stmailmds01a-2 | Reqs: 36 /s |  391k |  387k |
|  5   | active | stmailmds01a-4 | Reqs: 57 /s |  394k |  390k |
|  6   | active | stmailmds01a-6 | Reqs: 50 /s |  395k |  391k |
|  7   | active | stmailmds01d-5 | Reqs: 37 /s |  384k |  380k |
|  8   | active | stmailmds01a-5 | Reqs: 39 /s |  397k |  394k |
|  9   | active | stmailmds01a   | Reqs: 23 /s |  400k |  396k |
|  10  | active | stmailmds01d-8 | Reqs: 74 /s |  402k |  399k |
|  11  | active | stmailmds01d-6 | Reqs: 37 /s |  399k |  395k |
|  12  | active | stmailmds01d   | Reqs: 36 /s |  394k |  390k |
|  13  | active | stmailmds01d-7 | Reqs: 80 /s |  397k |  393k |
|  14  | active | stmailmds01d-2 | Reqs: 56 /s |  414k |  410k |
|  15  | active | stmailmds01a-7 | Reqs: 25 /s |  390k |  387k |
+------+--------+----------------+-------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 25.4G | 16.1T |
|   cephfs_data   |   data   | 2078G | 16.1T |
+-----------------+----------+-------+-------+
+----------------+
|  Standby MDS   |
+----------------+
| stmailmds01b-5 |
| stmailmds01b-2 |
| stmailmds01b-3 |
| stmailmds01b   |
| stmailmds01b-7 |
| stmailmds01b-8 |
| stmailmds01b-6 |
| stmailmds01b-4 |
+----------------+
MDS version: ceph version 14.2.22-404-gf74e15c2e55 (f74e15c2e552b3359f5a51482dfd8b049e262743) nautilus (stable)
---snip---
[ceph-users] Re: MDS stuck in replay
At this stage we are not so worried about recovery, since we moved to our
new Pacific cluster. The problem arose during one of the nightly syncs of
the old cluster to the new cluster. However, we are quite keen to use this
as a learning opportunity to see what we can do to bring this filesystem
back to life.

On Wed, 2022-06-01 at 20:11 -0400, Ramana Venkatesh Raja wrote:
> Can you temporarily turn up the MDS debug log level (debug_mds) to
> check what's happening to this MDS during replay?
>
> ceph config set mds debug_mds 10

2022-06-02 09:32:36.814 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195662
2022-06-02 09:32:36.814 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 -- 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230794 mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 132+0+0 (crc 0 0 0) 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195662 rtt 0
2022-06-02 09:32:37.090 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:37.090 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.091 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:38.091 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.320 7faca6515700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.124:6805/1445500,v1:192.168.34.124:6807/1445500] -- mgrreport(unknown.store06 +0-0 packed 1414) v8 -- 0x56018651ae00 con 0x5601869cb400
2022-06-02 09:32:39.092 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:39.092 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.094 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:40.094 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.813 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195663
2022-06-02 09:32:40.813 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 -- 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230795 mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 132+0+0 (crc 0 0 0) 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195663 rtt 0
2022-06-02 09:32:41.095 7faca4d12700  2 mds.0.cache Memory usage: total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode

> Is the health of the MDS host okay? Is it low on memory?

Plenty:

[root@store06 ~]# free
              total        used        free      shared  buff/cache   available
Mem:      131939604    75007512     2646656        3380    54285436    52944852
Swap:      32930300        1800    32928500

The cluster is healthy.

> Can you share the output of the `ceph status`, `ceph fs status` and
> `ceph --version`?

[root@store06 ~]# ceph status
  cluster:
    id:     ebaa4a8f-5f17-4d57-b83b-a10f0226efaa
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum store09,store08,store07 (age 10d)
    mgr: store08(active, since 15h), standbys: store09, store07
    mds: one:2/2 {0=store06=up:replay,1=store05=up:resolve} 3 up:standby
    osd: 116 osds: 116 up (since 10d), 116 in (since 4M)

  data:
    pools:   3 pools, 5121 pgs
    objects: 275.90M objects, 202 TiB
    usage:   625 TiB used, 182 TiB / 807 TiB avail
    pgs:     5115 active+clean
             6    active+clean+scrubbing+deep

[root@store06 ~]# ceph fs status
one - 741 clients
===
+------+---------+---------+----------+-------+-------+
| Rank |  State  |   MDS   | Activity |  dns  |  inos |
+------+---------+---------+----------+-------+-------+
|  0   |  replay | store06 |          | 7012k | 6982k |
|  1   | resolve | store05 |          | 82.9k | 78.4k |
+------+---------+---------+----------+-------+-------+
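If replay never progresses even with higher debug levels, the rank 0 journal itself can be checked read-only with cephfs-journal-tool. This is the documented tool for the job, but on a filesystem this size an inspect can take a while, and any recovery actions beyond read-only checks should only follow the disaster-recovery docs. "one" is the filesystem name from the status output above:

    cephfs-journal-tool --rank=one:0 journal inspect   # overall journal integrity
    cephfs-journal-tool --rank=one:0 header get        # where replay starts and ends
    ceph config set mds debug_mds 20                   # if level 10 shows nothing useful during replay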