Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
On Tue, Sep 24, 2019 at 4:33 AM Thomas <74cmo...@gmail.com> wrote:
>
> Hi,
>
> I'm experiencing the same issue with this setting in ceph.conf:
>
> osd op queue = wpq
> osd op queue cut off = high
>
> Furthermore I cannot read any old data in the relevant pool that is
> serving CephFS. However, I can write new data and read this new data.

If you restarted all the OSDs with this setting, it won't necessarily
prevent all blocked IO; it just really helps prevent the very long
blocked IO and makes sure that IO eventually gets done in a more fair
manner. It sounds like you may have some MDS issues that are deeper
than my understanding. The first thing I'd try is to bounce the MDS
service.

> > If I want to add this to my ceph-ansible playbook parameters, which
> > file should I add it to, and what is the best way to do it?
> >
> > Should I add these three lines to all.yml or osds.yml?
> >
> > ceph_conf_overrides:
> >   global:
> >     osd_op_queue_cut_off: high
> >
> > Is there another (better?) way to do that?

I can't speak to either of those approaches. I wanted all my config in
a single file, so I put it in my inventory file, but it looks like you
have the right idea.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
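For reference, a rough sketch of rolling that out by hand, assuming
systemd-managed daemons with the usual ceph-osd@<id> and ceph-mds@<id>
unit names (adjust the ids and hosts to your deployment):

    # /etc/ceph/ceph.conf on each OSD host
    [osd]
    osd op queue = wpq
    osd op queue cut off = high

    # both options are only read at startup, so restart the OSDs one
    # at a time and let the cluster settle in between
    sudo systemctl restart ceph-osd@1

    # "bounce" the MDS on the affected host (the id is usually the
    # short hostname)
    sudo systemctl restart ceph-mds@$(hostname -s)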
Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
Hi,

I'm experiencing the same issue with this setting in ceph.conf:

osd op queue = wpq
osd op queue cut off = high

Furthermore I cannot read any old data in the relevant pool that is
serving CephFS. However, I can write new data and read this new data.

Regards
Thomas

On 24.09.2019 at 10:24, Yoann Moulin wrote:
> Hello,
>
>>> I have a Ceph Nautilus cluster 14.2.1 for CephFS only, on 40x 1.8T
>>> SAS disks (no SSD) in 20 servers.
>>>
>>> I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow
>>> requests, 0 included below; oldest blocked for > 60281.199503 secs".
>>>
>>> After a few investigations, I saw that ALL ceph-osd processes eat a
>>> lot of memory, up to 130GB RSS each. Is this value normal? Could this
>>> be related to the slow requests? Does an HDD-only setup just increase
>>> the probability of slow requests?
>>
>> If you haven't set:
>>
>> osd op queue cut off = high
>>
>> in /etc/ceph/ceph.conf on your OSDs, I'd give that a try. It should
>> help quite a bit with pure HDD clusters.
>
> OK, I'll try this, thanks.
>
> If I want to add this to my ceph-ansible playbook parameters, which
> file should I add it to, and what is the best way to do it?
>
> Should I add these three lines to all.yml or osds.yml?
>
> ceph_conf_overrides:
>   global:
>     osd_op_queue_cut_off: high
>
> Is there another (better?) way to do that?
>
> Thanks for your help.
>
> Best regards,
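If it helps, a quick way to confirm that a restarted OSD really picked
up the new value, assuming the admin socket is available on the OSD
host (osd.1 is just an example id):

    # query the live value over the OSD's admin socket
    sudo ceph daemon osd.1 config get osd_op_queue_cut_off
    # expected once the restart has taken effect:
    # { "osd_op_queue_cut_off": "high" }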
Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
Hello,

>> I have a Ceph Nautilus cluster 14.2.1 for CephFS only, on 40x 1.8T
>> SAS disks (no SSD) in 20 servers.
>>
>> I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow
>> requests, 0 included below; oldest blocked for > 60281.199503 secs".
>>
>> After a few investigations, I saw that ALL ceph-osd processes eat a
>> lot of memory, up to 130GB RSS each. Is this value normal? Could this
>> be related to the slow requests? Does an HDD-only setup just increase
>> the probability of slow requests?
>
> If you haven't set:
>
> osd op queue cut off = high
>
> in /etc/ceph/ceph.conf on your OSDs, I'd give that a try. It should
> help quite a bit with pure HDD clusters.

OK, I'll try this, thanks.

If I want to add this to my ceph-ansible playbook parameters, which
file should I add it to, and what is the best way to do it?

Should I add these three lines to all.yml or osds.yml?

ceph_conf_overrides:
  global:
    osd_op_queue_cut_off: high

Is there another (better?) way to do that?

Thanks for your help.

Best regards,

--
Yoann Moulin
EPFL IC-IT
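As a sketch (untested): since osd_op_queue_cut_off only affects OSD
daemons, the override could equally be scoped to the osd section
instead of global, e.g. in group_vars/all.yml:

    # group_vars/all.yml -- rendered into ceph.conf by ceph-ansible
    ceph_conf_overrides:
      osd:
        osd_op_queue: wpq
        osd_op_queue_cut_off: high

(Ceph accepts both the spaced and the underscored spelling of option
names in ceph.conf.)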
Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
On Thu, Sep 19, 2019 at 2:36 AM Yoann Moulin wrote:
>
> Hello,
>
> I have a Ceph Nautilus cluster 14.2.1 for CephFS only, on 40x 1.8T SAS
> disks (no SSD) in 20 servers.
>
> > cluster:
> >   id:     778234df-5784-4021-b983-0ee1814891be
> >   health: HEALTH_WARN
> >           2 MDSs report slow requests
> >
> > services:
> >   mon: 3 daemons, quorum icadmin006,icadmin007,icadmin008 (age 5d)
> >   mgr: icadmin008(active, since 18h), standbys: icadmin007, icadmin006
> >   mds: cephfs:3
> > {0=icadmin006=up:active,1=icadmin007=up:active,2=icadmin008=up:active}
> >   osd: 40 osds: 40 up (since 2w), 40 in (since 3w)
> >
> > data:
> >   pools:   3 pools, 672 pgs
> >   objects: 36.08M objects, 19 TiB
> >   usage:   51 TiB used, 15 TiB / 65 TiB avail
> >   pgs:     670 active+clean
> >            2   active+clean+scrubbing
>
> I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow
> requests, 0 included below; oldest blocked for > 60281.199503 secs".
>
> > HEALTH_WARN 2 MDSs report slow requests
> > MDS_SLOW_REQUEST 2 MDSs report slow requests
> >     mdsicadmin007(mds.1): 3 slow requests are blocked > 30 secs
> >     mdsicadmin006(mds.0): 10 slow requests are blocked > 30 secs
>
> After a few investigations, I saw that ALL ceph-osd processes eat a lot
> of memory, up to 130GB RSS each. Is this value normal? Could this be
> related to the slow requests? Does an HDD-only setup just increase the
> probability of slow requests?
>
> [ps output snipped; the full listing is in the original message below]

If you haven't set:

osd op queue cut off = high

in /etc/ceph/ceph.conf on your OSDs, I'd give that a try. It should
help quite a bit with pure HDD clusters.
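On Nautilus you can also push this through the monitors' centralized
config store instead of editing ceph.conf on every host (a sketch; as
far as I know the option is still only read at OSD startup, so a
restart of the OSDs is needed either way):

    # store the override in the mon config database
    ceph config set osd osd_op_queue_cut_off high
    # confirm the value the cluster will hand out to OSDs
    ceph config get osd osd_op_queue_cut_off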
[ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
Hello,

I have a Ceph Nautilus cluster 14.2.1 for CephFS only, on 40x 1.8T SAS
disks (no SSD) in 20 servers.

> cluster:
>   id:     778234df-5784-4021-b983-0ee1814891be
>   health: HEALTH_WARN
>           2 MDSs report slow requests
>
> services:
>   mon: 3 daemons, quorum icadmin006,icadmin007,icadmin008 (age 5d)
>   mgr: icadmin008(active, since 18h), standbys: icadmin007, icadmin006
>   mds: cephfs:3
> {0=icadmin006=up:active,1=icadmin007=up:active,2=icadmin008=up:active}
>   osd: 40 osds: 40 up (since 2w), 40 in (since 3w)
>
> data:
>   pools:   3 pools, 672 pgs
>   objects: 36.08M objects, 19 TiB
>   usage:   51 TiB used, 15 TiB / 65 TiB avail
>   pgs:     670 active+clean
>            2   active+clean+scrubbing

I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow
requests, 0 included below; oldest blocked for > 60281.199503 secs".

> HEALTH_WARN 2 MDSs report slow requests
> MDS_SLOW_REQUEST 2 MDSs report slow requests
>     mdsicadmin007(mds.1): 3 slow requests are blocked > 30 secs
>     mdsicadmin006(mds.0): 10 slow requests are blocked > 30 secs

After a few investigations, I saw that ALL ceph-osd processes eat a lot
of memory, up to 130GB RSS each. Is this value normal? Could this be
related to the slow requests? Does an HDD-only setup just increase the
probability of slow requests?

> USER PID   %CPU %MEM VSZ       RSS       TTY STAT START TIME    COMMAND
> ceph 34196 3.6  35.0 156247524 138521572 ?   Ssl  Jul01 4173:18 /usr/bin/ceph-osd -f --cluster apollo --id 1 --setuser ceph --setgroup ceph
> ceph 34394 3.6  35.0 160001436 138487776 ?   Ssl  Jul01 4178:37 /usr/bin/ceph-osd -f --cluster apollo --id 32 --setuser ceph --setgroup ceph
> ceph 34709 3.5  35.1 156369636 138752044 ?   Ssl  Jul01 4088:57 /usr/bin/ceph-osd -f --cluster apollo --id 29 --setuser ceph --setgroup ceph
> ceph 34915 3.4  35.1 158976936 138715900 ?   Ssl  Jul01 3950:45 /usr/bin/ceph-osd -f --cluster apollo --id 3 --setuser ceph --setgroup ceph
> ceph 34156 3.4  35.1 158280768 138714484 ?   Ssl  Jul01 3984:11 /usr/bin/ceph-osd -f --cluster apollo --id 30 --setuser ceph --setgroup ceph
> ceph 34378 3.7  35.1 155162420 138708096 ?   Ssl  Jul01 4312:12 /usr/bin/ceph-osd -f --cluster apollo --id 8 --setuser ceph --setgroup ceph
> ceph 34161 3.5  35.0 159606788 138523652 ?   Ssl  Jul01 4128:17 /usr/bin/ceph-osd -f --cluster apollo --id 16 --setuser ceph --setgroup ceph
> ceph 34380 3.6  35.1 161465372 138670168 ?   Ssl  Jul01 4238:20 /usr/bin/ceph-osd -f --cluster apollo --id 35 --setuser ceph --setgroup ceph
> ceph 33822 3.7  35.1 163456644 138734036 ?   Ssl  Jul01 4342:05 /usr/bin/ceph-osd -f --cluster apollo --id 15 --setuser ceph --setgroup ceph
> ceph 34003 3.8  35.0 161868584 138531208 ?   Ssl  Jul01 4427:32 /usr/bin/ceph-osd -f --cluster apollo --id 38 --setuser ceph --setgroup ceph
> ceph 9753  2.8  24.2 96923856  95580776  ?   Ssl  Sep02 700:25  /usr/bin/ceph-osd -f --cluster apollo --id 31 --setuser ceph --setgroup ceph
> ceph 10120 2.5  24.0 96130340  94856244  ?   Ssl  Sep02 644:50  /usr/bin/ceph-osd -f --cluster apollo --id 7 --setuser ceph --setgroup ceph
> ceph 36204 3.6  35.0 159394476 138592124 ?   Ssl  Jul01 4185:36 /usr/bin/ceph-osd -f --cluster apollo --id 18 --setuser ceph --setgroup ceph
> ceph 36427 3.7  34.4 155699060 136076432 ?   Ssl  Jul01 4298:26 /usr/bin/ceph-osd -f --cluster apollo --id 36 --setuser ceph --setgroup ceph
> ceph 36622 4.1  35.1 158219408 138724688 ?   Ssl  Jul01 4779:14 /usr/bin/ceph-osd -f --cluster apollo --id 19 --setuser ceph --setgroup ceph
> ceph 36881 4.0  35.1 157748752 138719064 ?   Ssl  Jul01 4669:54 /usr/bin/ceph-osd -f --cluster apollo --id 37 --setuser ceph --setgroup ceph
> ceph 34649 3.7  35.1 159601580 138652012 ?   Ssl  Jul01 4337:20 /usr/bin/ceph-osd -f --cluster apollo --id 14 --setuser ceph --setgroup ceph
> ceph 34881 3.8  35.1 158632412 138764376 ?   Ssl  Jul01 4433:50 /usr/bin/ceph-osd -f --cluster apollo --id 33 --setuser ceph --setgroup ceph
> ceph 34646 4.2  35.1 155029328 138732376 ?   Ssl  Jul01 4831:24 /usr/bin/ceph-osd -f --cluster apollo --id 17 --setuser ceph --setgroup ceph
> ceph 34881 4.1  35.1 156801676 138763588 ?   Ssl  Jul01 4710:19 /usr/bin/ceph-osd -f --cluster apollo --id 39 --setuser ceph --setgroup ceph
> ceph 36766 3.7  35.1 158070740 138703240 ?   Ssl  Jul01 4341:42 /usr/bin/ceph-osd -f --cluster apollo --id 13 --setuser ceph --setgroup ceph
> ceph 37013 3.5  35.0 157767668 138272248 ?   Ssl  Jul01 4094:12 /usr/bin/ceph-osd -f --cluster apollo --id 34 --setuser ceph --setgroup ceph
> ceph 35007 3.4  35.1 160318780 138756404 ?   Ssl  Jul01 3963:21 /usr/bin/ceph-osd -f --cluster apollo --id 2 --setuser ceph --setgroup ceph
> ceph 35217 3.5  35.1 159023744 138626680 ?   Ssl  Jul01 4041:50 /usr/bin/ceph-osd -f --cluster apollo --id 22 --setuser
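For anyone digging into the memory side of this later, a couple of
diagnostics that may help judge whether such an RSS is expected, as a
sketch assuming BlueStore OSDs with admin sockets on the host (osd.1
is just an example id). In Nautilus the BlueStore caches are sized
against osd_memory_target, which defaults to 4 GiB, so 130GB RSS per
OSD is far above the default budget:

    # what the OSD believes its memory budget is
    sudo ceph daemon osd.1 config get osd_memory_target
    # per-pool memory accounting inside the OSD process
    sudo ceph daemon osd.1 dump_mempools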