Re: [ceph-users] MDS behind on trimming
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Dan van der Ster (d...@vanderster.com):
> > Hi,
> >
> > We've used double the defaults for around 6 months now and haven't had
> > any behind on trimming errors in that time.
> >
> > mds log max segments = 60
> > mds log max expiring = 40
> >
> > Should be simple to try.
>
> Yup, and works like a charm:
>
> ceph tell mds.* injectargs '--mds_log_max_segments=60'
> ceph tell mds.* injectargs '--mds_log_max_expiring=40'

^^ I have bumped these again, to "--mds_log_max_segments=120" and
"--mds_log_max_expiring=80", because while doing 2K objects/sec (client IO
120 MB/s, ~ 12000 IOPS) the MDS was behind on trimming again.

FYI,

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/    Kamer van Koophandel 09090351
| GPG: 0xD14839C6        +31 318 648 688 / i...@bit.nl

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
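[Editor's note: injectargs only changes the running daemons; the values revert when an MDS restarts. A minimal sketch of persisting them, assuming a stock ceph.conf with an [mds] section; the values are the ones from this thread, not universal defaults, so tune them for your own workload:]

```ini
# Hypothetical ceph.conf fragment: keep the bumped trim settings across
# MDS restarts. Values taken from this thread, tune for your workload.
[mds]
mds log max segments = 120
mds log max expiring = 80
```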
Re: [ceph-users] MDS behind on trimming
Quoting Dan van der Ster (d...@vanderster.com):
> Hi,
>
> We've used double the defaults for around 6 months now and haven't had
> any behind on trimming errors in that time.
>
> mds log max segments = 60
> mds log max expiring = 40
>
> Should be simple to try.

Yup, and works like a charm:

ceph tell mds.* injectargs '--mds_log_max_segments=60'
ceph tell mds.* injectargs '--mds_log_max_expiring=40'

Although this gets logged: "(not observed, change may require restart)",
these settings do get applied almost instantly ... and the trim lag was gone
within 30 seconds after that.

Thanks,

Stefan

-- 
| BIT BV  http://www.bit.nl/    Kamer van Koophandel 09090351
| GPG: 0xD14839C6        +31 318 648 688 / i...@bit.nl
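[Editor's note: the "(not observed, ...)" notice just means the daemon does not watch that option for runtime changes; as the reply above shows, the mds_log_* values were applied anyway. A tiny sketch of a hypothetical helper (not part of Ceph) for scripting around injectargs responses, flagging the ones that carried the notice so you know to double-check them:]

```python
def needs_restart_check(injectargs_output: str) -> bool:
    """True if an injectargs response carried the 'not observed' notice,
    i.e. the option is not watched at runtime and is worth verifying
    (or may genuinely need a daemon restart)."""
    return "not observed" in injectargs_output

# Example responses (wording as quoted in this thread; the exact format
# may differ between Ceph releases):
print(needs_restart_check(
    "mds_log_max_segments = '60' (not observed, change may require restart)"
))  # -> True
print(needs_restart_check("mds_log_max_segments = '60'"))  # -> False
```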
Re: [ceph-users] MDS behind on trimming
On Thu, Dec 21, 2017 at 9:32 PM, Stefan Kooman wrote:
> Hi,
>
> We have two MDS servers. One active, one active-standby. While doing a
> parallel rsync of 10 threads with loads of files, dirs, and subdirs we get
> the following HEALTH_WARN:
>
> ceph health detail
> HEALTH_WARN 2 MDSs behind on trimming
> MDS_TRIM 2 MDSs behind on trimming
>     mdsmds2(mds.0): Behind on trimming (124/30) max_segments: 30,
> num_segments: 124
>     mdsmds1(mds.0): Behind on trimming (118/30) max_segments: 30,
> num_segments: 118
>
> To be clear: the number of segments behind on trimming fluctuates. It
> sometimes gets smaller, and is relatively stable around ~ 130.
>
> The load on the MDS is low, and the load on the OSDs is low (CPU/RAM/IO).
> All flash, with cephfs_metadata co-located on the same OSDs. Using the
> cephfs kernel client (4.13.0-19-generic) with Ceph 12.2.2 (client as well
> as cluster run Ceph 12.2.2). In older threads I found several possible
> explanations for this warning:
>
> 1) When the number of segments exceeds that setting, the MDS starts
> writing back metadata so that it can remove (trim) the oldest segments.
> If this process is too slow, or a software bug is preventing trimming,
> then this health message appears.
>
> 2) The OSDs cannot keep up with the load.
>
> 3) The cephfs kernel client is misbehaving / has a bug.
>
> I definitely don't think 2) is the reason. I doubt it's a Ceph MDS bug 1)
> or a client bug 3). Might this be conservative default settings? I.e. not
> trying to trim fast / soon enough. John wonders in thread [1] if the
> default journal length should be longer. Yan [2] recommends bumping
> "mds_log_max_expiring" to a large value (200).
>
> What would you suggest at this point? I'm thinking about the following
> changes:
>
> mds log max segments = 200
> mds log max expiring = 200

Yes, these changes should help. You can also try
https://github.com/ceph/ceph/pull/18783

> Thanks,
>
> Stefan
>
> [1]: https://www.spinics.net/lists/ceph-users/msg39387.html
> [2]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011138.html
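[Editor's note: since the backlog fluctuates, it can help to track num_segments over time rather than eyeball the warning. A minimal sketch, assuming the HEALTH_WARN text format quoted in this thread (the field layout may differ between Ceph releases), that pulls the per-MDS trim backlog out of `ceph health detail` output:]

```python
import re

# Matches warning lines like:
#   mdsmds2(mds.0): Behind on trimming (124/30) max_segments: 30, num_segments: 124
TRIM_RE = re.compile(r"mds(\S+)\(mds\.\d+\): Behind on trimming \((\d+)/(\d+)\)")

def parse_trim_warnings(health_detail: str):
    """Return {mds_name: (num_segments, max_segments)} per warning line."""
    result = {}
    for m in TRIM_RE.finditer(health_detail):
        result[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return result

sample = """\
HEALTH_WARN 2 MDSs behind on trimming
MDS_TRIM 2 MDSs behind on trimming
    mdsmds2(mds.0): Behind on trimming (124/30) max_segments: 30, num_segments: 124
    mdsmds1(mds.0): Behind on trimming (118/30) max_segments: 30, num_segments: 118
"""
print(parse_trim_warnings(sample))
# -> {'mds2': (124, 30), 'mds1': (118, 30)}
```

Fed from a periodic `ceph health detail` in cron, this makes it easy to see whether a tuning change actually shrinks the backlog or merely moves the threshold.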
Re: [ceph-users] MDS behind on trimming
Hi,

We've used double the defaults for around 6 months now and haven't had any
behind on trimming errors in that time.

mds log max segments = 60
mds log max expiring = 40

Should be simple to try.

-- dan

On Thu, Dec 21, 2017 at 2:32 PM, Stefan Kooman wrote:
> Hi,
>
> We have two MDS servers. One active, one active-standby. While doing a
> parallel rsync of 10 threads with loads of files, dirs, and subdirs we get
> the following HEALTH_WARN:
>
> ceph health detail
> HEALTH_WARN 2 MDSs behind on trimming
> MDS_TRIM 2 MDSs behind on trimming
>     mdsmds2(mds.0): Behind on trimming (124/30) max_segments: 30,
> num_segments: 124
>     mdsmds1(mds.0): Behind on trimming (118/30) max_segments: 30,
> num_segments: 118
>
> To be clear: the number of segments behind on trimming fluctuates. It
> sometimes gets smaller, and is relatively stable around ~ 130.
>
> The load on the MDS is low, and the load on the OSDs is low (CPU/RAM/IO).
> All flash, with cephfs_metadata co-located on the same OSDs. Using the
> cephfs kernel client (4.13.0-19-generic) with Ceph 12.2.2 (client as well
> as cluster run Ceph 12.2.2). In older threads I found several possible
> explanations for this warning:
>
> 1) When the number of segments exceeds that setting, the MDS starts
> writing back metadata so that it can remove (trim) the oldest segments.
> If this process is too slow, or a software bug is preventing trimming,
> then this health message appears.
>
> 2) The OSDs cannot keep up with the load.
>
> 3) The cephfs kernel client is misbehaving / has a bug.
>
> I definitely don't think 2) is the reason. I doubt it's a Ceph MDS bug 1)
> or a client bug 3). Might this be conservative default settings? I.e. not
> trying to trim fast / soon enough. John wonders in thread [1] if the
> default journal length should be longer. Yan [2] recommends bumping
> "mds_log_max_expiring" to a large value (200).
>
> What would you suggest at this point? I'm thinking about the following
> changes:
>
> mds log max segments = 200
> mds log max expiring = 200
>
> Thanks,
>
> Stefan
>
> [1]: https://www.spinics.net/lists/ceph-users/msg39387.html
> [2]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011138.html