Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-05 Thread xiaoxi chen


> From: uker...@gmail.com
> Date: Tue, 5 Jul 2016 21:14:12 +0800
> To: kenneth.waege...@ugent.be
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mds0: Behind on trimming (58621/30)
> 
> On Tue, Jul 5, 2016 at 7:56 PM, Kenneth Waegeman
> <kenneth.waege...@ugent.be> wrote:
> >
> >
> > On 04/07/16 11:22, Kenneth Waegeman wrote:
> >>
> >>
> >>
> >> On 01/07/16 16:01, Yan, Zheng wrote:
> >>>
> >>> On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jsp...@redhat.com> wrote:
> >>>>
> >>>> On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
> >>>> <kenneth.waege...@ugent.be> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> While syncing a lot of files to cephfs, our mds cluster got haywire:
> >>>>> the
> >>>>> mdss have a lot of segments behind on trimming:  (58621/30)
> >>>>> Because of this the mds cluster gets degraded. RAM usage is about 50GB.
> >>>>> The
> >>>>> mdses were respawning and replaying continuously, and I had to stop all
> >>>>> syncs, unmount all clients and increase the beacon_grace to keep the
> >>>>> cluster up.
> >>>>>
> >>>>> [root@mds03 ~]# ceph status
> >>>>>  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
> >>>>>   health HEALTH_WARN
> >>>>>  mds0: Behind on trimming (58621/30)
> >>>>>   monmap e1: 3 mons at
> >>>>>
> >>>>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
> >>>>>  election epoch 170, quorum 0,1,2 mds01,mds02,mds03
> >>>>>fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
> >>>>>   osdmap e19966: 156 osds: 156 up, 156 in
> >>>>>  flags sortbitwise
> >>>>>pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
> >>>>>  357 TB used, 516 TB / 874 TB avail
> >>>>>  4151 active+clean
> >>>>> 5 active+clean+scrubbing
> >>>>> 4 active+clean+scrubbing+deep
> >>>>>client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
> >>>>>cache io 68 op/s promote
> >>>>>
> >>>>>
> >>>>> Now it finally is up again, it is trimming very slowly (+-120 segments / min)
> >>>>
> >>>> Hmm, so it sounds like something was wrong that got cleared by either
> >>>> the MDS restart or the client unmount, and now it's trimming at a
> >>>> healthier rate.
> >>>>
> >>>> What client (kernel or fuse, and version)?
> >>>>
> >>>> Can you confirm that the RADOS cluster itself was handling operations
> >>>> reasonably quickly?  Is your metadata pool using the same drives as
> >>>> your data?  Were the OSDs saturated with IO?
> >>>>
> >>>> While the cluster was accumulating untrimmed segments, did you also
> >>>> have a "client xyz failing to advance oldest_tid" warning?
> >>>
> >>> This does not prevent the MDS from trimming log segments.
> >>>
> >>>> It would be good to clarify whether the MDS was trimming slowly, or
> >>>> not at all.  If you can reproduce this situation, get it to a "behind
> >>>> on trimming" state, and then stop the client IO (but leave it mounted).
> >>>> See if the (x/30) number stays the same.  Then, does it start to
> >>>> decrease when you unmount the client?  That would indicate a
> >>>> misbehaving client.
> >>>
> >>> Behind on trimming on a single-MDS cluster is most likely caused by either
> >>> slow RADOS operations or the MDS trimming too few log segments on each tick.
> >>>
> >>> Kenneth, could you try setting mds_log_max_expiring to a large value
> >>> (such as 200)?
> >>
> >> I've set the mds_log_max_expiring to 200 right now. Should I see something
> >> instantly?
> >
> > The trimming finished rather quickly, although I don't have any accurate time
> > measurements. The cluster looks to be running fine right now, but running an
> > incremental sync. We will try with the same data again to see if it is ok now.

Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-05 Thread Yan, Zheng
On Tue, Jul 5, 2016 at 7:56 PM, Kenneth Waegeman
 wrote:
>
>
> On 04/07/16 11:22, Kenneth Waegeman wrote:
>>
>>
>>
>> On 01/07/16 16:01, Yan, Zheng wrote:
>>>
>>> On Fri, Jul 1, 2016 at 6:59 PM, John Spray  wrote:

 On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
  wrote:
>
> Hi all,
>
> While syncing a lot of files to cephfs, our mds cluster got haywire:
> the
> mdss have a lot of segments behind on trimming:  (58621/30)
> Because of this the mds cluster gets degraded. RAM usage is about 50GB.
> The
> mdses were respawning and replaying continuously, and I had to stop all
> syncs, unmount all clients and increase the beacon_grace to keep the
> cluster up.
>
> [root@mds03 ~]# ceph status
>  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>   health HEALTH_WARN
>  mds0: Behind on trimming (58621/30)
>   monmap e1: 3 mons at
>
> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>  election epoch 170, quorum 0,1,2 mds01,mds02,mds03
>fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
>   osdmap e19966: 156 osds: 156 up, 156 in
>  flags sortbitwise
>pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
>  357 TB used, 516 TB / 874 TB avail
>  4151 active+clean
> 5 active+clean+scrubbing
> 4 active+clean+scrubbing+deep
>client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
>cache io 68 op/s promote
>
>
> Now it finally is up again, it is trimming very slowly (+-120 segments / min)

 Hmm, so it sounds like something was wrong that got cleared by either
 the MDS restart or the client unmount, and now it's trimming at a
 healthier rate.

 What client (kernel or fuse, and version)?

 Can you confirm that the RADOS cluster itself was handling operations
 reasonably quickly?  Is your metadata pool using the same drives as
 your data?  Were the OSDs saturated with IO?

 While the cluster was accumulating untrimmed segments, did you also
 have a "client xyz failing to advanced oldest_tid" warning?
>>>
>>> This does not prevent the MDS from trimming log segments.
>>>
 It would be good to clarify whether the MDS was trimming slowly, or
 not at all.  If you can reproduce this situation, get it to a "behind
 on trimming" state, and the stop the client IO (but leave it mounted).
 See if the (x/30) number stays the same.  Then, does it start to
 decrease when you unmount the client?  That would indicate a
 misbehaving client.
>>>
>>> Behind on trimming on a single-MDS cluster is most likely caused by either
>>> slow RADOS operations or the MDS trimming too few log segments on each tick.
>>>
>>> Kenneth, could you try setting mds_log_max_expiring to a large value
>>> (such as 200)?
>>
>> I've set the mds_log_max_expiring to 200 right now. Should I see something
>> instantly?
>
> The trimming finished rather quickly, although I don't have any accurate time
> measurements. The cluster looks to be running fine right now, but running an
> incremental sync. We will try with the same data again to see if it is ok now.
> Is this mds_log_max_expiring option production ready? (I don't seem to find
> it in the documentation)

It should be safe. Setting mds_log_max_expiring to 200 does not change
the code path.
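If you want it to survive MDS restarts, you could also put it in ceph.conf;
a minimal sketch, assuming the usual [mds] section layout:

    [mds]
    # maximum number of log segments the MDS will expire in parallel per tick
    mds log max expiring = 200

The admin-socket / injectargs route should also work for the running daemon
without a restart.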

Yan, Zheng

>
> Thank you!!
>
> K
>
>>
>> This weekend, the trimming did not continue and something happened to the
>> cluster:
>>
>> mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
>> log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object, errno -2
>> mds.0.78429 unhandled write error (2) No such file or directory, force readonly...
>> mds.0.cache force file system read-only
>> log_channel(cluster) log [WRN] : force file system read-only
>>
>> and ceph health reported:
>> mds0: MDS in read-only mode
>>
>> I restarted it and it is trimming again.
>>
>>
>> Thanks again!
>> Kenneth
>>>
>>> Regards
>>> Yan, Zheng
>>>
 John
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-05 Thread Kenneth Waegeman



On 04/07/16 11:22, Kenneth Waegeman wrote:



On 01/07/16 16:01, Yan, Zheng wrote:

On Fri, Jul 1, 2016 at 6:59 PM, John Spray  wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
 wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming: (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB.
The mdses were respawning and replaying continuously, and I had to stop
all syncs, unmount all clients and increase the beacon_grace to keep the
cluster up.

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments / min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?

While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

This does not prevent the MDS from trimming log segments.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.

Behind on trimming on a single-MDS cluster is most likely caused by either
slow RADOS operations or the MDS trimming too few log segments on each tick.

Kenneth, could you try setting mds_log_max_expiring to a large value
(such as 200)?
I've set the mds_log_max_expiring to 200 right now. Should I see
something instantly?
The trimming finished rather quickly, although I don't have any accurate
time measurements. The cluster looks to be running fine right now, but running
an incremental sync. We will try with the same data again to see if it is ok now.
Is this mds_log_max_expiring option production ready? (I don't seem to
find it in the documentation)


Thank you!!

K


This weekend, the trimming did not continue and something happened to
the cluster:


mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object, errno -2
mds.0.78429 unhandled write error (2) No such file or directory, force readonly...

mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only

and ceph health reported:
mds0: MDS in read-only mode

I restarted it and it is trimming again.


Thanks again!
Kenneth

Regards
Yan, Zheng


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-04 Thread Kenneth Waegeman



On 01/07/16 16:01, Yan, Zheng wrote:

On Fri, Jul 1, 2016 at 6:59 PM, John Spray  wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
 wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming:  (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
mdses were respawning and replaying continuously, and I had to stop all
syncs, unmount all clients and increase the beacon_grace to keep the
cluster up.

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments /
min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?

While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

This does not prevent the MDS from trimming log segments.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.

Behind on trimming on a single-MDS cluster is most likely caused by either
slow RADOS operations or the MDS trimming too few log segments on each tick.

Kenneth, could you try setting mds_log_max_expiring to a large value
(such as 200)?
I've set the mds_log_max_expiring to 200 right now. Should I see
something instantly?


This weekend, the trimming did not continue and something happened to
the cluster:


mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 object, errno -2
mds.0.78429 unhandled write error (2) No such file or directory, force readonly...

mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only

and ceph health reported:
mds0: MDS in read-only mode

I restarted it and it is trimming again.
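If it happens again, I guess we could first check whether the backing object is
actually there; as far as I understand, each directory fragment is stored as an
object named <inode-hex>.<fragment> in the metadata pool, so something like this
(with 'cephfs_metadata' only a placeholder for our real metadata pool name):

    # is the dirfrag object for inode 1000da74e85 still present in RADOS?
    rados -p cephfs_metadata ls | grep 1000da74e85
    rados -p cephfs_metadata stat 1000da74e85.00000000

If the object is really gone, that would at least explain the errno -2 on commit;
if it is there, the error was presumably something transient on the OSD/cache side.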


Thanks again!
Kenneth

Regards
Yan, Zheng


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread Kenneth Waegeman



On 01/07/16 12:59, John Spray wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
 wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming:  (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
mdses were respawning and replaying continuously, and I had to stop all
syncs, unmount all clients and increase the beacon_grace to keep the
cluster up.

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments /
min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Kernel client of CentOS 7.2, 3.10.0-327.18.2.el7.
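(For completeness: I think the client versions can also be read from the MDS side
via the admin socket, e.g.

    ceph daemon mds.mds03 session ls

which, if I remember correctly, lists each session together with the metadata the
client reported, including its kernel or fuse version.)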


Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?
The metadata pool is a pool of SSDs. Data is an EC pool with a cache layer of
separate SSDs. There was indeed load on the OSDs, and the ceph health
command regularly produced 'cache at/near full ratio' warnings too.
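(We will probably also revisit the cache-tier thresholds; if I read the docs right
these are per-pool settings roughly along these lines, with the values below only
as examples and 'cachepool' standing in for our cache pool name:

    ceph osd pool set cachepool target_max_bytes 1099511627776
    ceph osd pool set cachepool cache_target_dirty_ratio 0.4
    ceph osd pool set cachepool cache_target_full_ratio 0.8

so the tier starts flushing and evicting well before it hits the full ratio.)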




While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

We did not see that warning.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.
The mds trimming is still at (37927/30), so I have to wait some more hours before
I can try to reproduce it. (Nothing can be done to speed this up?)
There was a moment where the mds was active and I didn't see the segments
going down. I did run ceph daemon mds.mds03 flush journal, but this was
before I changed the beacon_grace, so it respawned again at that moment,
and I'm not quite sure whether there was another issue then.
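(For now I'm just watching the segment count through the admin socket, something
like:

    # 'seg' under mds_log should be the current number of journal segments
    watch -n 60 'ceph daemon mds.mds03 perf dump mds_log'

assuming I'm reading the mds_log counters correctly.)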


Thanks again!

Kenneth


John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread Yan, Zheng
On Fri, Jul 1, 2016 at 6:59 PM, John Spray  wrote:
> On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
>  wrote:
>> Hi all,
>>
>> While syncing a lot of files to cephfs, our mds cluster got haywire: the
>> mdss have a lot of segments behind on trimming:  (58621/30)
>> Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
>> mdses were respawning and replaying continuously, and I had to stop all
>> syncs, unmount all clients and increase the beacon_grace to keep the
>> cluster up.
>>
>> [root@mds03 ~]# ceph status
>> cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>  health HEALTH_WARN
>> mds0: Behind on trimming (58621/30)
>>  monmap e1: 3 mons at
>> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
>> election epoch 170, quorum 0,1,2 mds01,mds02,mds03
>>   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
>>  osdmap e19966: 156 osds: 156 up, 156 in
>> flags sortbitwise
>>   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
>> 357 TB used, 516 TB / 874 TB avail
>> 4151 active+clean
>>5 active+clean+scrubbing
>>4 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
>>   cache io 68 op/s promote
>>
>>
>> Now it finally is up again, it is trimming very slowly (+-120 segments /
>> min)
>
> Hmm, so it sounds like something was wrong that got cleared by either
> the MDS restart or the client unmount, and now it's trimming at a
> healthier rate.
>
> What client (kernel or fuse, and version)?
>
> Can you confirm that the RADOS cluster itself was handling operations
> reasonably quickly?  Is your metadata pool using the same drives as
> your data?  Were the OSDs saturated with IO?
>
> While the cluster was accumulating untrimmed segments, did you also
> have a "client xyz failing to advanced oldest_tid" warning?

This does not prevent the MDS from trimming log segments.

>
> It would be good to clarify whether the MDS was trimming slowly, or
> not at all.  If you can reproduce this situation, get it to a "behind
> on trimming" state, and the stop the client IO (but leave it mounted).
> See if the (x/30) number stays the same.  Then, does it start to
> decrease when you unmount the client?  That would indicate a
> misbehaving client.

Behind on trimming on a single-MDS cluster is most likely caused by either
slow RADOS operations or the MDS trimming too few log segments on each tick.

Kenneth, could you try setting mds_log_max_expiring to a large value
(such as 200)?
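A rough sketch of how to apply it at runtime, assuming admin access and the
standard admin socket (the daemon name is a placeholder):

    # on the MDS host, via the admin socket
    ceph daemon mds.<name> config set mds_log_max_expiring 200

    # or from any admin node
    ceph tell mds.* injectargs '--mds_log_max_expiring 200'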

Regards
Yan, Zheng

>
> John
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread John Spray
On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
 wrote:
> Hi all,
>
> While syncing a lot of files to cephfs, our mds cluster got haywire: the
> mdss have a lot of segments behind on trimming:  (58621/30)
> Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
> mdses were respawning and replaying continuously, and I had to stop all
> syncs, unmount all clients and increase the beacon_grace to keep the
> cluster up.
>
> [root@mds03 ~]# ceph status
> cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>  health HEALTH_WARN
> mds0: Behind on trimming (58621/30)
>  monmap e1: 3 mons at
> {mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
> election epoch 170, quorum 0,1,2 mds01,mds02,mds03
>   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
>  osdmap e19966: 156 osds: 156 up, 156 in
> flags sortbitwise
>   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
> 357 TB used, 516 TB / 874 TB avail
> 4151 active+clean
>5 active+clean+scrubbing
>4 active+clean+scrubbing+deep
>   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
>   cache io 68 op/s promote
>
>
> Now it finally is up again, it is trimming very slowly (+-120 segments /
> min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?

While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread Kenneth Waegeman

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming: (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB.
The mdses were respawning and replaying continuously, and I had to stop
all syncs, unmount all clients and increase the beacon_grace to keep
the cluster up.
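(For reference, what I did for the beacon_grace was roughly the following; the value
is only an example, and as far as I understand the monitors use this setting to
decide when to fail over a laggy MDS:

    ceph tell mon.* injectargs '--mds_beacon_grace 300'

plus the same value under [global] in ceph.conf so it survives restarts.)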


[root@mds03 ~]# ceph status
cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
 health HEALTH_WARN
mds0: Behind on trimming (58621/30)
 monmap e1: 3 mons at 
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
election epoch 170, quorum 0,1,2 mds01,mds02,mds03
  fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
 osdmap e19966: 156 osds: 156 up, 156 in
flags sortbitwise
  pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
357 TB used, 516 TB / 874 TB avail
4151 active+clean
   5 active+clean+scrubbing
   4 active+clean+scrubbing+deep
  client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
  cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments / 
min)

We've seen some 'behind on trimming' before, but never that much.
So now our production cluster has been unusable for approximately half a day.

What could be the problem here? We are running 10.2.1.
Can something be done to keep the mds from accumulating that many segments?
Can we speed up the trimming process?

Thank you very much!

Cheers,
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com