Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-23 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 5:31 PM, Casey Bodley wrote: > On 06/22/2017 10:40 AM, Dan van der Ster wrote: >> On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: >>> On 06/22/2017 04:00 AM, Dan van der Ster wrote: I'm now running the three

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Casey Bodley
On 06/22/2017 10:40 AM, Dan van der Ster wrote: On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: On 06/22/2017 04:00 AM, Dan van der Ster wrote: I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: > On 06/22/2017 04:00 AM, Dan van der Ster wrote: >> I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Casey Bodley
On 06/22/2017 04:00 AM, Dan van der Ster wrote: I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the osds). It's working quite well, trimming 10 entries at a time instead of 1000, and no

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney wrote: > On 06/14/17 11:59, Dan van der Ster wrote: >> Dear ceph users, >> Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: >> 2017-06-14 11:07:55.373184 osd.155

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Dan van der Ster
I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the osds). It's working quite well, trimming 10 entries at a time instead of 1000, and no more timeouts. Do you think it would be worth
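For reference, a quick way to confirm on the OSD side that the smaller trim batches complete without tripping the suicide timeout is to look at the slowest recent ops over the admin socket (a sketch only; osd.155 is taken from the log excerpt earlier in the thread, and a default admin-socket setup is assumed):

    # slowest recently completed ops on osd.155; the op descriptions include the cls method being called
    ceph daemon osd.155 dump_historic_ops
    # ops currently in flight, useful while a trim is running
    ceph daemon osd.155 dump_ops_in_flight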

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Peter Maloney
On 06/14/17 11:59, Dan van der Ster wrote: > Dear ceph users, > Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: > 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts > ...

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Casey Bodley
That patch looks reasonable. You could also try raising the values of osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on that osd in order to trim more at a time. On 06/21/2017 09:27 AM, Dan van der Ster wrote: Hi Casey, I managed to trim up all shards except for that
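For illustration, a minimal sketch of raising those two values on a single OSD (osd.155 comes from the log excerpt earlier in the thread; the 600-second value is only an example, not a recommendation made in this thread):

    # bump both suicide timeouts on osd.155 at runtime
    ceph tell osd.155 injectargs '--osd-op-thread-suicide-timeout 600 --filestore-op-thread-suicide-timeout 600'

    # or persist them for that daemon in ceph.conf and restart the OSD
    [osd.155]
    osd op thread suicide timeout = 600
    filestore op thread suicide timeout = 600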

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Dan van der Ster
Hi Casey, I managed to trim up all shards except for that big #54. The others all trimmed within a few seconds. But 54 is proving difficult. It's still going after several days, and now I see that the 1000-key trim is indeed causing osd timeouts. I've manually compacted the relevant osd
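As a rough way to watch progress on that one shard, the omap keys remaining on the shard's log object can be counted directly with rados (a sketch; the pool name and the exact meta.log object name depend on the zone/period and are assumptions here, not details given in this thread):

    # find the mdlog object for shard 54 (pool name is an assumption)
    rados -p default.rgw.log ls | grep '^meta\.log\..*\.54$'
    # count the omap keys still left on it
    rados -p default.rgw.log listomapkeys meta.log.<period-id>.54 | wc -l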

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Casey Bodley
Hi Dan, That's good news that it can remove 1000 keys at a time without hitting timeouts. The output of 'du' will depend on when the leveldb compaction runs. If you do find that compaction leads to suicide timeouts on this osd (you would see a lot of 'leveldb:' output in the log), consider
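A simple way to see whether that compaction is what the OSD is busy with is to watch its log for the 'leveldb:' lines mentioned above (assuming the default log location and osd.155 from the earlier excerpt):

    tail -f /var/log/ceph/ceph-osd.155.log | grep 'leveldb:'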

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Dan van der Ster
On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote: > On 06/14/2017 05:59 AM, Dan van der Ster wrote: >> Dear ceph users, >> Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: >> 2017-06-14 11:07:55.373184 osd.155

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-15 Thread Casey Bodley
On 06/14/2017 05:59 AM, Dan van der Ster wrote: Dear ceph users, Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts ... 2017-06-14

[ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-14 Thread Dan van der Ster
Dear ceph users, Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts ... 2017-06-14 11:22:04.143903 osd.155
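As a stop-gap while the log shards are being trimmed, deep-scrub can be held off either cluster-wide or just on the pool holding the metadata log (a sketch, not something prescribed in this thread; pg 24.1d comes from the excerpt above, and the pool name is an assumption):

    # which OSDs serve the affected pg
    ceph pg map 24.1d
    # pause deep-scrub cluster-wide ...
    ceph osd set nodeep-scrub
    # ... or only on the pool that holds the metadata log (pool name assumed)
    ceph osd pool set default.rgw.log nodeep-scrub 1
    # re-enable when trimming is done
    ceph osd unset nodeep-scrub
    ceph osd pool set default.rgw.log nodeep-scrub 0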