On Thu, Jun 22, 2017 at 5:31 PM, Casey Bodley wrote:
>
> On 06/22/2017 10:40 AM, Dan van der Ster wrote:
>>
>> On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote:
>>>
>>> On 06/22/2017 04:00 AM, Dan van der Ster wrote:
>>>>
>>>> I'm now running the three relevant OSDs with that patch. (Recompiled,
>>>> replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
>>>> then restarted the osds).
>>>> [...]
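
(For anyone reproducing the hot-swap Dan describes above: a rough
sketch, assuming a systemd/el7 install; the osd id is illustrative --
the three osds in question are the ones hosting the mdlog shard's pg.)

  # on each of the three osd hosts, install the rebuilt class and
  # restart the daemon so it loads the new libcls_log.so
  cp libcls_log.so /usr/lib64/rados-classes/libcls_log.so
  systemctl restart ceph-osd@155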

On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney wrote:
> On 06/14/17 11:59, Dan van der Ster wrote:
>> Dear ceph users,
>>
>> Today we had O(100) slow requests which were caused by deep-scrubbing
>> of the metadata log:
>>
>> 2017-06-14 11:07:55.373184 osd.155
>> [...]

I'm now running the three relevant OSDs with that patch. (Recompiled,
replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
then restarted the osds).

It's working quite well, trimming 10 entries at a time instead of
1000, and no more timeouts.

Do you think it would be worth ...
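
The patch itself isn't quoted anywhere in the surviving thread. Going
only by "10 entries at a time instead of 1000", it presumably lowers
the per-call trim batch in cls_log; a minimal sketch of that kind of
change against a jewel-era src/cls/log/cls_log.cc (constant name from
that file, exact placement assumed) would be:

  --- a/src/cls/log/cls_log.cc
  +++ b/src/cls/log/cls_log.cc
  -#define MAX_TRIM_ENTRIES 1000
  +#define MAX_TRIM_ENTRIES 10

With a smaller batch, each cls_log_trim call holds the op thread for
less time, at the cost of many more trim round trips.
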
On 06/14/17 11:59, Dan van der Ster wrote:
> Dear ceph users,
>
> Today we had O(100) slow requests which were caused by deep-scrubbing
> of the metadata log:
>
> 2017-06-14 11:07:55.373184 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
> deep-scrub starts
> ...
>
That patch looks reasonable. You could also try raising the values of
osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on
that osd in order to trim more at a time.
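
For reference, a sketch of what raising those two timeouts on just the
affected osd might look like (the values are illustrative, not a
recommendation; the jewel-era defaults are 150s and 180s respectively,
if memory serves):

  # ceph.conf on the osd host; restart the osd to apply
  [osd.155]
      osd op thread suicide timeout = 600
      filestore op thread suicide timeout = 600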

On 06/21/2017 09:27 AM, Dan van der Ster wrote:

Hi Casey,

I managed to trim up all shards except for that big #54. The others
all trimmed within a few seconds.

But 54 is proving difficult. It's still going after several days, and
now I see that the 1000-key trim is indeed causing osd timeouts. I've
manually compacted the relevant osd ...
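
The trim command itself doesn't survive in these snippets. It was
presumably run per-shard, something along these lines (a sketch
assuming a jewel-era radosgw-admin; the exact bounding flags,
--end-date vs. --end-marker, vary by release -- check radosgw-admin
help on your version):

  # trim a single mdlog shard up to a cutoff
  radosgw-admin mdlog trim --shard-id=54 --end-date="2017-06-20 00:00:00"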

Hi Dan,

That's good news that it can remove 1000 keys at a time without hitting
timeouts. The output of 'du' will depend on when the leveldb compaction
runs. If you do find that compaction leads to suicide timeouts on this
osd (you would see a lot of 'leveldb:' output in the log), consider ...
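
For readers following along: one common way to trigger that leveldb
compaction by hand on a filestore osd of this era (an assumption about
the tooling, not a reconstruction of the truncated advice above) is to
compact at startup:

  # ceph.conf on the osd host; omap is compacted on the next start
  [osd.155]
      leveldb compact on mount = true

then restart the osd and remove the option again once it has compacted.
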
On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote:
>
> On 06/14/2017 05:59 AM, Dan van der Ster wrote:
>>
>> Dear ceph users,
>>
>> Today we had O(100) slow requests which were caused by deep-scrubbing
>> of the metadata log:
>>
>> 2017-06-14 11:07:55.373184 osd.155
>> [...]

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155 ...
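
For anyone hitting the same symptom: the slow scrubs come from mdlog
shard objects that have accumulated huge omaps. One way to spot the
offending shard (the pool name is an assumption -- default.rgw.log in
a stock jewel zone; adjust to your zone's log pool):

  # count omap keys per mdlog shard object
  for obj in $(rados -p default.rgw.log ls | grep '^meta.log'); do
    echo "$obj: $(rados -p default.rgw.log listomapkeys $obj | wc -l)"
  done

In this thread, shard #54 would presumably stand out with a far larger
count than its peers.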