Re: [ceph-users] High apply latency

2018-02-06 Thread Frédéric Nass
Hi Jakub, On 06/02/2018 at 16:03, Jakub Jaszewski wrote: Hi Frederic, I've not enabled debug-level logging on all OSDs, just on one for the test; I need to double-check that. But it looks like merging is ongoing on a few OSDs, or the OSDs are faulty; I will dig into that tomorrow. Write bandwidth is

Re: [ceph-users] High apply latency

2018-02-06 Thread Jakub Jaszewski
Hi Frederic, I've not enabled debug-level logging on all OSDs, just on one for the test; I need to double-check that. But it looks like merging is ongoing on a few OSDs, or the OSDs are faulty; I will dig into that tomorrow. Write bandwidth is very random # rados bench -p default.rgw.buckets.data 120 write
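For reference, a minimal sketch of the benchmark quoted above and its read counterpart (the pool name is taken from the message; --no-cleanup and the follow-up read/cleanup steps are optional additions, not part of the original command):

  # 120-second write benchmark against the RGW data pool (defaults: 4 MB objects, 16 concurrent ops)
  rados bench -p default.rgw.buckets.data 120 write --no-cleanup
  # optionally read the benchmark objects back, then remove them
  rados bench -p default.rgw.buckets.data 60 seq
  rados -p default.rgw.buckets.data cleanup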

Re: [ceph-users] High apply latency

2018-02-05 Thread Frédéric Nass
Hi Jakub, On 05/02/2018 at 12:26, Jakub Jaszewski wrote: Hi Frederic, Many thanks for your contribution to the topic! I've just set logging level 20 for filestore via ceph tell osd.0 config set debug_filestore 20 but so far have found nothing for the keyword 'split' in
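For anyone following along, a sketch of how the split activity is typically looked for once debug_filestore is at 20 (the log path assumes a default packaged install; the exact wording of the split/merge log lines varies between releases):

  # raise filestore logging on one OSD, as quoted above
  ceph tell osd.0 config set debug_filestore 20
  # then search that OSD's log for directory split/merge activity
  grep -iE 'split|merge' /var/log/ceph/ceph-osd.0.log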

Re: [ceph-users] High apply latency

2018-02-02 Thread Piotr Dałek
On 18-02-02 09:55 AM, Jakub Jaszewski wrote: Hi, So I have changed the merge & split settings to filestore_merge_threshold = 40 filestore_split_multiple = 8 and restarted all OSDs, host by host. Let me ask a question: although the pool default.rgw.buckets.data that was affected prior to the above

Re: [ceph-users] High apply latency

2018-02-02 Thread Jakub Jaszewski
Hi, So I have changed the merge & split settings to filestore_merge_threshold = 40 filestore_split_multiple = 8 and restarted all OSDs, host by host. Let me ask a question: although the pool default.rgw.buckets.data that was affected prior to the above change has higher write bandwidth, it is very
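For readers wanting to reproduce the change, a sketch of what it usually looks like (the values are the ones quoted in the thread; the [osd] section placement and the systemd restart command are assumptions about the deployment):

  # /etc/ceph/ceph.conf on every OSD host
  [osd]
  filestore_merge_threshold = 40
  filestore_split_multiple = 8

  # then restart the OSDs on each host in turn
  systemctl restart ceph-osd.target

If I understand the filestore behaviour correctly, raising the thresholds only changes future split/merge decisions; directories that were already split stay split until their object counts fall below the merge threshold.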

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
Regarding split & merge, I have the default values filestore_merge_threshold = 10 filestore_split_multiple = 2. According to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the recommended values are filestore_merge_threshold = 40 filestore_split_multiple = 8. Is it something that I can easily
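A quick way to confirm what a running OSD is actually using (admin socket query; run on the host that carries the OSD, osd.0 being just an example id):

  # show the currently active values on a running OSD
  ceph daemon osd.0 config get filestore_merge_threshold
  ceph daemon osd.0 config get filestore_split_multiple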

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
Thanks, I've temporarily disabled both scrubbing and deep-scrubbing; things are getting better, I feel. I just noticed high traffic generated on pool default.rgw.gc: pool default.rgw.gc id 7 client io 2162 MB/s rd, 0 B/s wr, 3023 op/s rd, 0 op/s wr. There is a lot of data written via radosgw
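If the read traffic on default.rgw.gc is garbage collection catching up after heavy radosgw writes and deletes, the GC queue can be inspected and, if desired, drained in the foreground; a sketch using standard radosgw-admin subcommands (draining adds load of its own, so time it accordingly):

  # list objects queued for garbage collection, including entries not yet due
  radosgw-admin gc list --include-all
  # process the garbage collection queue now, in the foreground
  radosgw-admin gc process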

Re: [ceph-users] High apply latency

2018-01-31 Thread Sergey Malinin
Deep scrub is I/O-expensive. If deep scrub is unnecessary, you can disable it with "ceph osd pool set nodeep-scrub". On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote: > 3 active+clean+scrubbing+deep
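The command as quoted is missing its arguments; the usual forms are either the cluster-wide flag or a per-pool flag (a sketch, with <pool> as a placeholder):

  # cluster-wide: stop scheduling new deep scrubs
  ceph osd set nodeep-scrub
  # re-enable later
  ceph osd unset nodeep-scrub
  # or restrict it to a single pool
  ceph osd pool set <pool> nodeep-scrub 1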

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Hi Luis, Thanks for your comment. I see high %util for a few HDDs on each ceph node, but actually there is very low traffic from clients. iostat -xd shows ongoing operations: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
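For context, that per-device view can be watched continuously with something like the following (5-second refresh; %util pinned near 100% over long stretches points at a saturated spindle; the mount-point grep assumes the default /var/lib/ceph/osd/ layout):

  # extended device statistics, refreshed every 5 seconds
  iostat -xd 5
  # map the busy devices back to OSD data partitions / mount points
  lsblk
  df -h | grep /var/lib/ceph/osd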

Re: [ceph-users] High apply latency

2018-01-31 Thread Luis Periquito
On a cursory look at the information, it seems the cluster is overloaded with requests. Just a guess, but if you look at IO usage on those spindles, they'll be at or around 100% most of the time. If that is the case then increasing pg_num and pgp_num won't help and, short term, will

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for the volumes and default.rgw.buckets.data pools? How will it impact cluster behavior? I guess cluster rebalancing will occur and will take a long time considering the amount of data we have on it? Regards, Jakub On Wed, Jan 31, 2018 at
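For reference, the increase itself is two settings per pool: PGs split when pg_num is raised, and data starts moving once pgp_num follows (pool names and the 2048 target are the ones from the question; stepping up gradually rather than in one jump is a common precaution on a loaded cluster):

  # split the placement groups, then let data rebalance onto them
  ceph osd pool set volumes pg_num 2048
  ceph osd pool set volumes pgp_num 2048
  ceph osd pool set default.rgw.buckets.data pg_num 2048
  ceph osd pool set default.rgw.buckets.data pgp_num 2048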

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Hi, I'm wondering why slow requests are being reported mainly when the request has been put into the queue for processing by its PG (queued_for_pg, http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request). Could it be due to too low pg_num/pgp_num?
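The troubleshooting page linked above works through the OSD admin socket; a sketch of the usual queries, run on the host carrying an OSD that reported slow requests (osd.0 is a placeholder):

  # operations currently in flight, with the event timeline each has passed through
  ceph daemon osd.0 dump_ops_in_flight
  # the slowest recently completed operations, including time spent in queued_for_pg
  ceph daemon osd.0 dump_historic_ops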

[ceph-users] High apply latency

2018-01-30 Thread Jakub Jaszewski
Hi, We observe high apply_latency(ms) and, I believe, poor write performance. In the logs there are repetitive slow request warnings related to different OSDs and servers. ceph version 12.2.2. Cluster HW description: 9x Dell PowerEdge R730xd, 1x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (10C/20T), 256 GB
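A quick way to see which OSDs are driving the high apply latency is the per-OSD latency report from the monitors (available in 12.2.x):

  # per-OSD commit and apply latency, in milliseconds
  ceph osd perf
  # cluster health summary, including any currently blocked/slow requests
  ceph -s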